Wednesday, July 7, 2010

Computing table content differences with Oracle SQL

Computing table content differences with Oracle SQL


The question "what is the difference between two tables? is simple to ask, but difficult to code. It's like an "anti-union" where we seek only the unique rows within each table.

Tropashko has a great analysis of this common SQL problem from both a mathematical perspective and a practical Oracle perspective. It also shows an undocumented technique for improving execution speed for querying the non-intersection of two tables.

Vadim notes that the _convert_set_to_join parameter can be used when comparing the contents of two tables. This type of SQL query is called a "semantic difference", or an "anti-union" operation. Tropashko depicts the table comparison problem below:

In the above figure we read the "/" as "not in" and the "∪" as "union". In sum, we want the table comparison SQL to find:

Rows in table A that are not in table B
plus
Rows in table B that are not in table A

Vadim notes that the obvious solution van have poor performance:

The figure 1 with the expressions (A \ B) ∪ (B \ A) and (A ∪ B) \ (A ∩ B) pretty much exhaust all the theory involved. It is straightforward to translate them into SQL queries. Here is the first one:

(select * from A
minus
select * from B) -- Rows in A that are not in B
union all
(
select * from B
minus
select * from A
) -- rows in B that are not in A

In practice, however, this query is a sluggish performer. With a naïve evaluation strategy, the execution flow and the operators are derived verbatim from the SQL which we have written. First, each table has to be scanned twice. Then, four sort operators are applied in order to exclude duplicates. Next, the two set differences are computed, and, finally, the two results are combined together with the union operator. . .

No comments:

Post a Comment

Popular Posts