Hi, Ken,
Ken Hill schrieb:
> I need some help with a bit of SQL. I have two tables. I want to find
> records in one table that don't match records in another table based on
> a common column in the two tables. Both tables have a column named
> 'key100'. I was trying something like:
>
> SELECT count(*)
> FROM table1, table2
> WHERE (table1.key100 != table2.key100);
>
> But the query is very slow and I finally just cancel it. Any help is
> very much appreciated.
Do you have indices on the key100 columns? Is autovacuum running, or do
you do analyze manually?
Can you send us the output from "EXPLAIN ANALYZE [your query]"?
Btw, I don't think this query will do what you wanted, it basically
creates a cross product, that means if your tables look like:
schabitest=# select * from table1;
key100 | valuea | valueb
--------+--------+--------
1 | foo | bar
2 | blah | blubb
3 | manga | mungo
schabitest=# select * from table2;
key100 | valuec | valued
--------+--------+--------
1 | monday | euro
2 | sunday | dollar
4 | friday | pounds
Then your query will produce something like:
schabitest=# select * from table1, table2 WHERE (table1.key100 !=
table2.key100);
key100 | valuea | valueb | key100 | valuec | valued
--------+--------+--------+--------+--------+--------
1 | foo | bar | 2 | sunday | dollar
1 | foo | bar | 4 | friday | pounds
2 | blah | blubb | 1 | monday | euro
2 | blah | blubb | 4 | friday | pounds
3 | manga | mungo | 1 | monday | euro
3 | manga | mungo | 2 | sunday | dollar
3 | manga | mungo | 4 | friday | pounds
I suggest you would like to have all records from table1 that don't have
a corresponding record in table2:
schabitest=# select * from table1 where table1.key100 not in (select
key100 from table2);
key100 | valuea | valueb
--------+--------+--------
3 | manga | mungo
HTH,
Markus
Here is my query SQL:SELECT key100 FROM ncccr10
WHERE ncccr10.key100 NOT IN (SELECT key100 FROM ncccr9);
It is is running after 30 minutes. Here is the query plan:
QUERY PLAN
-------------------------------------------------------------------------
Seq Scan on ncccr10 (cost=0.00..20417160510.08 rows=305782 width=104)
Filter: (NOT (subplan))
SubPlan
-> Seq Scan on ncccr9 (cost=0.00..65533.71 rows=494471 width=104)
(4 rows)
Any ideas why it is so slow?
