Re: [BUGS] BUG #4949: NOT IN is prohibitive slower than the rewrite for medium to large sets

Tom Molesworth Tue, 28 Jul 2009 16:00:46 -0700

Ole Tange wrote:

On Tue, Jul 28, 2009 at 3:47 PM, Tom Lane<t...@sss.pgh.pa.us> wrote:

"Ole Tange" <postgresql....@tange.dk> writes:

(modulo NULLs which seem to always cause problems in NOT INs).
Because it can be rewritten, NOT IN should never be much slower than the
rewritten solution, as PostgreSQL should simply rewrite NOT IN to the
above.

Let's see, you understand that the rewrite violates the SQL standard
semantics of NOT IN, but you think we should do it anyway?


Thanks for your kind reply.

Apparently my bug report was not quite clear. My rewrite example was
simply to show a way that could do a marvelous speedup on medium to
large sets. The correct dealing with NULL I am sure can be handled
just as efficiently.

Just an observation - it seems that you're using NOT IN() and expectingit to do the same as this:


SELECT foo FROM a LEFT JOIN b ON a.key = b.key WHERE b.key IS NULL

I find it's comparatively rare to actually want the NOT IN()null-handling semantics, especially given your comment about nulls'causing problems' (although I guess you could get the same behaviour inthat query as NOT IN, by adding 'AND a.key IS NOT NULL' to the whereclause - I have no idea whether this is something the optimiser could orshould be able to do).


Although it refers to TSQL, this page might help explain more about NOT IN:

http://stackoverflow.com/questions/129077/sql-not-in-constraint-and-null-values

Not sure if constructions such as 'not exists (select 1 from b whereb.key = a.key)' also have the same performance issues you describe -should be faster due to the key? - but I find the left join approachworks well enough up to several million rows in the two tables. Anunindexed subquery like 'select key from a' just seems like a guaranteedway to invoking a full (slow) table scan.


Tom

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] BUG #4949: NOT IN is prohibitive slower than the rewrite for medium to large sets

Reply via email to