On Wed, Jul 29, 2009 at 12:16 AM, Tom Molesworth<t...@audioboundary.com> wrote: > Ole Tange wrote: >> On Tue, Jul 28, 2009 at 3:47 PM, Tom Lane<t...@sss.pgh.pa.us> wrote: >>> "Ole Tange" <postgresql....@tange.dk> writes: >>> >>>> (modulo NULLs which seem to always cause problems in NOT INs). >>> >>> Let's see, you understand that the rewrite violates the SQL standard >>> semantics of NOT IN, but you think we should do it anyway? > > Just an observation - it seems that you're using NOT IN() and expecting it > to do the same as this: > > SELECT foo FROM a LEFT JOIN b ON a.key = b.key WHERE b.key IS NULL
For my purposes this query is fine, too. It will, however, still cause the code to be less readable than 'a.key NOT IN (...)'. As I found IN actually performs fine, I naively thought I could just move the NOT in my own code or use '= False': SELECT foo FROM a WHERE NOT (key IN (SELECT key FROM b)); -- This is slow SELECT foo FROM a WHERE (key IN (SELECT key FROM b)) = False; -- This is slow, too But EXPLAIN tells me these are just as expensive as NOT IN. So though this is fairly readable to me, they still suffer from the performance. > I find it's comparatively rare to actually want the NOT IN() null-handling > semantics, especially given your comment about nulls 'causing problems' I agree with that. But if the standard says 'Do something that will surprise the average user' then I understand that you will have to make a choice between following the standard or pleasing the user. Postgresql chose to follow the standard, which I believe is a valid decision; it did, however, cause me a few hours of debugging. I would have loved if 'NOT IN (...NULL...)' had given me some output that would have alerted me to the NULL issue. But I do not see a simple way of doing that. > (although I guess you could get the same behaviour in that query as NOT IN, > by adding 'AND a.key IS NOT NULL' to the where clause - I have no idea > whether this is something the optimiser could or should be able to do). The problem is not NULLs in table a, but NULLs in table b. If b contains a NULL the query should return 0 records. > but I find the left join approach works well enough > up to several million rows in the two tables. An unindexed subquery like > 'select key from a' just seems like a guaranteed way to invoking a full > (slow) table scan. EXPLAIN tells me the LEFT JOIN does table scan as well. It seems the primary reason why your LEFT JOIN is faster than my 3 line alternative is because I do DELETEs, which is slow. So thank you for the LEFT JOIN idea. /Ole -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs