Hi Guys,

This isn't a PostgreSQL specific question but just a SQL question.  If this is 
not an appropriate question for this list, please let me know.

It is also, perhaps, a really silly question.

This query (without the 'explain' keyword) , when executed takes forever and a 
day:

> condor_development=> explain select id from filesets where id not in ( select 
> fileset_id from service_pack_fileset_maps );
>                                           QUERY PLAN                          
>                 
> ----------------------------------------------------------------------------------------------
>  Seq Scan on filesets  (cost=0.00..71937742.00 rows=26088 width=4)
>    Filter: (NOT (SubPlan 1))
>    SubPlan 1
>      ->  Materialize  (cost=0.00..2517.78 rows=95852 width=4)
>            ->  Seq Scan on service_pack_fileset_maps  (cost=0.00..1663.52 
> rows=95852 width=4)
> (5 rows)

This query returns within a second:

> condor_development=> explain select id from filesets where id not in ( select 
> distinct fileset_id from service_pack_fileset_maps );
>                                           QUERY PLAN                          
>                 
> ----------------------------------------------------------------------------------------------
>  Seq Scan on filesets  (cost=2102.31..3153.53 rows=26088 width=4)
>    Filter: (NOT (hashed SubPlan 1))
>    SubPlan 1
>      ->  HashAggregate  (cost=1903.15..2062.48 rows=15933 width=4)
>            ->  Seq Scan on service_pack_fileset_maps  (cost=0.00..1663.52 
> rows=95852 width=4)
> (5 rows)

The difference is the "distinct" keyword in the inner select.

What I'm confused about is why isn't the "distinct" implicit?  I thought the 
construct "blah in ( select ... )" was using "sets" and an item (I thought) can 
not be in a set more than once.

Perhaps my question is the opposite really?  Why would you not always use 
"distinct" in the inner select when the operator is "in" or "not in" ?

And if I can throw in another question on top: is there a different method 
other than "not in" that would work better?

Thank you guys for the help and a really awesome database.
pedz
(this is *suppose* to be sent as plain text... I hope my mailer does what it is 
told)


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Reply via email to