On Thu, Jun 2, 2016 at 5:44 PM, Kevin Grittner <kgri...@gmail.com> wrote:

> On Thu, Jun 2, 2016 at 3:23 PM, David G. Johnston
> <david.g.johns...@gmail.com> wrote:
> > On Thu, Jun 2, 2016 at 4:11 PM, John R Pierce <pie...@hogranch.com>
> wrote:
> >> Thanks all the below seem to do the trick.
>
> I doubt it -- using NOT IN requires (per the SQL specification)
> handling NULLs in a way that probably does not give you the answer
> you want.  (NOT IN also is often much slower than the NOT EXISTS
> test which will actually give you the answer you want.)
>
> test=# create table t (id int not null primary key, ref_id int, sts
> int not null default 0);
> CREATE TABLE
> test=# insert into t values
> (1,null,0),(2,1,1),(3,null,0),(4,null,0),(5,4,1),(6,null,0),(7,6,1);
> INSERT 0 7
> test=# select max(id) from t where sts=0 and id not in (select ref_id from
> t);
>  max
> -----
>
> (1 row)
>
> test=# select max(id) from t t1 where sts = 0 and not exists (select *
> from t t2 where t2.ref_id = t1.id);
>  max
> -----
>    3
> (1 row)
>
> Note that providing minimal setup (like the above) helps in getting
> good answers quickly.
>
> >> do note, this is whats known as an 'anti-join', and these can be pretty
> >> expensive on large tables.
> >
> > +1
>
> *Can* be.  Proper indexing can make them very reasonable.
>

​Doh (me)...

Indeed, NOT IN (...) doesn't qualify as an anti-join since (for one) it
cannot (I don't think) be optimized in this way as the entire contents of
the IN() need to be determined.  IOW, its not really a join but just
another predicate condition whose one side is a subquery.

That said, writing out a full anti-join NOT EXISTS (or, similarly, a
semi-join EXISTS) clause can be a bit tedious for ad-hoc stuff while the
IN() variation is a bit more succinct and, I'd venture to say,
unfortunately familiar.  In can be made to work in this situation by
writing the expression as IN (SELECT ref_id FROM t WHERE ref_id IS NOT
NULL).

David J.

Reply via email to