On Thu, Jun 2, 2016 at 5:44 PM, Kevin Grittner <kgri...@gmail.com> wrote:
> On Thu, Jun 2, 2016 at 3:23 PM, David G. Johnston > <david.g.johns...@gmail.com> wrote: > > On Thu, Jun 2, 2016 at 4:11 PM, John R Pierce <pie...@hogranch.com> > wrote: > >> Thanks all the below seem to do the trick. > > I doubt it -- using NOT IN requires (per the SQL specification) > handling NULLs in a way that probably does not give you the answer > you want. (NOT IN also is often much slower than the NOT EXISTS > test which will actually give you the answer you want.) > > test=# create table t (id int not null primary key, ref_id int, sts > int not null default 0); > CREATE TABLE > test=# insert into t values > (1,null,0),(2,1,1),(3,null,0),(4,null,0),(5,4,1),(6,null,0),(7,6,1); > INSERT 0 7 > test=# select max(id) from t where sts=0 and id not in (select ref_id from > t); > max > ----- > > (1 row) > > test=# select max(id) from t t1 where sts = 0 and not exists (select * > from t t2 where t2.ref_id = t1.id); > max > ----- > 3 > (1 row) > > Note that providing minimal setup (like the above) helps in getting > good answers quickly. > > >> do note, this is whats known as an 'anti-join', and these can be pretty > >> expensive on large tables. > > > > +1 > > *Can* be. Proper indexing can make them very reasonable. > Doh (me)... Indeed, NOT IN (...) doesn't qualify as an anti-join since (for one) it cannot (I don't think) be optimized in this way as the entire contents of the IN() need to be determined. IOW, its not really a join but just another predicate condition whose one side is a subquery. That said, writing out a full anti-join NOT EXISTS (or, similarly, a semi-join EXISTS) clause can be a bit tedious for ad-hoc stuff while the IN() variation is a bit more succinct and, I'd venture to say, unfortunately familiar. In can be made to work in this situation by writing the expression as IN (SELECT ref_id FROM t WHERE ref_id IS NOT NULL). David J.