Antonin Houska <a...@cybertec.at> wrote: > One problem I see is that SubLink can be in the JOIN/ON clause and thus it's > not necessarily at the top of the join tree. Consider this example: > > CREATE TABLE a(i int); > CREATE TABLE b(j int); > CREATE TABLE c(k int NOT NULL); > CREATE TABLE d(l int); > > SELECT * > FROM > a > JOIN b ON b.j NOT IN > ( SELECT > c.k > FROM > c) > JOIN d ON b.j = d.l; > > Here the b.j=d.l condition makes the planner think that the "b.j NOT IN > (SELECT c.k FROM c)" sublink cannot receive NULL values of b.j, but that's not > true: it's possible that ((a JOIN b) ANTI JOIN c) is evaluated before "d" is > joined to the other tables, so the NULL values of b.j are not filtered out > early enough. > > I thought it would help if find_innerjoined_rels(), when called from > expressions_are_not_nullable(), only collected rels (and quals) from the > subtree below the sublink, but that does not seem to help: > > CREATE TABLE e(m int); > > SELECT * > FROM > a > JOIN e ON a.i = e.m > JOIN b ON a.i NOT IN > ( SELECT > c.k > FROM > c) > JOIN d ON COALESCE(a.i, 0) = COALESCE(d.l, 0); > > Here it might seem that the a.i=e.m condition eliminates NULL values from the > ANTI JOIN input, but it's probably hard to prove at query preparation time > that > > (((a JOIN e) JOIN b) ANTI JOIN c) JOIN d > > won't eventually be optimized to > > (((a JOIN d) JOIN b) ANTI JOIN c) JOIN e > > Since the join condition between "a" and "d" is not strict in this case, the > ANTI JOIN will receive the NULL values of a.i. > > It seems tricky, I've got no idea of an alternative approach right now.
Just one idea: perhaps we could use something like PlaceHolderVar to enforce evaluation of the inner join expression ("a.i=e.m" in the example above) at certain level of the join tree (in particular, below the ANTI JOIN) - something like make_outerjoininfo() does here: /* Else, prevent join from being formed before we eval the PHV */ min_righthand = bms_add_members(min_righthand, phinfo->ph_eval_at); Unlike the typical use of PHV, we would not have to check whether the expression is not evaluated too low in the tree because the quals collected by find_innerjoined_rels() should not reference nullable side of any outer join. -- Antonin Houska Web: https://www.cybertec-postgresql.com