"David G. Johnston" <david.g.johns...@gmail.com> writes: > On Sunday, August 25, 2024, Tobias Hoffmann <ldev-l...@thax.hardliners.org> > wrote: >> 3) Problematic example: >> >> # EXPLAIN SELECT * FROM tbl1 JOIN tbl2 ON tbl2.site_id = tbl1.site_id >> WHERE tbl1.site_id = 1 OR tbl1.site_id IS NULL;
> The “is null” predicate in this query is doing nothing as your next comment > alludes to; you will produce no rows out of the join with a null site_id > due to the use of the equals operator in the join. Indeed. This WHERE clause might be useful with a left join to tbl1, but then the IS NULL means something totally different and can *not* be pushed down. > Others may correct me but I’m guessing that indeed the optimizer has a gap > here that could be filled in, it’s just it feels like adding code to deal > with broken queries so isn’t overly motivated to work on. The short answer is that we expend quite a lot of effort to deduce implied equality clauses from combinations of equality clauses; that is, given "WHERE A = B AND B = C" we can deduce "A = C" as well. No such deductions can be made from equality clauses that are buried under an OR, because they might not be true for every join row. Maybe some machinery could be built that would do something useful with an OR clause of this form, but I doubt it would be useful often enough to justify the development effort and additional planner cycles. An important point here is that "WHERE A = B AND p(A)" does not permit us to deduce "p(B)" for arbitrary conditions p(), because we have some equality operators that will return true for values that sort equal but are distinguishable in other ways. (Handy example: in float8 arithmetic, zero and minus zero compare equal, as required by the IEEE float spec.) We presently don't assume that this works for anything other than transitive equality across equality operators of a single btree operator family. In particular, I don't think we could assume it for the given example, because "WHERE A = B AND A IS NULL" is tautologically false. You can deduce anything from a falsehood, so I'm not entirely sure that the proposed optimization is even logically sound. > Joining using > distinct instead of equality is uncommon, since nearly all models join > primary keys to foreign keys and both of those are almost always non-null. Yeah. I'll concede that we probably should work harder on building out planner and executor support for IS NOT DISTINCT FROM. But again, it just seems like a case that is not worth spending large amounts of development time on. regards, tom lane