I've just come across what seems to be a bug in our current planner code. The existing code understands about transitive propagation of equalities: given
SELECT * FROM a, b WHERE a.x = b.y AND a.x = 10 it will deduce b.y = 10 and then remove the a.x = b.y condition as redundant. In general this is only supposed to happen when the clauses involved are "valid everywhere" in the query: if any of the variables come from relations that are on the nullable side of an outer join, we don't make these deductions, because in fact the correct result may contain rows where the derived equalities wouldn't hold. But consider SELECT * FROM a LEFT JOIN b ON a.x = b.y WHERE a.x = 10 It would be incorrect to suppose that b.y is equal to a.x above the outer join. However, when there is a constant involved, the current code nonetheless deduces b.y = 10 and removes the join condition, resulting in something that in strict SQL would look like SELECT * FROM a LEFT JOIN (SELECT * FROM b WHERE b.y = 10) b ON true WHERE a.x = 10 I believe this transformation is OK in this case, because any b rows that don't have y = 10 won't contribute to the join result anyway, and so need not be fetched from the table. But what if there's a nested outer join? This logic would suggest that we shouldn't apply the b.y constraint until above the lower join. That is, consider SELECT * FROM a LEFT JOIN (b LEFT JOIN c ON b.yy = c.zz) ON a.x = c.z WHERE a.x = 10; We can derive c.z = 10, but where do we hang it on the tree? The current code pushes it as far down as possible (to the join of the relations actually mentioned), yielding SELECT * FROM a LEFT JOIN (b LEFT JOIN (SELECT * FROM c WHERE c.z = 10) c ON b.yy = c.zz) ON true WHERE a.x = 10; where it seems that this would be safer: SELECT * FROM a LEFT JOIN (SELECT * FROM (b LEFT JOIN c ON b.yy = c.zz) WHERE c.z = 10) bc ON true WHERE a.x = 10; I think that this may be OK anyway, though, on the assumption that any such derived clause is strict (which we already assume for mergejoinable operators anyway). In the example just above, the presence of a strict constraint on c above the lower outer join would justify reducing that join to an inner join, whereupon it's OK to let the constraint bubble down anyway. We don't currently do that reduction, because reduce_outer_joins runs long before any of this happens, but nonetheless the answer is correct. One problem arises when the derived clause isn't strict, which seems possible if you do sufficiently weird things. Suppose we instead write SELECT * FROM a LEFT JOIN (b LEFT JOIN c ON b.yy = c.zz) ON a.boolcol = (c.z IS NULL) WHERE a.boolcol = false; This will result in deriving "(c.z IS NULL) = false", which isn't wrong, and pushing it down below the lower outer join, which is. The query should reject null-extended b rows, but won't. I'm not sure whether there are any other cases where it could fail due to a different (more complicated) join structure, without needing a non-strict clause to trigger the problem. Fixing this seems a bit difficult/invasive because distribute_qual_to_rels depends on transient state that's not around anymore by the time we are deducing these derived clauses, and so we don't have the infrastructure to hang the derived clauses at the right place in the tree. I intend to fix it in HEAD as part of the EquivalenceClass rewrite that I'm working on now (which is plenty invasive already), but how hard should we try to fix it in the back branches? Can anyone think of non-contrived examples that make it fail? One option is just to rip out all the generate_outer_join_implications() code instead of trying to fix it. Comments? regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 2: Don't 'kill -9' the postmaster