On Sat, May 27, 2023 at 12:16 AM Tom Lane <t...@sss.pgh.pa.us> wrote:
> Ah. I realized that we could make the problem testable by adding > assertions that a joinclause we're not removing doesn't contain > any surviving references to the target rel or join. That turns > out to fire (without the bug fix) in half a dozen existing test > cases, so I felt that we didn't need to add another one. > > I did the other refactoring we discussed and pushed it. > Thanks for the report and review! Thanks for pushing it! I've managed to find some problems on master with the help of the new assertions. First the query below would trigger the new assertions. create table t (a int unique, b int); explain (costs off) select 1 from t t1 left join (select t2.a, 1 as c from t t2 left join t t3 on t2.a = t3.a) s on true left join t t4 on true where s.a < s.c; server closed the connection unexpectedly Note that 's.c' would be wrapped in PHV so the qual 's.a < s.c' is actually 't2.a < PHV(1)', and it is one of t3's joinquals. When we're removing the t2/t3 join, we decide that this PHV is no longer needed so we remove it entirely rather than just remove references from it. But actually its phrels still have references to t3 and t2/t3 join. So when we put back the qual 's.a < s.c', we will trigger the new assertions. At first glance I thought we can just remove the new assertions. But then I figured out that the problem is more complex than that. If the PHV contains lateral references, removing the PHV entirely would cause us to lose the information about the lateral references, and that may cause wrong query results in some cases. Consider the query below (after removing the two new assertions, or in a non-assert build). explain (costs off) select 1 from t t1 left join lateral (select t2.a, coalesce(t1.a, 1) as c from t t2 left join t t3 on t2.a = t3.a) s on true left join t t4 on true where s.a < s.c; QUERY PLAN -------------------------------------------------- Nested Loop Left Join -> Nested Loop -> Seq Scan on t t1 -> Materialize -> Seq Scan on t t2 Filter: (a < COALESCE(a, 1)) -> Materialize -> Seq Scan on t t4 (8 rows) The PHV of 'coalesce(t1.a, 1)' has lateral reference to t1 but we'd lose this information because we've removed this PHV entirely in remove_rel_from_query. As a consequence, we'd fail to extract the lateral dependency for t2 and fail to build the nestloop parameters for the t1/t2 join. And that causes wrong query results. We can see that if we insert some data into the table. insert into t select 1,1; insert into t select 2,2; On v15 the query above gives ?column? ---------- 1 1 (2 rows) but on master it gives ?column? ---------- (0 rows) I haven't thought through how to fix it, but I suspect that we may need to do more checking before we decide to remove PHVs in remove_rel_from_query. Thanks Richard