On Mon, Sep 19, 2022 at 3:58 PM James Coleman <jtc...@gmail.com> wrote: > But in the case where there's correlation via LATERAL we already don't > guarantee unique executions for a given set of params into the lateral > subquery execution, right? For example, suppose we have: > > select * > from foo > left join lateral ( > select n > from bar > where bar.a = foo.a > limit 1 > ) on true > > and suppose that foo.a (in the table scan) returns these values in > order: 1, 2, 1. In that case we'll execute the lateral subquery 3 > separate times rather than attempting to order the results of foo.a > such that we can re-execute the subquery only when the param changes > to a new unique value (and we definitely don't cache the results to > guarantee unique subquery executions).
I think this is true, but I don't really understand why we should focus on LATERAL here. What we really need, and I feel like we've talked about this before, is a way to reason about where parameters are set and used. Your sample query gets a plan like this: Nested Loop Left Join (cost=0.00..1700245.00 rows=10000 width=8) -> Seq Scan on foo (cost=0.00..145.00 rows=10000 width=4) -> Limit (cost=0.00..170.00 rows=1 width=4) -> Seq Scan on bar (cost=0.00..170.00 rows=1 width=4) Filter: (foo.a = a) If this were to occur inside a larger plan tree someplace, it would be OK to insert a Gather node above the Nested Loop node without doing anything further, because then the parameter that stores foo.a would be both set and used in the worker. If you wanted to insert a Gather at any other place in this plan, things get more complicated. But just because you have LATERAL doesn't mean that you have this problem, because if you delete the "limit 1" then the subqueries get flattened together and the parameter disappears, and if you delete the lateral reference (i.e. WHERE foo.a = bar.a) then there's still a subquery but it no longer refers to an outer parameter. And on the flip side just because you don't have LATERAL doesn't mean that you don't have this problem. e.g. the query could instead be: select *, (select n from bar where bar.a = foo.a limit 1) from foo; ...which I think is pretty much equivalent to your formulation and has the same problem as far as parallel query as your formulation but does not involve the LATERAL keyword. -- Robert Haas EDB: http://www.enterprisedb.com