For a subquery relation, currently we consider pushing down its restriction clauses to become WHERE or HAVING quals of the subquery itself. This transformation can potentially avoid the need to evaluate all subquery output rows before filtering them, which may lead to a more efficient execution plan. As an example, please consider:
explain (costs off) select * from t t1 join (select t2.a, t2.b, sum(t2.b) from t t2 group by t2.a, t2.b) s on s.a = 1; QUERY PLAN ------------------------------------- Nested Loop -> Seq Scan on t t1 -> Materialize -> HashAggregate Group Key: t2.b -> Seq Scan on t t2 Filter: (a = 1) (7 rows) If the subquery is LATERAL, its ppi_clauses -- that is, join clauses available from its required outer rels -- sort of behave like restriction clauses. As an example, please consider: explain (costs off) select * from t t1 join lateral (select t2.a, t2.b, sum(t2.b), t1.a x from t t2 group by t2.a, t2.b) s on s.a = t1.a; QUERY PLAN ------------------------------------- Nested Loop -> Seq Scan on t t1 -> Subquery Scan on s Filter: (t1.a = s.a) -> HashAggregate Group Key: t2.a, t2.b -> Seq Scan on t t2 (7 rows) Here, I'd like to discuss whether it's worthwhile to also consider pushing down a subquery relation's ppi_clauses if the subquery is LATERAL. First, it's important to note that pushing down ppi_clauses doesn't always result in a better execution plan. While doing so can reduce the amount of data processed in each aggregation invocation within the subquery, it also means that the aggregation needs to be re-evaluated for every outer tuple. If t1 is very small and t2 is large, pushing down ppi_clauses can be a win. As t1 gets larger, this gets less attractive, and eventually it will have a higher cost than the current plan, where the aggregation is evaluated only once. Therefore, if we decide to pursue this approach, we would need to generate two paths: one with the ppi_clauses pushed down, and one without, and then compare their costs. A potential concern is that this might require re-planning the subquery twice, which could increase planning overhead. On the other hand, once this optimization is implemented, in addition to making lateral subqueries more efficient in some cases, it will enable us to pull up more complicated sublinks. To be more concrete, consider query: select * from foo where foo.a > (select avg(bar.a) from bar where foo.b = bar.b); ... it can be transformed into something like: select * from foo join (select bar.b, avg(bar.a) as avg from bar group by bar.b) sub on foo.b = sub.b and foo.a > sub.avg; Currently, the planner does not perform this transformation because it's not always a win -- if foo is very small and bar is large, the original form tends to be more efficient. However, if we can push down the subquery's join clauses into the subquery, the resulting parameterized paths will be pretty nearly performance-equivalent to the origin form. We can then compare these parameterized paths with the unparameterized paths that calculate the aggregation only once. With this capability, it becomes safe to always pull up such sublinks. In some articles, the PostgreSQL optimizer is noted to lack certain important transformations. For example, one recent article from Microsoft states: " Comparison with Volcano/Cascades: In PostgreSQL the separation of scan/join planning from query flattening and post scan/join phase prevents the exploitation of important transformations such as the reordering of GROUP BY and joins, and decorrelation of correlated subqueries (see Sections 4.4 and 4.5). This results in a reduced search space of plans and therefore can result in missing out plans with lower cost. " I hope the transformation proposed here can help address part of the gap related to decorrelation of correlated subqueries. (BTW, for the "reordering of GROUP BY and joins" transformation, I have another patch called "Eager Aggregation" that specifically targets that optimization.) There are still many details I haven't fully thought through. Before investing too much time in this idea, I'd appreciate some feedback to make sure I'm not overlooking something obvious. Thanks Richard