On Fri, Nov 1, 2024 at 2:21 AM Richard Guo <guofengli...@gmail.com> wrote: > ... an aggregated row from the partial > aggregation matches the other side of the join if and only if each row > in the partial group does, thereby ensuring that all rows in the same > partial group have the same 'destiny'.
Ah, I really like this turn of phrase! I think it's clearer and simpler than what I said. And I think it implies that we don't need to explicitly deduce surrogate grouping keys. For example if we have A JOIN B JOIN C JOIN D JOIN E JOIN F, grouped by columns from A, we don't need to work out surrogate grouping keys for B and then C and then D and then E and then F. We can just look at F's join clauses and that tells us how to group, independent of anything else. But is there any hole in that approach? I think the question is whether the current row could be used in some way that doesn't show up in the join clauses. I can't think of any way for that to happen, really. I believe that any outerjoin-delayed quals will show up as join clauses, and stuff like ORDER BY and HAVING will happen after the aggregation (at least logically) so it should be fine. Windowed functions and ordered aggregates may be a blocker to the optimization, though: if the window function needs access to the unaggregated rows, or even just needs to know how many rows there are, then we'd better not aggregate them before the window function runs; and if the aggregate is ordered, we can only partially aggregate the data if it is already ordered in a way that is compatible with the final, desired ordering. Another case we might need to watch out for is RLS. RLS wants to apply all the security quals before any non-leakproof functions, and pushing down the aggregation might push an aggregate function past security quals. Perhaps there are other cases to worry about as well; this is all I can think of at the moment. But regardless of those kinds of cases, the basic idea that we want the partially aggregate rows to join if and only if the unaggregated rows would have joined seems exactly correct to me, and that provides theoretical justification for deriving the surrogate grouping key directly from the join quals. Woot! -- Robert Haas EDB: http://www.enterprisedb.com