On Wed, Dec 4, 2024 at 11:38 PM Robert Haas <robertmh...@gmail.com> wrote: > On Sun, Nov 10, 2024 at 7:52 PM Richard Guo <guofengli...@gmail.com> wrote: > > Hmm, currently we only consider grouped aggregation for eager > > aggregation. For grouped aggregation, the window function's > > arguments, as well as the PARTITION BY expressions, must appear in the > > GROUP BY clause. That is to say, the depname column in the first > > query, or the n column in the second query, will not be aggregated > > into the partial groups. Instead, they will remain as they are as > > input for the WindowAgg nodes. It seems to me that this ensures > > that we're good with window functions. But maybe I'm wrong. > > Returning to this point now that I understand what you meant by > grouped aggregation: > > I still don't understand how you expect to be able to evaluate > functions like LEAD() and LAG() if any form of partial aggregation has > been done.
In grouped aggregation, the non-aggregate arguments of the window function must appear in the GROUP BY clause, so they will not be aggregated into the partial groups. It seems to me that this ensures that they remain available as valid inputs for the window function. For the Aggref arguments of the window function, their final values are calculated in the Finalize Agg node, meaning they, too, are good to be used as inputs for the window function. As an example, please consider create table tbl (a int, b int, c int); insert into tbl select i%3, i%3, i%3 from generate_series(1,1000)i; analyze tbl; explain (verbose, costs off) select lead(t1.a+sum(t2.b)) over (), sum(t2.c) from tbl t1 join tbl t2 on t1.b = t2.b group by t1.a; QUERY PLAN ------------------------------------------------------------------------------ WindowAgg Output: lead((t1.a + (sum(t2.b)))) OVER (?), (sum(t2.c)), t1.a -> Finalize HashAggregate Output: t1.a, sum(t2.b), sum(t2.c) Group Key: t1.a -> Hash Join Output: t1.a, (PARTIAL sum(t2.b)), (PARTIAL sum(t2.c)) Hash Cond: (t1.b = t2.b) -> Seq Scan on public.tbl t1 Output: t1.a, t1.b, t1.c -> Hash Output: t2.b, (PARTIAL sum(t2.b)), (PARTIAL sum(t2.c)) -> Partial HashAggregate Output: t2.b, PARTIAL sum(t2.b), PARTIAL sum(t2.c) Group Key: t2.b -> Seq Scan on public.tbl t2 Output: t2.a, t2.b, t2.c (17 rows) It seems to me that both 't1.a' and 'sum(t2.b)' are valid inputs for LEAD(), even though we have performed partial aggregation. Am I missing something? Thanks Richard