>>>>> "Tom" == Tom Lane <t...@sss.pgh.pa.us> writes:
>> What that code does is produce plans that look like this: >> GroupAggregate >> -> Sort >> -> ChainAggregate >> -> Sort >> -> ChainAggregate >> in much the same way that WindowAgg nodes are generated. Tom> That seems pretty messy, especially given your further comments Tom> that these plan nodes are interconnected and know about each Tom> other (though you failed to say exactly how). I'd already explained in more detail way back when we posted the patch. But to reiterate: the ChainAggregate nodes pass through their input data unchanged, but on group boundaries they write aggregated result rows to a tuplestore shared by the whole chain. The top node returns the data from the tuplestore after its own output is completed. The chain_head pointer in the ChainAggregate nodes is used for: - obtaining the head node's targetlist and qual, to use to project rows into the tuplestore (the ChainAggregate nodes don't do ordinary projection so they have dummy targetlists like the Sort nodes do) - obtaining the pointer to the tuplestore itself - on rescan without parameter change, to inform the parent node whether or not the child nodes are also being rescanned (since the Sort nodes may or may not block this) Tom> The claimed analogy to WindowAgg therefore seems bogus since Tom> stacked WindowAggs are independent, AFAIR anyway. The analogy is only in that they need to see the same input rows but in different sort orders. Tom> I'm also wondering about performance: doesn't this imply more Tom> rows passing through some of the plan steps than really Tom> necessary? There's no way to cut down the number of rows seen by intermediate nodes unless you implement (and require) associative aggregates, which we do not do in this patch (that's left for possible future optimization efforts). Our approach makes no new demands on the implementation of aggregate functions. Tom> Also, how would this extend to preferring hashed aggregation in Tom> some of the grouping steps? My suggestion for extending it to hashed aggs is: by having a (single) HashAggregate node keep multiple hash tables, per grouping set, then any arbitrary collection of grouping sets can be handled in one node provided that memory permits and no non-hashable features are used. So the normal plan for CUBE(a,b) under this scheme would be just: HashAggregate Grouping Sets: (), (a), (b), (a,b) -> (input path in unsorted order) If a mixture of hashable and non-hashable data types are used, for example CUBE(hashable,unhashable), then a plan of this form could be constructed: HashAggregate Grouping Sets: (), (hashable) -> ChainAggregate Grouping Sets: (unhashable), (unhashable,hashable) -> (input path sorted by (unhashable,hashable)) Likewise, plans of this form could be considered for cases like CUBE(low_card, high_card) where hashed grouping on high_card would require excessive memory: HashAggregate Grouping Sets: (), (low_card) -> ChainAggregate Grouping Sets: (high_card), (high_card, low_card) -> (input path sorted by (high_card, low_card)) Tom> ISTM that maybe what we should do is take a cue from the SQL Tom> spec, which defines these things in terms of UNION ALL of Tom> plain-GROUP-BY operations reading from a common CTE. I looked at that, in fact that was our original plan, but it became clear quite quickly that it was not going to be easy. I tried two different approaches. First was to actually re-plan the input (i.e. running query_planner more than once) for different sort orders; that crashed and burned quickly thanks to the extent to which the planner assumes that it'll be run once only on any given input. Second was to generate a CTE for the input data. This didn't get very far because everything that already exists to handle CTE nodes assumes that they are explicit in the planner's input (that they have their own Query node, etc.) and I was not able to determine a good solution. If you have any suggestions for how to approach the problem, I'm happy to have another go at it. -- Andrew (irc:RhodiumToad) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers