Andrew Gierth <and...@tao11.riddles.org.uk> writes: > "Tom" == Tom Lane <t...@sss.pgh.pa.us> writes: > Tom> I've not spent any real effort looking at gsp2.patch yet, but it > Tom> seems even worse off comment-wise: if there's any explanation in > Tom> there at all of what a "chained aggregate" is, I didn't find it.
> (Maybe "stacked" would have been a better term.) > What that code does is produce plans that look like this: > GroupAggregate > -> Sort > -> ChainAggregate > -> Sort > -> ChainAggregate > in much the same way that WindowAgg nodes are generated. That seems pretty messy, especially given your further comments that these plan nodes are interconnected and know about each other (though you failed to say exactly how). The claimed analogy to WindowAgg therefore seems bogus since stacked WindowAggs are independent, AFAIR anyway. I'm also wondering about performance: doesn't this imply more rows passing through some of the plan steps than really necessary? Also, how would this extend to preferring hashed aggregation in some of the grouping steps? ISTM that maybe what we should do is take a cue from the SQL spec, which defines these things in terms of UNION ALL of plain-GROUP-BY operations reading from a common CTE. Abstractly, that is, we'd have Append -> GroupAggregate -> Sort -> source data -> GroupAggregate -> Sort -> source data -> GroupAggregate -> Sort -> source data ... (or some of the arms could be HashAgg without a sort). Then the question is what exactly the aggregates are reading from. We could do worse than make it a straight CTE, I suppose. > Tom> I'd also counsel you to find some other way to do it than > Tom> putting bool chain_head fields in Aggref nodes; > There are no chain_head fields in Aggref nodes. Oh, I mistook "struct Agg" for "struct Aggref". (That's another pretty poorly chosen struct name, though I suppose it's far too late to change that choice.) Still, interconnecting plan nodes that aren't adjacent in the plan tree doesn't sound like a great idea to me. > Tom> I took a quick look at gsp-u.patch. It seems like that approach > Tom> should work, with of course the caveat that using CUBE/ROLLUP as > Tom> function names in a GROUP BY list would be problematic. I'm not > Tom> convinced by the commentary in ruleutils.c suggesting that extra > Tom> parentheses would help disambiguate: aren't extra parentheses > Tom> still going to contain grouping specs according to the standard? > The extra parens do actually disambiguate because CUBE(x) and > (CUBE(x)) are not equivalent anywhere; while CUBE(x) can appear inside > GROUPING SETS (...), it cannot appear inside a (...) list nested inside > a GROUPING SETS list (or anywhere else). Maybe, but this seems very fragile and non-future-proof. I think double-quoting or schema-qualifying such function names would be safer when you think about the use-case of dumping views that may get loaded into future Postgres versions. > The question that needs deciding here is less whether the approach > _could_ work but whether we _want_ it. The objection has been made > that we are in effect introducing a new category of "unreserved almost > everywhere" keyword, which I think has a point; True, but I think that ship has already sailed. We already have similar behavior for PARTITION, RANGE, and ROWS (see the opt_existing_window_name production), and I think PRECEDING, FOLLOWING, and UNBOUNDED are effectively reserved-in-certain-very-specific-contexts as well. And there are similar behaviors in plpgsql's parser. > on the other hand, > reserving CUBE is a seriously painful prospect. Precisely. I think renaming or getting rid of contrib/cube would have to be something done in a staged fashion over multiple release cycles. Waiting several years to get GROUPING SETS doesn't seem appealing at all compared to this alternative. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers