Re: [HACKERS] Final Patch for GROUPING SETS

Andrew Gierth Wed, 10 Dec 2014 18:20:07 -0800

>>>>> "Tom" == Tom Lane <[email protected]> writes:


More comment on this later, but I want to highlight these specific
points since we need clear answers here to avoid wasting unnecessary
time and effort:

 Tom> I've not spent any real effort looking at gsp2.patch yet, but it
 Tom> seems even worse off comment-wise: if there's any explanation in
 Tom> there at all of what a "chained aggregate" is, I didn't find it.

(Maybe "stacked" would have been a better term.)

What that code does is produce plans that look like this:

  GroupAggregate
  -> Sort
     -> ChainAggregate
        -> Sort
           -> ChainAggregate

in much the same way that WindowAgg nodes are generated.

Where would you consider the best place to comment this? The WindowAgg
equivalent seems to be discussed primarily in the header comment of
nodeWindowAgg.c.

 Tom> I'd also counsel you to find some other way to do it than
 Tom> putting bool chain_head fields in Aggref nodes;

There are no chain_head fields in Aggref nodes.

Agg.chain_head is true for the Agg node at the top of the chain (the
GroupAggregate node in the above example), while AggState.chain_head
is set on the ChainAggregate nodes to point to the AggState of the
GroupAggregate node.

What we need to know before doing any further work on this is whether
this idea of stacking up aggregate and sort nodes is a viable one.

(The feedback I've had so far suggests that the performance is
acceptable, even if there are still optimization opportunities that
can be tackled later, like adding HashAggregate support.)

 Tom> I took a quick look at gsp-u.patch.  It seems like that approach
 Tom> should work, with of course the caveat that using CUBE/ROLLUP as
 Tom> function names in a GROUP BY list would be problematic.  I'm not
 Tom> convinced by the commentary in ruleutils.c suggesting that extra
 Tom> parentheses would help disambiguate: aren't extra parentheses
 Tom> still going to contain grouping specs according to the standard?

The spec is of minimal help here since it does not allow expressions in
GROUP BY at all, last I looked; only column references.

The extra parens do actually disambiguate because CUBE(x) and
(CUBE(x)) are not equivalent anywhere; while CUBE(x) can appear inside
GROUPING SETS (...), it cannot appear inside a (...) list nested inside
a GROUPING SETS list (or anywhere else).

As the comments in gram.y explain, the productions used are intended
to follow the spec with the exception of using a_expr where the spec
requires <ordinary grouping set>. So CUBE and ROLLUP are recognized as
special only as part of a group_by_item (<grouping element> in the
spec), and as soon as we see a paren that isn't part of the "GROUPING
SETS (" opener, we're forced into parsing an a_expr, in which CUBE()
would become a function call.

(The case of upgrading from an old pg version seems to require the use
of --quote-all-identifiers in pg_dump)

 Tom> Forcibly schema-qualifying such function names seems like a less
 Tom> fragile answer on that end.

That I guess would require keeping more state, unless you applied it
everywhere to any function with a keyword for a name? I dunno.

The question that needs deciding here is less whether the approach
_could_ work but whether we _want_ it. The objection has been made
that we are in effect introducing a new category of "unreserved almost
everywhere" keyword, which I think has a point; on the other hand,
reserving CUBE is a seriously painful prospect.

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Final Patch for GROUPING SETS

Reply via email to