Andrew Gierth <and...@tao11.riddles.org.uk> writes: > And here is that recut patch set.
I started looking over this patch, but eventually decided that it needs more work to be committable than I'm prepared to put in right now. My single biggest complaint is about the introduction of struct GroupedVar. If we stick with that, we're going to have to teach an extremely large number of places that know about Vars to also know about GroupedVars. This will result in code bloat and errors of omission. If you think the latter concern is hypothetical, note that you can't get 40 lines into gsp1.patch without finding such an omission, namely the patch fails to teach pg_stat_statements.c about GroupedVars. (That also points up that some of the errors of omission will be in third-party code that we can't fix easily.) I think you should get rid of that concept and instead implement the behavior by having nodeAgg.c set the relevant fields of the representative tuple slot to NULL, so that a regular Var does the right thing. I'm also not happy about the quality of the internal documentation. The big problem here is the seriously lacking documentation of the new parse node types, eg +/* + * Node representing substructure in GROUPING SETS + * + * This is not actually executable, but it's used in the raw parsetree + * representation of GROUP BY, and in the groupingSets field of Query, to + * preserve the original structure of rollup/cube clauses for readability + * rather than reducing everything to grouping sets. + */ + +typedef enum +{ + GROUPING_SET_EMPTY, + GROUPING_SET_SIMPLE, + GROUPING_SET_ROLLUP, + GROUPING_SET_CUBE, + GROUPING_SET_SETS +} GroupingSetKind; + +typedef struct GroupingSet +{ + Expr xpr; + GroupingSetKind kind; + List *content; + int location; +} GroupingSet; The only actual documentation there is a long-winded excuse for having put the struct declaration in the wrong place. (Since it's not an executable expression, it should be in parsenodes.h not primnodes.h.) Good luck figuring out what "content" is a list of, or indeed anything at all except that this has got something to do with grouping sets. If one digs around in the patch long enough, some useful information can be found in the header comments for various functions --- but there should be a spec for what this struct means, what its fields are, what the relevant invariants are *in the .h file*. Poking around in parsenodes.h, eg the description of SortGroupClause, should give you an idea of the standard here. I'm not too happy about struct Grouping either. If one had to guess, one would probably guess that this was part of the representation of a GROUP BY clause; a guess led on by the practice of the patch of dealing with this and struct GroupingSet together, as in eg pg_stat_statements.c and nodes.h. Reading enough of the patch will eventually clue you that this is the representation of a call of the GROUPING() pseudo-function, but that's not exactly clear from either the name of the struct or its random placement between Var and Const in primnodes.h. And the comment is oh so helpful: +/* + * Grouping + */ I'd be inclined to call it GroupingFunc and put it after Aggref/WindowFunc. Also please note that there is an attempt throughout the system to order code stanzas that deal with assorted node types in an order matching the order in which they're declared in the *nodes.h files. You should never be flipping a coin to decide where to add such code, and "put it at the end of the existing list" is usually not the best answer either. Some other random examples of inadequate attention to commenting: @@ -243,7 +243,7 @@ typedef struct AggStatePerAggData * rest. */ - Tuplesortstate *sortstate; /* sort object, if DISTINCT or ORDER BY */ + Tuplesortstate **sortstate; /* sort object, if DISTINCT or ORDER BY */ This change didn't even bother to pluralize the comment, let alone explain the length of the array or what it's indexed according to, let alone explain why we now need multiple tuplesort objects in what is still apparently a "per aggregate" state struct. (BTW, as a matter of good engineering I think it's useful to change a field's name when you change its meaning and representation so fundamentally. In this case, renaming to "sortstates" would have been clearer and would have helped ensure that you didn't miss fixing any referencing code.) @@ -338,81 +339,101 @@ static Datum GetAggInitVal(Datum textInitVal, Oid transtype); static void initialize_aggregates(AggState *aggstate, AggStatePerAgg peragg, - AggStatePerGroup pergroup) + AggStatePerGroup pergroup, + int numReinitialize) { int aggno; I wonder what numReinitialize is, or why it's needed, or (having read more code than I should have had to in order to guess at what it is) why it is that only the first N sortstates need to be reset. The comments at the call sites are no more enlightening. I don't really have any comments on the algorithms yet, having spent too much time trying to figure out underdocumented data structures to get to the algorithms. However, noting the addition of list_intersection_int() made me wonder whether you'd not be better off reducing the integer lists to bitmapsets a lot sooner, perhaps even at parse analysis. list_intersection_int() is going to be O(N^2) by nature. Maybe N can't get large enough to matter in this context, but I do see places that seem to be concerned about performance. I've not spent any real effort looking at gsp2.patch yet, but it seems even worse off comment-wise: if there's any explanation in there at all of what a "chained aggregate" is, I didn't find it. I'd also counsel you to find some other way to do it than putting bool chain_head fields in Aggref nodes; that looks like a mess, eg, it will break equal() tests for expression nodes that probably should still be seen as equal. I took a quick look at gsp-u.patch. It seems like that approach should work, with of course the caveat that using CUBE/ROLLUP as function names in a GROUP BY list would be problematic. I'm not convinced by the commentary in ruleutils.c suggesting that extra parentheses would help disambiguate: aren't extra parentheses still going to contain grouping specs according to the standard? Forcibly schema-qualifying such function names seems like a less fragile answer on that end. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers