On 10 January 2017 at 06:56, Antonin Houska <a...@cybertec.at> wrote: > Before performing the final aggregation, we need to multiply sum(a.x) by > count(j.*) because w/o the aggregation at base relation level the input > of the query-level aggregation would look like > > a.i | a.x | b.j > ---------------- > 1 | 3 | 1 > 1 | 4 | 1 > 1 | 3 | 1 > 1 | 4 | 1 > > In other words, grouping of the base relation "b" below the join prevents the > join from bringing per-group input set to the aggregate input multiple > times. To compensate for this effect, I've added a new field "aggtransmultifn" > to pg_aggregate catalog. It "multiplies" the aggregate state w/o repeated > processing of the same input set many times. sum() is an example of an > aggregate that needs such processing, avg() is one that does not.
First off, I'd like to say that making improvements in this area sounds like a great thing. I'm very keen to see progress here. I've been thinking about this aggtransmultifn and I'm not sure if it's really needed. Adding a whole series of new transition functions is quite a pain. At least I think so, and I have a feeling Robert might agree with me. Let's imagine some worst case (and somewhat silly) aggregate query: SELECT count(*) FROM million_row_table CROSS JOIN another_million_row_table; Today that's going to cause 1 TRILLION transitions! Performance will be terrible. If we pushed the aggregate down into one of those tables and performed a Partial Aggregate on that, then a Finalize Aggregate on that single row result (after the join), then that's 1 million transfn calls, and 1 million combinefn calls, one for each row produced by the join. If we did it your way (providing I understand your proposal correctly) there's 1 million transfn calls on one relation, then 1 million on the other and then 1 multiplyfn call. which does 1000000 * 1000000 What did we save vs. using the existing aggcombinefn infrastructure which went into 9.6? Using this actually costs us 1 extra function call, right? I'd imagine the size of the patch to use aggcombinefn instead would be a fraction of the size of the one which included all the new aggmultiplyfns and pg_aggregate.h changes. There's already a lot of infrastructure in there to help you detect when this optimisation can be applied. For example, AggClauseCosts.hasNonPartial will be true if any aggregates don't have a combinefn, or if there's any DISTINCT or ORDER BY aggregates, which'll also mean you can't apply the optimisation. It sounds like a much more manageable project by using aggcombinefn instead. Then maybe one day when we can detect if a join did not cause any result duplication (i.e Unique Joins), we could finalise the aggregates on the first call, and completely skip the combine state altogether. Thanks for your work on this. Regards David Rowley -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers