On Fri, Jan 13, 2017 at 10:12 PM, Antonin Houska <a...@cybertec.at> wrote:
> Ashutosh Bapat <ashutosh.ba...@enterprisedb.com> wrote:
>> On Mon, Jan 9, 2017 at 11:26 PM, Antonin Houska <a...@cybertec.at> wrote:
>> > Attached is a draft patch that lets partial aggregation happen at base
>> > relation level. If the relations contain relatively small number of
>> > groups, the number of input rows of the aggregation at the query level
>> > can be reduced this way. Also, if append relation and postgres_fdw
>> > planning is enhanced accordingly, patch like this can let us aggregate
>> > individual tables on remote servers (e.g. shard nodes) and thus reduce
>> > the amount of rows subject to the final aggregation.
>
>> For an appendrel probably you need an ability to switch group->append into
>> append->group
>
> Yes, like the new patch version (see attachment) does:
>
> postgres=# EXPLAIN (COSTS false) SELECT b.j, sum(a.x) FROM a JOIN b ON a.i =
> b.j GROUP BY b.j;
>                        QUERY PLAN
> ------------------------------------------------------
>  Finalize HashAggregate
>    Group Key: b.j
>    ->  Hash Join
>          Hash Cond: (a.i = b.j)
>          ->  Append
>                ->  Partial HashAggregate
>                      Group Key: a.i
>                      ->  Seq Scan on a
>                ->  Partial HashAggregate
>                      Group Key: a_1.i
>                      ->  Seq Scan on a_1
>                ->  Partial HashAggregate
>                      Group Key: a_2.i
>                      ->  Seq Scan on a_2
>          ->  Hash
>                ->  Gather
>                      Workers Planned: 1
>                      ->  Partial HashAggregate
>                            Group Key: b.j
>                            ->  Parallel Seq Scan on b
>
>> For postgres_fdw, we already support aggregate pushdown.
>
> My understanding is that currently it only works if the whole query can be
> evaluated by the FDW. What I try to do is to push down aggregation of
> individual table, and join the partially-aggregated set with other tables,
> which are not necessarily remote or reside on different remote server.
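For the sum() example above, I suppose the idea corresponds roughly to the
hand-written rewrite below, where the base relation is aggregated before the
join (just a sketch with finalized values; the patch would carry transient
aggregate states instead, and in the FDW case the subquery is what a remote
server could evaluate):

SELECT b.j, sum(a_agg.x_sum)
FROM (SELECT i, sum(x) AS x_sum       -- per-table aggregation; this is the
      FROM a GROUP BY i) a_agg        -- part a remote server could evaluate
JOIN b ON a_agg.i = b.j
GROUP BY b.j;

Because the join repeats each a_agg row once per matching b row, the outer
sum() still comes out right here; it's only when b is grouped below the join
as well (as in your plan) that the state has to be scaled, which you address
with aggtransmultifn further down.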
You will need to invoke FDW's hook for aggregate pushdown for the base
relations. It would work as long as we don't ask it for transient results.
But I guess that can come later.

>> But we don't support fetching partial aggregates from foreign server. What
>> other enhancements do you need?
>
> Here I try to introduce infrastructure for aggregation pushdown and
> propagation of the transient aggregate state values from base relations to
> the final join. postgres_fdw can benefit from it but it's not the only use
> case, so I'd prefer adjusting it in a separate patch.
>
> Yes, an outstanding problem is that the remote nodes need to return
> transient state values - probably using bytea type. I think this
> functionality should even be separate from postgres_fdw (e.g. a new contrib
> module?), because the remote nodes do not need postgres_fdw.

Hmm, that's a missing piece. We need to work on it separately.

>> > A few "footnotes":
>> >
>> > As for the example, the processing continues by joining the partially
>> > grouped sets:
>> >
>> >  i | sum(x) | count(i.*) | j | count(j.*)
>> > -----------------------------------------
>> >  1 |   7    |     2      | 1 |     2
>
> [ Sorry, count(j.*) should be 2, not 3 as I wrote in the initial email. ]
>
>> > Before performing the final aggregation, we need to multiply sum(a.x) by
>> > count(j.*) because w/o the aggregation at base relation level the input
>> > of the query-level aggregation would look like
>> >
>> >  a.i | a.x | b.j
>> > ----------------
>> >   1  |  3  |  1
>> >   1  |  4  |  1
>> >   1  |  3  |  1
>> >   1  |  4  |  1
>> >
>> > In other words, grouping of the base relation "b" below the join prevents
>> > the join from bringing per-group input set to the aggregate input
>> > multiple times. To compensate for this effect, I've added a new field
>> > "aggtransmultifn" to pg_aggregate catalog. It "multiplies" the aggregate
>> > state w/o repeated processing of the same input set many times. sum() is
>> > an example of an aggregate that needs such processing, avg() is one that
>> > does not.

>> For something like product aggregation, where the product (or higher order
>> operations) across rows is accumulated instead of sum, mere multiplication
>> wouldn't help. We will need some higher order operation to "extrapolate"
>> the result based on count(j.*). In fact, the multiplication factor will
>> depend upon the number of relations being joined, e.g.
>> select b.j, sum(a.x) from a, b, c where a.i = b.j and a.i = c.k group by b.j
>
> Maybe you're saying what I already try to do. Let me modify the example
> accordingly (unlike the initial example, each table produces a group of
> different size so the count() values are harder to confuse):
>
> [... snip ...]

This all works well as long as the aggregate is "summing" something across
rows. The method doesn't work when the aggregation is, say, "multiplying"
across the rows or "concatenating" across the rows like array_agg() or
string_agg(). They need a different strategy to combine aggregates across
relations.
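To spell the "summing" case out in plain SQL (with finalized values instead
of transient states, so this is only an illustration of the arithmetic): when
both sides are grouped below the join, the partial sum has to be scaled by
the other side's group size, which is what aggtransmultifn would do to the
transient state:

SELECT b_agg.j, sum(a_agg.x_sum * b_agg.cnt)
FROM (SELECT i, sum(x) AS x_sum FROM a GROUP BY i) a_agg
JOIN (SELECT j, count(*) AS cnt FROM b GROUP BY j) b_agg
  ON a_agg.i = b_agg.j
GROUP BY b_agg.j;

For sum() that scaling is a plain multiplication; for array_agg() or
string_agg() I don't see a similarly cheap operation, short of repeating the
accumulated elements cnt times.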
> I get exactly these values when I run your query on master branch w/o my
> patch, so my theory could be correct :-)
>
>> Maybe we want to implement this technique without partial aggregation
>> first, i.e. push down aggregation and grouping down the join tree and then
>> add partial aggregation steps. That might make it easy to review. Looking
>> at the changes in create_plain_partial_paths(), it looks like we use this
>> technique only in case of parallel query. I think the technique is useful
>> otherwise as well.

IIUC, we are trying to solve multiple problems here:

1. Pushing aggregates/groups down the join tree, so that the number of rows
to be joined decreases. This might be a good optimization to have. However,
there are problems in the current patch. Every path built for a relation
(join or base) returns the same result, expressed by the relation or its
subset restricted by parameterization or unification. But this patch changes
that: it creates paths which represent grouping in the base relation. I think
we need a separate relation to represent that result and hold paths which
produce that result. That itself would be a sizable patch.

2. Try to push down aggregates based on the equivalence classes, where
grouping properties can be transferred from one relation to the other using
the EC mechanism. This seems to require solving the problem of combining
aggregates across the relations. But there might be some use cases which
could benefit without solving this problem.

3. If the relation to which we push the aggregate is an append relation, push
(partial) aggregation/grouping down into the child relations. We don't do
that right now even for grouping aggregation on a single append table.
Parallel partial aggregation does that, but not exactly per relation. That
may be a sizable project in itself. Even without this piece the rest of the
optimizations proposed by this patch are important.

4. Additional goal: push down the aggregation to any relation (join/base)
where it can be computed.

If we break the problem down into smaller problems as above, 1. the resulting
patches will be easier to review, and 2. since those problems themselves
produce some usable feature, there is a chance that more people will be
interested in reviewing/testing/coding on those.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers