On Fri, Jul 12, 2019 at 4:42 PM Antonin Houska <a...@cybertec.at> wrote:
> Richard Guo <ri...@pivotal.io> wrote: > > > I didn't fully follow the whole thread and mainly looked into the > > latest > > patch set. So what are the considerations for abandoning the > > aggmultifn > > concept? > > Originally the function was there to support join where both relations are > grouped. My doubts about usefulness of such join started at [1]. (See the > thread referenced from [2].) > > > In my opinion, aggmultifn would enable us to do a lot more > > types of transformation. For example, consider the query below: > > > > select sum(foo.c) from foo join bar on foo.b = bar.b group by foo.a, > > bar.a; > > > > With the latest patch, the plan looks like: > > > > Finalize HashAggregate <------ sum(psum) > > Group Key: foo.a, bar.a > > -> Hash Join > > Hash Cond: (bar.b = foo.b) > > -> Seq Scan on bar > > -> Hash > > -> Partial HashAggregate <------ sum(foo.c) as > > psum > > Group Key: foo.a, foo.b > > -> Seq Scan on foo > > > > > > If we have aggmultifn, we can perform the query this way: > > > > Finalize HashAggregate <------ sum(foo.c)*cnt > > Group Key: foo.a, bar.a > > -> Hash Join > > Hash Cond: (foo.b = bar.b) > > -> Seq Scan on foo > > -> Hash > > -> Partial HashAggregate <------ count(*) as cnt > > Group Key: bar.a, bar.b > > -> Seq Scan on bar > > Perhaps, but it that would require changes to nodeAgg.c, which I want to > avoid > for now. > > > And this way: > > > > Finalize HashAggregate <------ sum(psum)*cnt > > Group Key: foo.a, bar.a > > -> Hash Join > > Hash Cond: (foo.b = bar.b) > > -> Partial HashAggregate <------ sum(foo.c) as > > psum > > Group Key: foo.a, foo.b > > -> Seq Scan on foo > > -> Hash > > -> Partial HashAggregate <------ count(*) as cnt > > Group Key: bar.a, bar.b > > -> Seq Scan on bar > > This looks like my idea presented in [1], for which there seems to be no > common use case. > > > My another question is in function add_grouped_path(), when creating > > sorted aggregation path on top of subpath. If the subpath is not > > sorted, > > then the sorted aggregation path would not be generated. Why not in > > this > > case we create a sort path on top of subpath first and then create > > group > > aggregation path on top of the sort path? > > I see no reason not to do it. (If you want to try, just go ahead.) However > the > current patch version brings only the basic functionality and I'm not > going to > add new functionality (such as parallel aggregation, partitioned tables or > postgres_fdw) until the open design questions are resolved. Otherwise > there's > a risk that the additional work will be wasted due to major rework of the > core > functionality. > > > Core dump when running one query in agg_pushdown.sql > > Thanks for the report! Fixed in the new version. > Another core dump for query below: select sum(t1.s1) from t1, t2, t3, t4 where t1.j1 = t2.j2 group by t1.g1, t2.g2; This is due to a small mistake: diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c index 10becc0..9c2deed 100644 --- a/src/backend/optimizer/util/relnode.c +++ b/src/backend/optimizer/util/relnode.c @@ -648,7 +648,7 @@ void add_grouped_rel(PlannerInfo *root, RelOptInfo *rel, RelAggInfo *agg_info) { add_rel_info(root->grouped_rel_list, rel); - add_rel_info(root->agg_info_list, rel); + add_rel_info(root->agg_info_list, agg_info); } Thanks Richard