Re: [HACKERS] WIP: Aggregation push-down

Richard Guo Mon, 15 Jul 2019 03:29:31 -0700

On Fri, Jul 12, 2019 at 4:42 PM Antonin Houska <[email protected]> wrote:


> Richard Guo <[email protected]> wrote:
>
> > I didn't fully follow the whole thread and mainly looked into the
> > latest
> > patch set. So what are the considerations for abandoning the
> > aggmultifn
> > concept?
>
> Originally the function was there to support join where both relations are
> grouped. My doubts about usefulness of such join started at [1]. (See the
> thread referenced from [2].)
>
> > In my opinion, aggmultifn would enable us to do a lot more
> > types of transformation. For example, consider the query below:
> >
> > select sum(foo.c) from foo join bar on foo.b = bar.b group by foo.a,
> > bar.a;
> >
> > With the latest patch, the plan looks like:
> >
> > Finalize HashAggregate    <------ sum(psum)
> >    Group Key: foo.a, bar.a
> >    ->  Hash Join
> >          Hash Cond: (bar.b = foo.b)
> >          ->  Seq Scan on bar
> >          ->  Hash
> >                ->  Partial HashAggregate    <------ sum(foo.c) as
> > psum
> >                      Group Key: foo.a, foo.b
> >                      ->  Seq Scan on foo
> >
> >
> > If we have aggmultifn, we can perform the query this way:
> >
> > Finalize HashAggregate    <------ sum(foo.c)*cnt
> >    Group Key: foo.a, bar.a
> >    ->  Hash Join
> >          Hash Cond: (foo.b = bar.b)
> >          ->  Seq Scan on foo
> >          ->  Hash
> >                ->  Partial HashAggregate    <------ count(*) as cnt
> >                      Group Key: bar.a, bar.b
> >                      ->  Seq Scan on bar
>
> Perhaps, but it that would require changes to nodeAgg.c, which I want to
> avoid
> for now.
>
> > And this way:
> >
> > Finalize HashAggregate    <------ sum(psum)*cnt
> >    Group Key: foo.a, bar.a
> >    ->  Hash Join
> >          Hash Cond: (foo.b = bar.b)
> >                ->  Partial HashAggregate    <------ sum(foo.c) as
> > psum
> >                      Group Key: foo.a, foo.b
> >                      ->  Seq Scan on foo
> >          ->  Hash
> >                ->  Partial HashAggregate    <------ count(*) as cnt
> >                      Group Key: bar.a, bar.b
> >                      ->  Seq Scan on bar
>
> This looks like my idea presented in [1], for which there seems to be no
> common use case.
>
> > My another question is in function add_grouped_path(), when creating
> > sorted aggregation path on top of subpath. If the subpath is not
> > sorted,
> > then the sorted aggregation path would not be generated. Why not in
> > this
> > case we create a sort path on top of subpath first and then create
> > group
> > aggregation path on top of the sort path?
>
> I see no reason not to do it. (If you want to try, just go ahead.) However
> the
> current patch version brings only the basic functionality and I'm not
> going to
> add new functionality (such as parallel aggregation, partitioned tables or
> postgres_fdw) until the open design questions are resolved. Otherwise
> there's
> a risk that the additional work will be wasted due to major rework of the
> core
> functionality.
>
> > Core dump when running one query in agg_pushdown.sql
>
> Thanks for the report! Fixed in the new version.
>

Another core dump for query below:

select sum(t1.s1) from t1, t2, t3, t4 where t1.j1 = t2.j2 group by t1.g1,
t2.g2;

This is due to a small mistake:

diff --git a/src/backend/optimizer/util/relnode.c
b/src/backend/optimizer/util/relnode.c
index 10becc0..9c2deed 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -648,7 +648,7 @@ void
 add_grouped_rel(PlannerInfo *root, RelOptInfo *rel, RelAggInfo *agg_info)
 {
        add_rel_info(root->grouped_rel_list, rel);
-       add_rel_info(root->agg_info_list, rel);
+       add_rel_info(root->agg_info_list, agg_info);
 }

Thanks
Richard

Re: [HACKERS] WIP: Aggregation push-down

Reply via email to