Re: Eager aggregation, take 3

2025-01-24 Thread Robert Haas
On Wed, Jan 22, 2025 at 1:48 AM Richard Guo wrote: > This approach would require injecting multiple intermediate > aggregation nodes into the path tree, for which we currently lack the > necessary architecture. As a result, I didn't pursue this idea > further. However, I'm really glad you mentio

Re: Eager aggregation, take 3

2025-01-21 Thread Richard Guo
On Wed, Jan 22, 2025 at 1:36 AM Robert Haas wrote: > Thanks for the example. What seems to be happening here is that each > of the three joins increases the number of rows by a multiple of > either 166 or 333. Aggregating reduces the number of rows to 3. I am > not sure that we should be too conce

Re: Eager aggregation, take 3

2025-01-21 Thread Robert Haas
On Tue, Jan 21, 2025 at 3:33 AM Richard Guo wrote: > I've been thinking about this proposal, and it's quite appealing. It > would significantly reduce both the planning effort and implementation > complexity, while still yielding reasonable planning results. > > One concern I have with this propo

Re: Eager aggregation, take 3

2025-01-21 Thread Richard Guo
On Tue, Jan 21, 2025 at 2:57 AM Tom Lane wrote: > However, a partial-aggregation path does not generate the same data > as an unaggregated path, no matter how fuzzy you are willing to be > about the concept. So I'm having a very hard time accepting that > it ought to be part of the same RelOptInf

Re: Eager aggregation, take 3

2025-01-21 Thread Richard Guo
On Tue, Jan 21, 2025 at 1:28 AM Robert Haas wrote: > On Sun, Jan 19, 2025 at 7:53 AM Richard Guo wrote: > > If, at last, the conclusion of this discussion is that we should not > > apply this change until we fix those problems in aggregate estimates, > > I'd be very sad. This conclusion is absol

Re: Eager aggregation, take 3

2025-01-20 Thread Robert Haas
On Mon, Jan 20, 2025 at 12:57 PM Tom Lane wrote: > However, a partial-aggregation path does not generate the same data > as an unaggregated path, no matter how fuzzy you are willing to be > about the concept. So I'm having a very hard time accepting that > it ought to be part of the same RelOptIn

Re: Eager aggregation, take 3

2025-01-20 Thread Tom Lane
Robert Haas writes: > So I don't quite know which way to jump here. It now seems to me that > we have three similar features with three different designs. > Parameterization added non-comparable paths to the same path list; > parallel query added them to a different path list in the same > RelOptI

Re: Eager aggregation, take 3

2025-01-20 Thread Robert Haas
On Sun, Jan 19, 2025 at 7:53 AM Richard Guo wrote: > If, at last, the conclusion of this discussion is that we should not > apply this change until we fix those problems in aggregate estimates, > I'd be very sad. This conclusion is absolutely correct, for sure, in > an ideal world, but in the rea

Re: Eager aggregation, take 3

2025-01-19 Thread Richard Guo
On Sat, Jan 18, 2025 at 6:16 AM Robert Haas wrote: > On Thu, Jan 16, 2025 at 3:18 AM Richard Guo wrote: > > If this t1/t2 join is part of a larger SELECT query, I think the cost > > estimates for the upper join nodes would likely be quite inaccurate. > > That's definitely true. However, the quest

Re: Eager aggregation, take 3

2025-01-17 Thread Robert Haas
On Thu, Jan 16, 2025 at 3:18 AM Richard Guo wrote: > If this t1/t2 join is part of a larger SELECT query, I think the cost > estimates for the upper join nodes would likely be quite inaccurate. That's definitely true. However, the question is not whether the planner has problems today (it definit

Re: Eager aggregation, take 3

2025-01-17 Thread Richard Guo
On Fri, Jan 17, 2025 at 6:40 AM Tom Lane wrote: > * The README addition, and the basically identical text in the > commit message, don't even provide a reason to believe that the > transformation is correct let alone that it will result in faster > execution. I don't understand why it's so hard t

Re: Eager aggregation, take 3

2025-01-16 Thread Tom Lane
I'm very sorry for not having had any time to look at this patch before --- it's been on my radar screen for awhile, but $LIFE has been rather demanding lately. Anyway, I've now read through the mail thread and portions of the v16 patch, and I have to concur with Robert's qualms about whether this

Re: Eager aggregation, take 3

2025-01-16 Thread Richard Guo
On Wed, Jan 15, 2025 at 11:40 PM Robert Haas wrote: > On Wed, Jan 15, 2025 at 1:58 AM Richard Guo wrote: > > I understand that we're currently quite bad at estimating the number > > of groups after aggregation. In fact, it's not just aggregation > > estimates — we're also bad at join estimates i

Re: Eager aggregation, take 3

2025-01-15 Thread Robert Haas
On Wed, Jan 15, 2025 at 1:58 AM Richard Guo wrote: > I understand that we're currently quite bad at estimating the number > of groups after aggregation. In fact, it's not just aggregation > estimates — we're also bad at join estimates in some cases. This is a > reality we have to face. Here's w

Re: Eager aggregation, take 3

2025-01-14 Thread Richard Guo
On Wed, Jan 15, 2025 at 12:07 AM Robert Haas wrote: > On Sun, Jan 12, 2025 at 9:04 PM Richard Guo wrote: > > Attached is an updated version of this patch that addresses Jian's > > review comments, along with some more cosmetic tweaks. I'm going to > > be looking at this patch again from the poin

Re: Eager aggregation, take 3

2025-01-14 Thread Robert Haas
On Sun, Jan 12, 2025 at 9:04 PM Richard Guo wrote: > Attached is an updated version of this patch that addresses Jian's > review comments, along with some more cosmetic tweaks. I'm going to > be looking at this patch again from the point of view of committing > it, with the plan to commit it late

Re: Eager aggregation, take 3

2025-01-09 Thread Richard Guo
On Thu, Jan 9, 2025 at 12:15 PM jian he wrote: > hi. > in create_grouping_expr_infos > > tce = lookup_type_cache(exprType((Node *) tle->expr), > TYPECACHE_BTREE_OPFAMILY); > if (!OidIsValid(tce->btree_opf) || > !OidIsValid(tce->btree_opin

Re: Eager aggregation, take 3

2025-01-08 Thread jian he
hi. in create_grouping_expr_infos tce = lookup_type_cache(exprType((Node *) tle->expr), TYPECACHE_BTREE_OPFAMILY); if (!OidIsValid(tce->btree_opf) || !OidIsValid(tce->btree_opintype)) return; /* *

Re: Eager aggregation, take 3

2024-12-16 Thread Richard Guo
On Wed, Dec 4, 2024 at 11:38 PM Robert Haas wrote: > On Sun, Nov 10, 2024 at 7:52 PM Richard Guo wrote: > > Hmm, currently we only consider grouped aggregation for eager > > aggregation. For grouped aggregation, the window function's > > arguments, as well as the PARTITION BY expressions, must a

Re: Eager aggregation, take 3

2024-12-04 Thread Robert Haas
On Sun, Nov 10, 2024 at 7:52 PM Richard Guo wrote: > Hmm, currently we only consider grouped aggregation for eager > aggregation. For grouped aggregation, the window function's > arguments, as well as the PARTITION BY expressions, must appear in the > GROUP BY clause. That is to say, the depname

Re: Eager aggregation, take 3

2024-11-11 Thread Richard Guo
On Tue, Nov 12, 2024 at 1:30 AM Robert Haas wrote: > On Sun, Nov 10, 2024 at 7:52 PM Richard Guo wrote: > > Hmm, currently we only consider grouped aggregation for eager > > aggregation. For grouped aggregation, the window function's > > arguments, as well as the PARTITION BY expressions, must a

Re: Eager aggregation, take 3

2024-11-11 Thread Robert Haas
On Sun, Nov 10, 2024 at 7:52 PM Richard Guo wrote: > > I have similar but weaker feelings about ordered aggregates. Consider: > > > > explain select t1.id, array_agg(t2.v order by t3.o) from t1, t2, t3 > > where t1.id = t2.id and t2.id = t3.id group by 1; > > > > It seems to me that a partially ag

Re: Eager aggregation, take 3

2024-11-10 Thread Richard Guo
On Wed, Nov 6, 2024 at 11:43 PM Robert Haas wrote: > On Wed, Nov 6, 2024 at 3:22 AM Richard Guo wrote: > > Yeah, ordered aggregates could be a blocker. I think it might be best > > to prevent the use of eager aggregation if root->numOrderedAggs > 0 > > for now. > > > > I've been thinking about t

Re: Eager aggregation, take 3

2024-11-06 Thread Robert Haas
On Wed, Nov 6, 2024 at 3:22 AM Richard Guo wrote: > Yeah, ordered aggregates could be a blocker. I think it might be best > to prevent the use of eager aggregation if root->numOrderedAggs > 0 > for now. > > I've been thinking about the window functions case, as Jian He also > mentioned it some ti

Re: Eager aggregation, take 3

2024-11-06 Thread Richard Guo
On Fri, Nov 1, 2024 at 9:42 PM Robert Haas wrote: > On Fri, Nov 1, 2024 at 2:21 AM Richard Guo wrote: > > ... an aggregated row from the partial > > aggregation matches the other side of the join if and only if each row > > in the partial group does, thereby ensuring that all rows in the same > >

Re: Eager aggregation, take 3

2024-11-05 Thread jian he
On Thu, Aug 29, 2024 at 10:26 AM Richard Guo wrote: > > > > 2. I think there might be techniques we could use to limit planning > > effort at an earlier stage when the approach doesn't appear promising. > > For example, if the proposed grouping column is already unique, the > > exercise is pointle

Re: Eager aggregation, take 3

2024-11-01 Thread Robert Haas
On Fri, Nov 1, 2024 at 2:21 AM Richard Guo wrote: > ... an aggregated row from the partial > aggregation matches the other side of the join if and only if each row > in the partial group does, thereby ensuring that all rows in the same > partial group have the same 'destiny'. Ah, I really like th

Re: Eager aggregation, take 3

2024-10-31 Thread Richard Guo
On Thu, Oct 31, 2024 at 9:16 PM jian he wrote: > > hi. > still trying to understand v13. found a bug. > > minimum test : > drop table if exists t1, t2; > CREATE TABLE t1 (a int, b int, c int); > CREATE TABLE t2 (a int, b int, c int); > SET enable_eager_aggregate TO on; > explain(costs off, setting

Re: Eager aggregation, take 3

2024-10-31 Thread Richard Guo
On Thu, Oct 31, 2024 at 12:25 AM Robert Haas wrote: > Well, the key thing here is the reasoning, which you don't really > spell out either here or in the patch. The patch's justification for > the use of BTEQUALIMAGE_PROC Is that, if we use non-equal-image > operators, "we may lose some informatio

Re: Eager aggregation, take 3

2024-10-31 Thread Richard Guo
On Wed, Oct 30, 2024 at 5:06 AM Robert Haas wrote: > On Tue, Sep 24, 2024 at 11:20 PM Richard Guo wrote: > > The reason is that it is very tricky to set the size estimates for a > > grouped join relation. For a non-grouped join relation, we know that > > all its paths have the same rowcount esti

Re: Eager aggregation, take 3

2024-10-31 Thread Richard Guo
On Tue, Oct 29, 2024 at 9:59 PM Robert Haas wrote: > On Wed, Sep 25, 2024 at 3:03 AM Richard Guo wrote: > > On Wed, Sep 11, 2024 at 10:52 AM Tender Wang wrote: > > > 1. In make_one_rel(), we have the below codes: > > > /* > > > * Build grouped base relations for each base rel if possible. > > >

Re: Eager aggregation, take 3

2024-10-31 Thread jian he
hi. still trying to understand v13. found a bug. minimum test : drop table if exists t1, t2; CREATE TABLE t1 (a int, b int, c int); CREATE TABLE t2 (a int, b int, c int); SET enable_eager_aggregate TO on; explain(costs off, settings) SELECT avg(t2.a), t1.c FROM t1 JOIN t2 ON t1.b = t2.b GROUP BY t

Re: Eager aggregation, take 3

2024-10-30 Thread Robert Haas
On Tue, Oct 29, 2024 at 3:51 AM Richard Guo wrote: > > 2. join type is FULL JOIN, (i am not sure about other Semijoins and > > anti-semijoins types). > > The presence of a FULL JOIN does not preclude the use of eager > aggregation. We still can push a partial aggregation down to a level > that i

Re: Eager aggregation, take 3

2024-10-30 Thread Robert Haas
On Tue, Oct 29, 2024 at 3:57 AM Richard Guo wrote: > On Fri, Oct 18, 2024 at 10:22 PM jian he wrote: > > So overall I doubt here BTEQUALIMAGE_PROC flag usage is correct. > > The BTEQUALIMAGE_PROC flag is used to prevent eager aggregation for > types whose equality operators do not imply bitwise e

Re: Eager aggregation, take 3

2024-10-30 Thread Robert Haas
On Sat, Oct 5, 2024 at 11:30 PM Richard Guo wrote: > Here’s a comparison of Execution Time and Planning Time for the seven > queries with eager aggregation disabled versus enabled (best of 3). > > Execution Time: > > EAGER-AGG-OFF EAGER-AGG-ON > > q4 105787.963 ms

Re: Eager aggregation, take 3

2024-10-29 Thread Robert Haas
On Tue, Sep 24, 2024 at 11:20 PM Richard Guo wrote: > The reason is that it is very tricky to set the size estimates for a > grouped join relation. For a non-grouped join relation, we know that > all its paths have the same rowcount estimate (well, in theory). But > this is not true for a groupe

Re: Eager aggregation, take 3

2024-10-29 Thread Robert Haas
On Wed, Sep 25, 2024 at 3:03 AM Richard Guo wrote: > On Wed, Sep 11, 2024 at 10:52 AM Tender Wang wrote: > > 1. In make_one_rel(), we have the below codes: > > /* > > * Build grouped base relations for each base rel if possible. > > */ > > setup_base_grouped_rels(root); > > > > As far as I know,

Re: Eager aggregation, take 3

2024-10-29 Thread Richard Guo
On Fri, Oct 18, 2024 at 12:44 PM jian he wrote: > 1. root->parse->resultRelation > 0 > just be 100% sure we are only dealing with SELECT, or we can add > Assert at the end of setup_eager_aggregation. Can GROUP BY clauses be used in INSERT/UPDATE/DELETE/MERGE statements? If not, I think there

Re: Eager aggregation, take 3

2024-10-29 Thread Richard Guo
On Fri, Oct 18, 2024 at 10:22 PM jian he wrote: > So overall I doubt here BTEQUALIMAGE_PROC flag usage is correct. The BTEQUALIMAGE_PROC flag is used to prevent eager aggregation for types whose equality operators do not imply bitwise equality, such as NUMERIC. After a second thought, I think it

Re: Eager aggregation, take 3

2024-10-18 Thread jian he
/* * Eager aggregation is only possible if equality of grouping keys, as * defined by the equality operator, implies bitwise equality. * Otherwise, if we put keys with different byte images into the same * group, we may lose some information that could be

Re: Eager aggregation, take 3

2024-10-17 Thread jian he
On Sat, Oct 5, 2024 at 6:23 PM Richard Guo wrote: > > On Fri, Sep 27, 2024 at 11:53 AM Richard Guo wrote: > > Here is an updated version of this patch that fixes the rowcount > > estimate issue along this routine. (see set_joinpath_size.) > in the function setup_eager_aggregation, can we be mor

Re: Eager aggregation, take 3

2024-10-05 Thread Richard Guo
On Sat, Oct 5, 2024 at 6:23 PM Richard Guo wrote: > > On Fri, Sep 27, 2024 at 11:53 AM Richard Guo wrote: > > Here is an updated version of this patch that fixes the rowcount > > estimate issue along this routine. (see set_joinpath_size.) > > I have worked on inventing some heuristics to limit th

Re: Eager aggregation, take 3

2024-09-25 Thread Richard Guo
On Fri, Sep 13, 2024 at 3:48 PM Tender Wang wrote: > Since MERGE/SPLIT partition has been reverted, the tests *partition_merge* > and *partition_split* should be removed > from parallel_schedule. After doing the above, the 0002 patch can be applied. Yeah, that's what I need to do. Thanks Ric

Re: Eager aggregation, take 3

2024-09-25 Thread Richard Guo
On Wed, Sep 11, 2024 at 10:52 AM Tender Wang wrote: > 1. In make_one_rel(), we have the below codes: > /* > * Build grouped base relations for each base rel if possible. > */ > setup_base_grouped_rels(root); > > As far as I know, each base rel only has one grouped base relation, if > possible. >

Re: Eager aggregation, take 3

2024-09-24 Thread Richard Guo
On Thu, Sep 5, 2024 at 9:40 AM Tender Wang wrote: > 1. in setup_eager_aggregation(), before calling create_agg_clause_infos(), > it does > some checks if eager aggregation is available. Can we move those checks into > a function, > for example, can_eager_agg(), like can_partial_agg() does? We

Re: Eager aggregation, take 3

2024-09-24 Thread Richard Guo
On Wed, Aug 28, 2024 at 9:01 PM Robert Haas wrote: > On Tue, Aug 27, 2024 at 11:57 PM Tender Wang wrote: > > I haven't look all of them. I just pick few simple plan test(e.g. 19.sql, > > 45.sql). > > For example, 19.sql, eager agg pushdown doesn't get large gain, but a little > > performance reg

Re: Eager aggregation, take 3

2024-09-13 Thread Tender Wang
Tender Wang 于2024年9月4日周三 11:48写道: > > > Richard Guo 于2024年8月21日周三 15:11写道: > >> On Fri, Aug 16, 2024 at 4:14 PM Richard Guo >> wrote: >> > I had a self-review of this patchset and made some refactoring, >> > especially to the function that creates the RelAggInfo structure for a >> > given relat

Re: Eager aggregation, take 3

2024-09-10 Thread Tender Wang
Richard Guo 于2024年8月21日周三 15:11写道: > On Fri, Aug 16, 2024 at 4:14 PM Richard Guo > wrote: > > I had a self-review of this patchset and made some refactoring, > > especially to the function that creates the RelAggInfo structure for a > > given relation. While there were no major changes, the cod

Re: Eager aggregation, take 3

2024-09-04 Thread Tender Wang
Richard Guo 于2024年8月21日周三 15:11写道: > On Fri, Aug 16, 2024 at 4:14 PM Richard Guo > wrote: > > I had a self-review of this patchset and made some refactoring, > > especially to the function that creates the RelAggInfo structure for a > > given relation. While there were no major changes, the cod

Re: Eager aggregation, take 3

2024-09-03 Thread Tender Wang
Richard Guo 于2024年8月21日周三 15:11写道: > On Fri, Aug 16, 2024 at 4:14 PM Richard Guo > wrote: > > I had a self-review of this patchset and made some refactoring, > > especially to the function that creates the RelAggInfo structure for a > > given relation. While there were no major changes, the cod

Re: Eager aggregation, take 3

2024-08-29 Thread Robert Haas
On Wed, Aug 28, 2024 at 11:38 PM Tender Wang wrote: > I upload EXPLAIN(COSTS ON, ANALYZE) test to [1]. > I ran the same query three times, and I chose the third time result. > You can check 19_off_explain.out and 19_on_explain.out. So, in 19_off_explain.out, we got this: -> Finalize Gr

Re: Eager aggregation, take 3

2024-08-29 Thread Robert Haas
On Wed, Aug 28, 2024 at 10:26 PM Richard Guo wrote: > Yeah, I'm concerned about this too. In addition to the inaccuracies > in aggregation estimates, our estimates for joins are sometimes not > very accurate either. All this are likely to result in regressions > with eager aggregation in some ca

Re: Eager aggregation, take 3

2024-08-28 Thread Tender Wang
Richard Guo 于2024年8月29日周四 10:46写道: > On Wed, Aug 28, 2024 at 9:01 PM Robert Haas wrote: > > On Tue, Aug 27, 2024 at 11:57 PM Tender Wang wrote: > > > I haven't look all of them. I just pick few simple plan test(e.g. > 19.sql, 45.sql). > > > For example, 19.sql, eager agg pushdown doesn't get la

Re: Eager aggregation, take 3

2024-08-28 Thread Tender Wang
Richard Guo 于2024年8月29日周四 10:46写道: > On Wed, Aug 28, 2024 at 9:01 PM Robert Haas wrote: > > On Tue, Aug 27, 2024 at 11:57 PM Tender Wang wrote: > > > I haven't look all of them. I just pick few simple plan test(e.g. > 19.sql, 45.sql). > > > For example, 19.sql, eager agg pushdown doesn't get la

Re: Eager aggregation, take 3

2024-08-28 Thread Richard Guo
On Wed, Aug 28, 2024 at 9:01 PM Robert Haas wrote: > On Tue, Aug 27, 2024 at 11:57 PM Tender Wang wrote: > > I haven't look all of them. I just pick few simple plan test(e.g. 19.sql, > > 45.sql). > > For example, 19.sql, eager agg pushdown doesn't get large gain, but a little > > performance reg

Re: Eager aggregation, take 3

2024-08-28 Thread Richard Guo
On Wed, Aug 28, 2024 at 11:57 AM Tender Wang wrote: > Rectenly, I do some benchmark tests, mainly on tpch and tpcds. > tpch tests have no plan diff, so I do not continue to test on tpch. > tpcds(10GB) tests have 22 plan diff as below: > 4.sql, 5.sql, 8.sql,11.sql,19.sql,23.sql,31.sql, > 33.sql,39

Re: Eager aggregation, take 3

2024-08-28 Thread Richard Guo
On Fri, Aug 23, 2024 at 11:59 PM Robert Haas wrote: > Here are some initial, high-level thoughts about this patch set. Thank you for your review and feedback! It helps a lot in moving this work forward. > 1. As far as I can see, there's no real performance testing on this > thread. I expect tha

Re: Eager aggregation, take 3

2024-08-28 Thread Robert Haas
On Tue, Aug 27, 2024 at 11:57 PM Tender Wang wrote: > Rectenly, I do some benchmark tests, mainly on tpch and tpcds. > tpch tests have no plan diff, so I do not continue to test on tpch. Interesting to know. > tpcds(10GB) tests have 22 plan diff as below: > 4.sql, 5.sql, 8.sql,11.sql,19.sql,23.s

Re: Eager aggregation, take 3

2024-08-27 Thread Tender Wang
Richard Guo 于2024年8月21日周三 15:11写道: > On Fri, Aug 16, 2024 at 4:14 PM Richard Guo > wrote: > > I had a self-review of this patchset and made some refactoring, > > especially to the function that creates the RelAggInfo structure for a > > given relation. While there were no major changes, the cod

Re: Eager aggregation, take 3

2024-08-23 Thread Robert Haas
On Wed, Aug 21, 2024 at 3:11 AM Richard Guo wrote: > Attached is the updated version of the patchset that fixes this bug > and includes further code refactoring. Here are some initial, high-level thoughts about this patch set. 1. As far as I can see, there's no real performance testing on this t

Re: Eager aggregation, take 3

2024-07-11 Thread Paul George
Hey Richard, Looking more closely at this example >select t2.a, count(*) from t t1 left join t t2 on t2.b > 1 group by t2.a having t2.a is null; I wonder if the inability to exploit eager aggregation is more based on the fact that COUNT(*) cannot be decomposed into an aggregation of PARTIAL COUN

Re: Eager aggregation, take 3

2024-07-10 Thread Richard Guo
On Sun, Jul 7, 2024 at 10:45 AM Paul George wrote: > Thanks for reviving this patch and for all of your work on it! Eager > aggregation pushdown will be beneficial for my work and I'm hoping to see it > land. Thanks for looking at this patch! > The output of both the original query and this on

Re: Eager aggregation, take 3

2024-07-06 Thread Paul George
Richard: Thanks for reviving this patch and for all of your work on it! Eager aggregation pushdown will be beneficial for my work and I'm hoping to see it land. I was playing around with v9 of the patches and was specifically curious about this previous statement... >This patch also makes eager

Re: Eager aggregation, take 3

2024-03-04 Thread Andy Fan
Richard Guo writes: > Hi All, > > Eager aggregation is a query optimization technique that partially > pushes a group-by past a join, and finalizes it once all the relations > are joined. Eager aggregation reduces the number of input rows to the > join and thus may result in a better overall p