On Sat, Oct 5, 2024 at 6:23 PM Richard Guo <guofengli...@gmail.com> wrote:
>
> On Fri, Sep 27, 2024 at 11:53 AM Richard Guo <guofengli...@gmail.com> wrote:
> > Here is an updated version of this patch that fixes the rowcount
> > estimate issue along this routine. (see set_joinpath_size.)
>
> I have worked on inventing some heuristics to limit the planning
> effort of eager aggregation.  One simple yet effective approach I'm
> thinking of is to consider a grouped path as NOT useful if its row
> reduction ratio falls below a predefined minimum threshold.  Currently
> I'm using 0.5 as the threshold, but I'm open to other values.
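
For readers skimming the thread, the heuristic quoted above can be sketched roughly as follows. This is only an illustration in Python, not the patch's C code: the function names are hypothetical, and the definition of "row reduction ratio" as the fraction of input rows eliminated by grouping is an assumption, since the message does not spell out the formula.

```python
# Illustrative sketch only; names are hypothetical and not taken from the patch.
MIN_ROW_REDUCTION_RATIO = 0.5  # the threshold value mentioned in the message


def row_reduction_ratio(input_rows: float, grouped_rows: float) -> float:
    """Fraction of input rows eliminated by grouping (assumed definition)."""
    if input_rows <= 0:
        return 0.0
    # Clamp at zero so an estimate with grouped_rows > input_rows
    # is treated as "no reduction" rather than a negative ratio.
    return max(0.0, 1.0 - grouped_rows / input_rows)


def grouped_path_is_useful(input_rows: float, grouped_rows: float,
                           threshold: float = MIN_ROW_REDUCTION_RATIO) -> bool:
    """A grouped path is kept only if it removes enough rows."""
    return row_reduction_ratio(input_rows, grouped_rows) >= threshold


if __name__ == "__main__":
    print(grouped_path_is_useful(1000, 100))  # large reduction: kept
    print(grouped_path_is_useful(1000, 900))  # small reduction: discarded
```

Under this assumed definition, a grouped path that collapses 1000 estimated input rows to 100 is kept, while one that only gets down to 900 is discarded, so the planner never spends effort building join paths on top of it.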

I ran the TPC-DS benchmark at scale 10 and observed eager aggregation
applied in several queries, including q4, q8, q11, q23, q31, q33, and
q77.  Notably, the regression in q19 that Tender identified with v11
has disappeared in v13.

Here’s a comparison of Execution Time and Planning Time for the seven
queries with eager aggregation disabled versus enabled (best of 3).

Execution Time:

        EAGER-AGG-OFF       EAGER-AGG-ON

q4      105787.963 ms       34807.938 ms
q8        1407.454 ms        1654.923 ms
q11      67899.213 ms       18670.086 ms
q23      45945.849 ms       42990.652 ms
q31      10463.536 ms       10244.175 ms
q33       2186.928 ms        2217.228 ms
q77       2360.565 ms        2416.674 ms

Planning Time:

        EAGER-AGG-OFF       EAGER-AGG-ON

q4      2.334 ms            2.602 ms
q8      0.685 ms            0.647 ms
q11     0.935 ms            1.094 ms
q23     2.666 ms            2.582 ms
q31     1.051 ms            1.206 ms
q33     1.248 ms            1.796 ms
q77     0.967 ms            0.962 ms

There are good performance improvements in q4 and q11 (3~4 times).
For the other queries, execution times remain largely unchanged,
falling within the margin of error, with no notable regressions
observed.

For the planning time, I do not see notable regressions for any of the
seven queries.

It seems that the new cost estimates and the new heuristic are working
pretty well.

Thanks
Richard