Hi Ashutosh, On Tue, Feb 25, 2025 at 8:04 PM Ashutosh Bapat <ashutosh.bapat....@gmail.com> wrote: > On Thu, Feb 20, 2025 at 5:28 PM Ashutosh Bapat <ashutosh.bapat....@gmail.com> > wrote: > > On Tue, Feb 4, 2025 at 4:07 PM Ashutosh Bapat > > <ashutosh.bapat....@gmail.com> wrote: > > > If we are not interested in saving memory, there is a simpler way to > > > improve planning time by adding a hash table per equivalence class to > > > store the derived clauses, instead of a linked list, when the number > > > of derived clauses is higher than a threshold (say 32 same as the > > > threshold for join_rel_list. Maybe that approach will yield stable > > > planning time. > > I implemented the above idea in attached patches. I also added the > following query, inspired from Alvaro's query, to summarise the > results. > with master_avgs as > (select code_tag, num_parts, num_joins, pwj, avg(planning_time_ms) > avg_pt, stddev(planning_time_ms) stddev_pt > from msmts where code_tag = 'master' > group by code_tag, num_parts, num_joins, pwj), > patched_avgs as > (select code_tag, num_parts, num_joins, pwj, avg(planning_time_ms) > avg_pt, stddev(planning_time_ms) stddev_pt > from msmts where code_tag = 'patched' > group by code_tag, num_parts, num_joins, pwj) > select num_parts, > num_joins, > format('s=%s%% md=%s%% pd=%s%%', > ((m.avg_pt - p.avg_pt)/m.avg_pt * 100)::numeric(6, 2), > (m.stddev_pt/m.avg_pt * 100)::numeric(6, 2), > (p.stddev_pt/p.avg_pt * 100)::numeric(6, 2)) > from master_avgs m join patched_avgs p using (num_parts, num_joins, > pwj) where not pwj order by 1, 2, 3; > \crosstabview 1 2 3 > > not pwj in the last line should be changed to pwj to get results with > enable_partitionwise_join = true. > > With the attached patches, I observe following results > > With PWJ disabled > num_parts | 2 | 3 > | 4 | 5 > -----------+-------------------------------+------------------------------+----------------------------+----------------------------- > 0 | s=-4.44% md=17.91% pd=23.05% | s=-0.83% md=11.10% > pd=19.58% | s=0.87% md=4.04% pd=7.91% | s=-35.24% md=7.63% pd=9.69% > 10 | s=30.13% md=118.18% pd=37.44% | s=-3.49% md=0.58% > pd=0.49% | s=-0.83% md=0.29% pd=0.35% | s=-0.24% md=0.35% pd=0.32% > 100 | s=1.94% md=13.19% pd=4.08% | s=-0.27% md=0.18% > pd=0.44% | s=7.04% md=3.05% pd=3.11% | s=12.75% md=1.69% pd=0.81% > 500 | s=4.39% md=1.71% pd=1.33% | s=10.17% md=1.28% > pd=1.90% | s=23.04% md=0.24% pd=0.58% | s=30.87% md=0.30% pd=1.11% > 1000 | s=4.27% md=1.21% pd=1.97% | s=13.97% md=0.44% > pd=0.79% | s=24.05% md=0.63% pd=1.02% | s=30.77% md=0.77% pd=0.17% > > > Each cell is a triple (s, md, pd) where s is improvement in planning > time using the patches in % as compared to the master (higher the > better), md = standard deviation as % of the average planning time on > master, pd = is standard deviation as % of the average planning time > with patches. > > With PWJ enabled > num_parts | 2 | 3 > | 4 | 5 > -----------+------------------------------+------------------------------+-----------------------------+------------------------------ > 0 | s=-94.25% md=6.98% pd=56.03% | s=44.10% md=141.13% > pd=9.32% | s=42.71% md=46.00% pd=6.55% | s=-26.12% md=6.72% pd=15.20% > 10 | s=-25.89% md=4.29% pd=63.75% | s=-1.34% md=3.15% pd=3.26% > | s=0.31% md=4.13% pd=4.34% | s=-1.34% md=3.10% pd=6.73% > 100 | s=-2.83% md=0.94% pd=1.31% | s=-2.17% md=4.57% pd=4.41% > | s=0.98% md=1.59% pd=1.81% | s=1.87% md=1.10% pd=0.79% > 500 | s=1.57% md=3.01% pd=1.70% | s=6.99% md=1.58% pd=1.68% > | s=11.11% md=0.24% pd=0.62% | s=11.65% md=0.18% pd=0.90% > 1000 | s=3.59% md=0.98% pd=1.78% | s=10.83% md=0.88% pd=0.46% > | s=15.62% md=0.46% pd=0.13% | s=16.38% md=0.63% pd=0.29% > > Same numbers measured for previous set of patches [1], which improves > both memory consumption as well as planning time. > > With PWJ disabled > num_parts | 2 | 3 > | 4 | 5 > -----------+-------------------------------+------------------------------+------------------------------+-------------------------------- > 0 | s=4.68% md=18.17% pd=22.09% | s=-2.54% md=12.00% > pd=13.81% | s=-2.02% md=3.84% pd=4.43% | s=-69.14% md=11.06% > pd=126.87% > 10 | s=-24.85% md=20.42% pd=35.69% | s=-4.31% md=0.73% > pd=1.53% | s=-14.97% md=0.32% pd=31.90% | s=-0.57% md=0.79% pd=0.50% > 100 | s=0.27% md=4.69% pd=1.55% | s=4.16% md=0.29% pd=0.18% > | s=11.76% md=0.85% pd=0.49% | s=15.76% md=1.64% pd=2.32% > 500 | s=0.54% md=1.88% pd=1.81% | s=9.36% md=1.17% pd=0.87% > | s=21.45% md=0.74% pd=0.88% | s=30.47% md=0.17% pd=1.17% > 1000 | s=3.22% md=1.36% pd=0.99% | s=14.74% md=0.86% > pd=0.44% | s=24.50% md=0.36% pd=0.31% | s=27.97% md=0.27% pd=0.25% > > With PWJ enabled > num_parts | 2 | 3 > | 4 | 5 > -----------+-----------------------------+----------------------------+----------------------------+----------------------------- > 0 | s=11.07% md=19.28% pd=8.70% | s=-1.18% md=5.88% pd=4.31% > | s=-2.25% md=8.42% pd=3.77% | s=25.07% md=11.48% pd=3.87% > 10 | s=-9.07% md=2.65% pd=14.58% | s=0.55% md=3.10% pd=3.41% > | s=3.89% md=3.94% pd=3.79% | s=7.25% md=2.87% pd=3.03% > 100 | s=-4.53% md=0.49% pd=8.53% | s=2.24% md=4.24% pd=3.96% > | s=6.70% md=1.30% pd=2.08% | s=9.09% md=1.39% pd=1.50% > 500 | s=-1.65% md=1.59% pd=1.44% | s=6.31% md=0.89% pd=1.11% > | s=12.72% md=0.20% pd=0.29% | s=15.02% md=0.28% pd=0.83% > 1000 | s=1.53% md=1.01% pd=1.66% | s=11.80% md=0.66% pd=0.71% > | s=16.23% md=0.58% pd=0.18% | s=17.16% md=0.67% pd=0.68% > > There are a few things to notice > 1. There is not much difference in the planning time improvement by > both the patchsets. But the patchset attached to [1] improves memory > consumption as well. So it looks more attractive. > 2. The performance regressions usually coincide with higher standard > deviation. This indicates that both the performance gains or losses > seen at lower numbers of partitions and joins are not real and > possibly ignored. I have run the script multiple times but some or > other combination of lower number of partitions and lower number of > joins shows higher deviation and thus unstable results. I have not be > able to find a way where all the combinations show a stable result.
Thanks for the patch and the extensive benchmarking. Would you please also share a simple, self-contained example that I can use to reproduce and verify the performance improvements? It’s helpful to first see the patch in action with a representative query, and then refer to the more extensive benchmark results you've already shared as needed. I'm also having a hard time telling from the scripts which query was used to produce the numbers in the report. Btw, in the commit message, you mention: === When there are thousands of partitions in a partitioned table, there can be thousands of derived clauses in the list making it inefficient for a lookup. === I haven’t found a description of how that many clauses end up in the ec->ec_derived list. IIUC, it's create_join_clause() where the child clauses get added, and it would be helpful to mention that, since that also appears to be the hotspot your patch is addressing. > I think the first patch in the attached set is worth committing, just > to tighten things up. I agree and am happy to commit it if there are no objections. -- Thanks, Amit Langote