On Mon, 19 Aug 2024 at 22:01, David Rowley <dgrowle...@gmail.com> wrote:
> To try and move this forward again, I adjusted the patch to use a
> static function with pg_noinline rather than unlikely. I don't think
> this will make much difference code generation wise, but I did think
> it was an improvement in code cleanliness. Patches attached.
>
> I did a round of benchmarking on an AMD Zen4 7945hx and on an Apple
> M2. I also graphed the results you sent so they're easier to compare
> with mine.
>
> 0001 is effectively the unlikely() patch for calculating the frame offsets.
> 0002 is the tuplestore_reset() patch
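As a rough illustration of the pg_noinline approach described in the quoted text, here is a minimal standalone sketch. It is not the code from the 0001 patch: `demo_noinline` is a hypothetical stand-in for PostgreSQL's pg_noinline macro, and `DemoFrameState` and both functions are invented for the example. The idea is that the rarely executed offset calculation lives in its own out-of-line static function, so the frequently executed caller only pays for one cheap test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for pg_noinline; not the real PostgreSQL macro. */
#if defined(__GNUC__) || defined(__clang__)
#define demo_noinline __attribute__((noinline))
#else
#define demo_noinline
#endif

/* Hypothetical state struct; the real WindowAggState is much larger. */
typedef struct DemoFrameState
{
    bool    offsets_valid;  /* true once the offsets have been computed */
    int64_t start_offset;
    int64_t end_offset;
    int     calc_count;     /* how many times the uncommon path ran */
} DemoFrameState;

/*
 * Uncommon path: kept out of line so that the caller's hot path stays
 * small. The offset values are placeholders chosen for the example.
 */
static demo_noinline void
demo_calculate_frame_offsets(DemoFrameState *state)
{
    assert(state != NULL);
    state->start_offset = 1;
    state->end_offset = 3;
    state->offsets_valid = true;
    state->calc_count++;
}

/* Hot path: a single flag test, then straight-line code. */
static int64_t
demo_frame_width(DemoFrameState *state)
{
    if (!state->offsets_valid)
        demo_calculate_frame_offsets(state);
    return state->end_offset - state->start_offset + 1;
}
```

Calling `demo_frame_width()` repeatedly on the same state runs the out-of-line function only on the first call. Whether this generates better code than an `unlikely()` hint on an inline branch depends on the compiler, which matches the expectation above that the difference is mostly one of code cleanliness.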

I was experimenting with this again. The 0002 patch added a next_partition field to the WindowAggState struct and caused the struct to become slightly bigger. I've now included a 0003 patch which shifts some fields around in that struct so as to keep it the same size as it is on master. Benchmarking with that removes that very tiny performance regression. Please see the attached CSV file for the results. The percentage row compares master to all patches.

I also tested this on an AMD 3990x machine along with fresh results from the AMD 7945hx laptop. Both of those machines come out faster on all tests when comparing master to all 3 patches. With the Apple M2, there does not seem to be much change in performance with the tests containing fewer rows per partition; some are faster, some are slower, all within typical noise fluctuations.

Given the performance now seems improved in all cases, I plan on pushing this change as a single commit.

David
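
To illustrate how adding a field can grow a struct and how moving fields around can undo that, here is a small sketch. The three structs are hypothetical and far smaller than the real WindowAggState; only the field name next_partition comes from the text above, and its type and the other fields are assumptions made for the example. The sizes in the comments assume a typical ABI where bool is 1 byte and int32_t is 4-byte aligned.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical "master" layout: 8 + 1 + 3 (padding) + 4 bytes. */
typedef struct DemoBase
{
    int64_t spooled_rows;
    bool    all_done;
    /* padding hole here on typical ABIs */
    int32_t status;
} DemoBase;

/* New field appended at the end: it starts a new aligned chunk. */
typedef struct DemoNaive
{
    int64_t spooled_rows;
    bool    all_done;
    int32_t status;
    bool    next_partition;     /* forces extra trailing padding */
} DemoNaive;

/* Same fields, with the new one placed in the existing padding hole. */
typedef struct DemoRearranged
{
    int64_t spooled_rows;
    bool    all_done;
    bool    next_partition;     /* uses bytes that were padding before */
    int32_t status;
} DemoRearranged;

/* Print the three sizes so the effect can be checked on a given platform. */
void
demo_print_sizes(void)
{
    printf("base=%zu naive=%zu rearranged=%zu\n",
           sizeof(DemoBase), sizeof(DemoNaive), sizeof(DemoRearranged));
}
```

On such an ABI the appended field makes the struct larger than the base layout, while the rearranged layout is the same size as the base. The exact numbers are platform dependent, so it is worth printing them rather than assuming them.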
v4-0001-Speedup-WindowAgg-code-by-moving-uncommon-code-ou.patch
v4-0002-Optimize-WindowAgg-s-use-of-tuplestores.patch
v4-0003-Experiment-with-WindowAggState-fields.patch

AMD 7045HX,,,,,,,
version,1000000,100000,10000,1000,100,10,1
master,300.7,201.4,182.2,180.1,179.7,180.8,185.9
v4-0001,295.6,189.4,176.4,172.2,172.2,173.4,180.9
v4-0001+0002,222.3,186.6,185.3,177.9,177.4,177.8,183.9
v4-0002,224.5,192.5,183.7,186.3,180.7,182.8,188
v4-0001+0002+0003,217.6,181.4,177.2,173.9,173.6,174.5,184.2
,138.20%,111.03%,102.80%,103.55%,103.50%,103.59%,100.96%

Apple M2,,,,,,,
version,1000000,100000,10000,1000,100,10,1
master,269.2,169.7,152.6,147.5,147.2,147.7,148.9
v4-0001,269.1,170.1,154.5,149,147.7,148.9,150.7
v4-0001+0002,193.1,164.6,154.3,149,148.2,149,149.9
v4-0002,192,165,157.1,151.6,150.9,150.9,152.8
v4-0001+0002+0003,187.7,162.1,153.2,148.2,147,147.5,150.4
,143.40%,104.68%,99.59%,99.48%,100.14%,100.13%,98.95%
,,,,,,,

AMD 3990x,,,,,,,
version,1000000,100000,10000,1000,100,10,1
master,570.8,354.1,327.4,320,320.7,322.2,343.7
v4-0001,561.6,353.4,327.4,320.6,321.1,323.3,343
v4-0001+0002,401.2,339.1,327.6,322.8,323,324.1,344.5
v4-0002,409.1,341.3,330.5,324.5,325.1,327.7,351
v4-0001+0002+0003,403.3,336,324.1,319.7,320.6,320.6,342.5
,141.55%,105.39%,101.02%,100.09%,100.02%,100.49%,100.37%
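
For reference, the unlabelled percentage rows appear to be the master value divided by the v4-0001+0002+0003 value, so anything above 100% means the patched build came out ahead. That reading assumes lower numbers are better, which the CSV itself does not state. A minimal sketch of the calculation follows; the function names are made up for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Ratio of master to patched, as a percentage (assumes lower is better). */
double
demo_speedup_percent(double master, double patched)
{
    assert(patched > 0.0);
    return master / patched * 100.0;
}

/* Apply the same calculation across one row of per-column results. */
void
demo_percent_row(const double *master, const double *patched,
                 double *out, size_t ncols)
{
    for (size_t i = 0; i < ncols; i++)
        out[i] = demo_speedup_percent(master[i], patched[i]);
}
```

Recomputing from the rounded figures in the CSV reproduces the percentage rows to within a few hundredths of a percentage point. The small differences suggest the original rows were calculated from unrounded measurements.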