On Mon, 19 Aug 2024 at 22:01, David Rowley <dgrowle...@gmail.com> wrote:
> To try and move this forward again, I adjusted the patch to use a
> static function with pg_noinline rather than unlikely.  I don't think
> this will make much difference code generation wise, but I did think
> it was an improvement in code cleanliness. Patches attached.
>
> I did a round of benchmarking on an AMD Zen4 7945hx and on an Apple
> M2. I also graphed the results you sent so they're easier to compare
> with mine.
>
> 0001 is effectively the unlikely() patch for calculating the frame offsets.
> 0002 is the tuplestore_reset() patch
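
For anyone following along, the general shape of the "static function with pg_noinline" idea can be sketched as below. This is only an illustration and not the code from the patch: demo_noinline is a hypothetical stand-in for PostgreSQL's pg_noinline, and DemoState, demo_calculate_offsets() and demo_frame_width() are made-up names.

```c
/* Hypothetical stand-in for pg_noinline; the attribute is GCC/Clang syntax. */
#if defined(__GNUC__) || defined(__clang__)
#define demo_noinline __attribute__((noinline))
#else
#define demo_noinline
#endif

/* Made-up state struct for illustration; not the real WindowAggState. */
typedef struct DemoState
{
    int  offsets_valid;   /* nonzero once the offsets have been computed */
    long start_offset;
    long end_offset;
} DemoState;

/*
 * Rarely executed path.  Marking it noinline asks the compiler to keep
 * this body out of the caller, so the caller's common path stays small.
 */
static demo_noinline void
demo_calculate_offsets(DemoState *state)
{
    state->start_offset = 1;
    state->end_offset = 10;
    state->offsets_valid = 1;
}

/* Frequently executed path: one cheap test, then straight-line code. */
static long
demo_frame_width(DemoState *state)
{
    if (!state->offsets_valid)
        demo_calculate_offsets(state);

    return state->end_offset - state->start_offset;
}
```

The first call to demo_frame_width() takes the out-of-line path once; later calls only pay for the flag test.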

I was experimenting with this again.  The 0002 patch added a
next_partition field to the WindowAggState struct, which made the
struct slightly bigger.  I've now included a 0003 patch which shifts
some fields around in that struct to keep it the same size as it is
on master.  Benchmarking with that removes the very tiny performance
regression.  Please see the attached CSV file for the
results. The percentage row compares master to all patches. I also
tested this on an AMD 3990x machine along with fresh results from the
AMD 7945hx laptop. Both of those machines come out faster on all tests
when comparing master to all 3 patches.  With the Apple M2, there does
not seem to be much change in performance for the tests containing
fewer rows per partition: some are faster, some are slower, all within
typical noise fluctuations.
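
As a rough illustration of why shifting fields around can keep a struct the same size after adding a field, here is a minimal sketch. The structs and field names are hypothetical and have nothing to do with the real WindowAggState layout; the point is only that small fields placed next to each other can share space that would otherwise be alignment padding.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical layout with small fields separated by a wide one.  The
 * compiler typically has to pad after each bool to align what follows.
 */
typedef struct DemoBefore
{
    bool    flag_a;
    int64_t counter;
    bool    flag_b;
} DemoBefore;

/*
 * Same fields, reordered so the two bools sit together after the wide
 * field and share a single run of trailing padding.
 */
typedef struct DemoAfter
{
    int64_t counter;
    bool    flag_a;
    bool    flag_b;
} DemoAfter;

/* Number of bytes saved by the reordering on the current platform. */
static size_t
demo_bytes_saved(void)
{
    return sizeof(DemoBefore) - sizeof(DemoAfter);
}
```

On a typical 64-bit ABI where int64_t is 8-byte aligned, I'd expect sizeof(DemoBefore) to be 24 and sizeof(DemoAfter) to be 16, but the exact numbers depend on the platform.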

Given the performance now seems improved in all cases, I plan on
pushing this change as a single commit.

David

Attachment: v4-0001-Speedup-WindowAgg-code-by-moving-uncommon-code-ou.patch
Description: Binary data

Attachment: v4-0002-Optimize-WindowAgg-s-use-of-tuplestores.patch
Description: Binary data

Attachment: v4-0003-Experiment-with-WindowAggState-fields.patch
Description: Binary data

AMD 7945HX,,,,,,,
version,1000000,100000,10000,1000,100,10,1
master,300.7,201.4,182.2,180.1,179.7,180.8,185.9
v4-0001,295.6,189.4,176.4,172.2,172.2,173.4,180.9
v4-0001+0002,222.3,186.6,185.3,177.9,177.4,177.8,183.9
v4-0002,224.5,192.5,183.7,186.3,180.7,182.8,188
v4-0001+0002+0003,217.6,181.4,177.2,173.9,173.6,174.5,184.2
,138.20%,111.03%,102.80%,103.55%,103.50%,103.59%,100.96%
Apple M2,,,,,,,
version,1000000,100000,10000,1000,100,10,1
master,269.2,169.7,152.6,147.5,147.2,147.7,148.9
v4-0001,269.1,170.1,154.5,149,147.7,148.9,150.7
v4-0001+0002,193.1,164.6,154.3,149,148.2,149,149.9
v4-0002,192,165,157.1,151.6,150.9,150.9,152.8
v4-0001+0002+0003,187.7,162.1,153.2,148.2,147,147.5,150.4
,143.40%,104.68%,99.59%,99.48%,100.14%,100.13%,98.95%
,,,,,,,
AMD 3990x,,,,,,,
version,1000000,100000,10000,1000,100,10,1
master,570.8,354.1,327.4,320,320.7,322.2,343.7
v4-0001,561.6,353.4,327.4,320.6,321.1,323.3,343
v4-0001+0002,401.2,339.1,327.6,322.8,323,324.1,344.5
v4-0002,409.1,341.3,330.5,324.5,325.1,327.7,351
v4-0001+0002+0003,403.3,336,324.1,319.7,320.6,320.6,342.5
,141.55%,105.39%,101.02%,100.09%,100.02%,100.49%,100.37%
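
The percentage rows above look to be the master figure divided by the figure with all three patches applied. A minimal sketch of that arithmetic follows, assuming the figures are timings where lower is better (the CSV does not state units) and using a made-up function name.

```c
/*
 * Hypothetical helper: express master relative to the patched result.
 * Assuming lower figures are better, anything above 100 means the
 * patched build came out faster than master.
 */
static double
demo_ratio_percent(double master, double patched)
{
    return master / patched * 100.0;
}
```

For example, demo_ratio_percent(300.7, 217.6) reproduces the first percentage shown for the 1000000 column, up to rounding.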
