Re: Prefetch the next tuple's memory during seqscans

David Rowley Mon, 03 Apr 2023 21:50:40 -0700

On Tue, 4 Apr 2023 at 07:47, Gregory Stark (as CFM) <stark....@gmail.com> wrote:
> The referenced patch was committed March 19th but there's been no
> comment here. Is this patch likely to go ahead this release or should
> I move it forward again?


Thanks for the reminder on this.

I have done some work on it but didn't post it here as I didn't
have good news.  The problem I'm facing is that after Melanie's recent
refactoring work around heapgettup() [1], I can no longer get the
same speedup as before from pg_prefetch_mem(). While testing
Melanie's patches, I ran some performance tests and saw a good
increase in performance from them. I really don't know why
the prefetching no longer shows the gains it did before. Perhaps the
rearranged code is better able to benefit from hardware prefetching of
cache lines.

I am, however, inclined not to drop the pg_prefetch_mem() macro
altogether just because I can no longer demonstrate any performance
gains during sequential scans, so I decided to try what Thomas
mentioned in [2]: using the prefetching macro to fetch the required
tuples in PageRepairFragmentation() so that they're already in the CPU
cache by the time we get to compactify_tuples().
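
To illustrate the idea, here is a minimal standalone sketch of the
two-pass pattern. It is not the attached patch. sketch_prefetch_mem(),
SketchItem, sketch_collect_items() and sketch_compact() are hypothetical
names invented for this example, and the assumption that the macro maps
to the compiler's __builtin_prefetch() is mine. The real
compactify_tuples() moves tuples within the page; copying into a
separate buffer here only keeps the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Assumption: a stand-in for pg_prefetch_mem().  On GCC/Clang it issues a
 * prefetch hint; elsewhere it compiles to nothing.
 */
#if defined(__GNUC__) || defined(__clang__)
#define sketch_prefetch_mem(addr) __builtin_prefetch(addr)
#else
#define sketch_prefetch_mem(addr) ((void) 0)
#endif

/* Hypothetical, simplified item descriptor: where a tuple is and its size. */
typedef struct SketchItem
{
    size_t      offset;
    size_t      len;
} SketchItem;

/*
 * Pass 1: record each tuple's location and hint to the CPU that its memory
 * will be needed soon.  The hint never changes program behaviour.
 */
static size_t
sketch_collect_items(const char *page, const size_t *offsets,
                     const size_t *lens, size_t nitems, SketchItem *items)
{
    assert(page != NULL && items != NULL);

    for (size_t i = 0; i < nitems; i++)
    {
        items[i].offset = offsets[i];
        items[i].len = lens[i];
        sketch_prefetch_mem(page + offsets[i]);
    }
    return nitems;
}

/*
 * Pass 2: copy the tuples contiguously into dest.  By now the cache lines
 * requested in pass 1 have had time to arrive.  Returns bytes written.
 */
static size_t
sketch_compact(const char *page, const SketchItem *items, size_t nitems,
               char *dest)
{
    size_t      used = 0;

    assert(page != NULL && dest != NULL);

    for (size_t i = 0; i < nitems; i++)
    {
        memcpy(dest + used, page + items[i].offset, items[i].len);
        used += items[i].len;
    }
    return used;
}
```

The point of splitting the work this way is that the prefetch hints are
issued well before the copies that need the data, which gives the memory
subsystem time to bring the lines in.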

I tried this using the same test as I described in [3] after adjusting
the following line to use PANIC instead of LOG:

ereport(LOG,
    (errmsg("redo done at %X/%X system usage: %s",
    LSN_FORMAT_ARGS(xlogreader->ReadRecPtr),
    pg_rusage_show(&ru0))));

Doing that makes the server stop as soon as redo completes, which
allows me to repeat the test using the same WAL each time.

AMD 3990x CPU on Ubuntu 22.10 with 64GB RAM.

shared_buffers = 10GB
checkpoint_timeout = '1 h'
max_wal_size = 100GB
max_connections = 300

Master:

2023-04-04 15:54:55.635 NZST [15958] PANIC:  redo done at 0/DC447610
system usage: CPU: user: 44.46 s, system: 0.97 s, elapsed: 45.45 s
2023-04-04 15:56:33.380 NZST [16109] PANIC:  redo done at 0/DC447610
system usage: CPU: user: 43.80 s, system: 0.86 s, elapsed: 44.69 s
2023-04-04 15:57:25.968 NZST [16134] PANIC:  redo done at 0/DC447610
system usage: CPU: user: 44.08 s, system: 0.74 s, elapsed: 44.84 s
2023-04-04 15:58:53.820 NZST [16158] PANIC:  redo done at 0/DC447610
system usage: CPU: user: 44.20 s, system: 0.72 s, elapsed: 44.94 s

Prefetch Memory in PageRepairFragmentation():

2023-04-04 16:03:16.296 NZST [25921] PANIC:  redo done at 0/DC447610
system usage: CPU: user: 41.73 s, system: 0.77 s, elapsed: 42.52 s
2023-04-04 16:04:07.384 NZST [25945] PANIC:  redo done at 0/DC447610
system usage: CPU: user: 40.87 s, system: 0.86 s, elapsed: 41.74 s
2023-04-04 16:05:01.090 NZST [25968] PANIC:  redo done at 0/DC447610
system usage: CPU: user: 41.20 s, system: 0.72 s, elapsed: 41.94 s
2023-04-04 16:05:49.235 NZST [25996] PANIC:  redo done at 0/DC447610
system usage: CPU: user: 41.56 s, system: 0.66 s, elapsed: 42.24 s

About 6.7% performance increase over master.
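
For anyone wanting to check the arithmetic, here is a small sketch that
derives a percentage from the eight runs above. How the figure was
originally calculated isn't stated, so treating it as the ratio of mean
times (master / patched - 1) is an assumption, and mean_seconds() and
speedup_percent() are names made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Timings copied from the log lines above, in seconds. */
static const double master_elapsed[] = {45.45, 44.69, 44.84, 44.94};
static const double patched_elapsed[] = {42.52, 41.74, 41.94, 42.24};
static const double master_user[] = {44.46, 43.80, 44.08, 44.20};
static const double patched_user[] = {41.73, 40.87, 41.20, 41.56};

/* Arithmetic mean of n samples. */
static double
mean_seconds(const double *v, size_t n)
{
    double      sum = 0.0;

    assert(n > 0);
    for (size_t i = 0; i < n; i++)
        sum += v[i];
    return sum / (double) n;
}

/*
 * Throughput gain of "patched" over "base", in percent, assuming the same
 * amount of work was done in each run (here: replaying the same WAL).
 */
static double
speedup_percent(const double *base, const double *patched, size_t n)
{
    return (mean_seconds(base, n) / mean_seconds(patched, n) - 1.0) * 100.0;
}
```

Calling speedup_percent(master_elapsed, patched_elapsed, 4) or
speedup_percent(master_user, patched_user, 4) gives about 6.8% either
way, which is in line with the figure quoted above.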

Since I really only did the seqscan patch as a means to get
the pg_prefetch_mem() patch in, I wonder if it's ok to scrap it in
favour of the PageRepairFragmentation patch.

Updated patches attached.

David

[1] https://postgr.es/m/CAAKRu_YSOnhKsDyFcqJsKtBSrd32DP-jjXmv7hL0BPD-z0TGXQ%40mail.gmail.com
[2] https://postgr.es/m/CA%2BhUKGJRtzbbhVmb83vbCiMRZ4piOAi7HWLCqs%3DGQ74mUPrP_w%40mail.gmail.com
[3] https://postgr.es/m/CAApHDvoKwqAzhiuxEt8jSquPJKDpH8DNUZDFUSX9P7DXrJdc3Q%40mail.gmail.com

v1-0001-Add-pg_prefetch_mem-macro-to-load-cache-lines.patch
Description: Binary data

prefetch_in_PageRepairFragmentation.patch
Description: Binary data
