On Wed, 2 Nov 2022 at 00:09, Andy Fan <zhihui.fan1...@gmail.com> wrote:
> I just have a different platforms at hand, Here is my test with
> Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz.
> shared_buffers has been set to big enough to hold all the data.
Many thanks for testing that. Those numbers look much better than the
ones I got from my AMD machine.

> By theory, Why does the preferch make thing better? I am asking this
> because I think we need to read the data from buffer to cache line once
> in either case (I'm obvious wrong in face of the test result.)

That's a good question. I didn't really explain that in my email.
There's quite a bit of information in [1]. I might not do the
explanation justice, but my basic understanding is that many modern CPU
architectures are good at "Sequential Prefetching" of cache lines from
main memory when the direction is forward, but they're not very good at
detecting access patterns that scan memory addresses in a backwards
direction.

Because of our page layout, we have the page header followed by item
pointers at the start of the page. These item pointers are fixed width
and point to the tuples, which are variable width. Tuples are written
starting at the end of the page, and the page is full when the tuples
would overlap with the item pointers. See the diagrams in [2]. We do
our best to keep those tuples in the reverse order of the item pointer
array. This means that when we're performing a forward sequential scan,
we're (generally) reading tuples starting at the end of the page and
working backwards. Since the CPU is not very good at noticing this and
prefetching the preceding cache line, we can (seemingly) make things go
faster by issuing a manual prefetch operation by way of
pg_prefetch_mem().

The key here is that accessing RAM is far slower than accessing CPU
caches. Modern CPUs can perform multiple operations in parallel, and
these can be rearranged by the CPU so that they're not in the same
order as the instructions are written in the program. A high-latency
operation such as accessing RAM can hold up other operations which
depend on the value that's waiting to come in from RAM. When the CPU is
held up like this, it's called a pipeline stall [3]. The prefetching in
this case helps to reduce the time spent stalled waiting for memory
access.

David

[1] https://en.wikipedia.org/wiki/Cache_prefetching
[2] https://techcommunity.microsoft.com/t5/azure-database-for-postgresql/speeding-up-recovery-and-vacuum-in-postgres-14/ba-p/2234071
[3] https://en.wikipedia.org/wiki/Pipeline_stall
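
P.S. To make the backwards access pattern concrete, here is a minimal
standalone sketch. This is not PostgreSQL source: the simplified page
layout and the names ItemPointerish and process_tuple() are made up for
illustration, and GCC/Clang's __builtin_prefetch() stands in for
pg_prefetch_mem().

/*
 * Sketch of a forward scan over a heap-like page.  Item pointers sit
 * at the start of the page; the tuples they point at were written from
 * the end of the page backwards, so walking the item pointer array
 * forwards visits tuple addresses in *descending* order -- the pattern
 * the hardware prefetcher tends to miss.
 */
#include <stddef.h>
#include <stdint.h>

typedef struct ItemPointerish
{
    uint16_t offset;            /* offset of the tuple within the page */
    uint16_t len;               /* tuple length in bytes */
} ItemPointerish;

/* hypothetical per-tuple callback, defined elsewhere */
extern void process_tuple(const char *tuple, uint16_t len);

static void
scan_page(const char *page, const ItemPointerish *items, int nitems)
{
    for (int i = 0; i < nitems; i++)
    {
        /*
         * Hint the CPU to start pulling the next tuple's cache line in
         * from memory while we're still busy with the current tuple.
         * Because the next tuple usually lives at a lower address, the
         * hardware won't have prefetched it for us.
         */
        if (i + 1 < nitems)
            __builtin_prefetch(page + items[i + 1].offset);

        process_tuple(page + items[i].offset, items[i].len);
    }
}

The only interesting line is the __builtin_prefetch() call: it overlaps
the memory latency of fetching the next tuple with the work being done
on the current one, which is exactly where the stall time goes
otherwise.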