On Fri, Jan 23, 2026 at 4:04 PM Melanie Plageman
<[email protected]> wrote:
> Attached v3 basically does what you suggested above. Now, we should
> only have to wait if the backend encounters a buffer after another
> backend has set BM_IO_IN_PROGRESS but before that other backend has
> set the buffer descriptor's wait reference.

Have you considered making ProcessBufferHit into an inline function? I
find that doing so meaningfully improves performance with the index
prefetching patch set. This is particularly true for cached index-only
scans with many VM buffer hits. And it seems to have no downside.

Right now, without any inlining, running perf against a backend that
executes such an index-only scan shows the function/symbol
"ProcessBufferHit.isra.0" as very hot. Apparently gcc does this isra
business ("Interprocedural Scalar Replacement of Aggregates") as an
optimization. Instead of passing a pointer to the whole struct, gcc
emits a clone of the function that takes just the necessary scalar
values (like an int or a bool), and rewrites callers to pass those
directly in registers. But that still leaves a function call per
buffer hit, and we seem to be better off fully inlining the function.
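
To illustrate the distinction, here is a minimal standalone sketch. It is
not the patch's code: DemoReadState, demo_hit_outofline, demo_hit_inline,
and demo_scan are all hypothetical names, and the real ProcessBufferHit
signature is not shown in this thread. The out-of-line variant is the kind
of function for which gcc may emit a ".isra.N" clone, while the inline
variant lets the body be folded into the hot loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for per-read state; NOT the real PostgreSQL struct. */
typedef struct DemoReadState
{
	bool		track_hits;		/* scalar the helper reads */
	int			hit_weight;		/* scalar the helper reads */
	char		other[64];		/* fields the helper never touches */
} DemoReadState;

/*
 * Out-of-line variant.  The noinline attribute (gcc/clang) forces a real
 * call for every buffer hit.  The compiler may still specialize the
 * function, for example by passing the two scalars instead of the pointer,
 * but the call itself remains.
 */
static __attribute__((noinline)) int
demo_hit_outofline(const DemoReadState *state, int nhits)
{
	assert(state != NULL);
	return state->track_hits ? nhits + state->hit_weight : nhits;
}

/*
 * Inline variant with an identical body.  "static inline" is only a hint;
 * the compiler decides whether to fold the body into the caller.
 */
static inline int
demo_hit_inline(const DemoReadState *state, int nhits)
{
	assert(state != NULL);
	return state->track_hits ? nhits + state->hit_weight : nhits;
}

/* Hot loop standing in for a scan that sees nbuffers buffer hits. */
static int
demo_scan(const DemoReadState *state, int nbuffers, bool use_inline)
{
	int			total = 0;

	for (int i = 0; i < nbuffers; i++)
		total = use_inline ? demo_hit_inline(state, total)
			: demo_hit_outofline(state, total);
	return total;
}
```

Both variants compute the same result; only the call overhead differs.
Within PostgreSQL itself, pg_attribute_always_inline (from c.h) is
available when a plain "static inline" hint turns out not to be enough.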

--
Peter Geoghegan

