Re: strange perf regression with data checksums

Tomas Vondra Mon, 19 May 2025 11:01:50 -0700

On 5/19/25 19:20, Peter Geoghegan wrote:
> On Mon, May 19, 2025 at 12:42 PM Peter Geoghegan <p...@bowt.ie> wrote:
>> We don't actually need to call BufferGetLSNAtomic() from _bt_readpage
>> during index-only scans (nor during bitmap index scans). We can just
>> not call BufferGetLSNAtomic() at all (except during plain index
>> scans), with no possible downside.
> 
> Attached quick-and-dirty prototype patch shows how this could work. It
> fully avoids calling BufferGetLSNAtomic() from _bt_readpage() during
> index-only scans. I find that "meson test" passes with the patch
> applied (I'm reasonably confident that this general approach is
> correct).
> 
> Does this patch of mine actually fix the regressions that you're
> concerned about?
> 

For index-only scans, yes. The numbers I used to see were

  64 clients: 1.2M tps
  96 clients: 0.6M tps

and with the patch it's

  64 clients: 1.2M tps
  96 clients: 2.9M tps

I'm aware it more than doubles, even if the client count grew only 1.5x.
I believe this is due to a VM config issue I saw earlier, but it wasn't
fixed on this particular VM (unfortunate, but out of my control).

The regular index scan however still have this issue, although it's not
as visible as for IOS. If I disable IOS, the throughput is

  64 clients: 0.7M tps
  96 clients: 0.6M tps

And the profile looks like this:

Overhead  Shared Object         Symbol
  28.85%  postgres              [.] PinBuffer
  22.61%  postgres              [.] UnpinBufferNoOwner
   9.67%  postgres              [.] BufferGetLSNAtomic
   4.92%  [kernel]              [k] finish_task_switch.isra.0
   3.78%  [kernel]              [k] _raw_spin_unlock_irqrestore
   ...

In an earlier message I mentioned maybe we could add an atomic variable
tracking the page LSN, so that we don't have to obtain the header lock.
I didn't have time to try yet.


regards

-- 
Tomas Vondra
Re: strange perf regression with data checksums

Reply via email to