Re: [PoC] Non-volatile WAL buffer

Heikki Linnakangas Thu, 26 Nov 2020 12:59:50 -0800

On 26/11/2020 21:27, Tomas Vondra wrote:

Hi,


Here's the "simple patch" that I'm currently experimenting with. It
essentially replaces open/close/write/fsync with pmem calls
(map/unmap/memcpy/persist variants), and it's by no means committable.
But it works well enough for experiments / measurements, etc.
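
To make the call swap concrete, here is a minimal Python sketch. It is not
taken from the patch. It writes bytes to a segment file two ways: once with
open/pwrite/fsync/close, and once with map/copy/flush/unmap. The assumptions
are a POSIX system, a hypothetical 1 MiB segment size, and an ordinary file
rather than a pmem device. mmap.flush() (msync) stands in for a pmem persist
call, and all helper names are made up for illustration.

```python
import mmap
import os
import tempfile

SEGMENT_SIZE = 1 << 20  # hypothetical 1 MiB segment, chosen only for the demo


def make_segment(directory, name):
    """Create a zero-filled file of SEGMENT_SIZE bytes and return its path."""
    path = os.path.join(directory, name)
    with open(path, "wb") as f:
        f.truncate(SEGMENT_SIZE)
    return path


def write_with_syscalls(path, offset, data):
    """Conventional path: open / write / fsync / close."""
    fd = os.open(path, os.O_RDWR)
    try:
        os.pwrite(fd, data, offset)
        os.fsync(fd)
    finally:
        os.close(fd)


def write_with_mapping(path, offset, data):
    """Mapped path: map / memcpy / persist / unmap.

    msync (mmap.flush) is a stand-in here; a pmem library would flush
    CPU caches instead of going through the page cache.
    """
    fd = os.open(path, os.O_RDWR)
    try:
        with mmap.mmap(fd, SEGMENT_SIZE) as seg:      # map
            seg[offset:offset + len(data)] = data     # memcpy into the mapping
            seg.flush()                               # persist (msync stand-in)
    finally:                                          # unmap happens on exiting "with"
        os.close(fd)


def read_back(path, offset, length):
    """Read bytes through the regular file API to check what landed on disk."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


if __name__ == "__main__":
    seg_path = make_segment(tempfile.mkdtemp(), "demo_segment")
    write_with_syscalls(seg_path, 0, b"record-A")
    write_with_mapping(seg_path, 8192, b"record-B")
    print(read_back(seg_path, 0, 8), read_back(seg_path, 8192, 8))
```

On a regular filesystem both paths end up in the page cache, so this only
shows the shape of the change, not the performance effect being measured here.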

The numbers (5-minute pgbench runs on scale 500) look like this:

          master/btt    master/dax           ntt        simple
    -----------------------------------------------------------
      1         5469          7402          7977          6746
     16        48222         80869        107025         82343
     32        73974        158189        214718        158348
     64        85921        154540        225715        164248
     96       150602        221159        237008        217253

A chart illustrating these results is attached. The first two columns
show unpatched master with WAL on a pmem device, in BTT or DAX mode;
"ntt" is the patch submitted to this thread, and "simple" is the patch
I've hacked together.

As expected, the BTT case performs poorly (compared to the rest).

The "master/dax" and "simple" cases perform about the same. There are
some differences, but those may be attributed to noise. The NTT patch
outperforms both by ~20-40% in some cases.
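
The relative gains can be recomputed from the table with a few lines of
Python. This is only arithmetic on the figures above. The dictionary keys
assume the first column is the pgbench client count, and the function name
is made up for illustration.

```python
# Throughput figures copied from the table in this message.
# Assumption: the first column is the pgbench client count.
RESULTS = {
    1:  {"master/btt": 5469,   "master/dax": 7402,   "ntt": 7977,   "simple": 6746},
    16: {"master/btt": 48222,  "master/dax": 80869,  "ntt": 107025, "simple": 82343},
    32: {"master/btt": 73974,  "master/dax": 158189, "ntt": 214718, "simple": 158348},
    64: {"master/btt": 85921,  "master/dax": 154540, "ntt": 225715, "simple": 164248},
    96: {"master/btt": 150602, "master/dax": 221159, "ntt": 237008, "simple": 217253},
}


def gain_percent(clients, baseline, candidate="ntt"):
    """Return how much faster `candidate` is than `baseline`, in percent."""
    row = RESULTS[clients]
    return 100.0 * (row[candidate] / row[baseline] - 1.0)


if __name__ == "__main__":
    for clients in sorted(RESULTS):
        vs_dax = gain_percent(clients, "master/dax")
        vs_simple = gain_percent(clients, "simple")
        print(f"{clients:3d} clients: ntt vs master/dax {vs_dax:5.1f}%, "
              f"ntt vs simple {vs_simple:5.1f}%")
```

Against master/dax the gain is roughly 32% to 46% between 16 and 64
clients, and under 10% at 1 and 96 clients.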

The question is why. I recall suggestions this is due to page faults
when writing data into the WAL, but I did experiment with various
settings that I think should prevent that (e.g. disabling WAL reuse
and/or disabling zeroing the segments) but that made no measurable
difference.


The page faults are only a problem when mmap() is used *without* DAX.

Takashi tried a patch earlier to mmap() WAL segments and insert WAL to
them directly. See 0002-Use-WAL-segments-as-WAL-buffers.patch at
https://www.postgresql.org/message-id/000001d5dff4%24995ed180%24cc1c7480%24%40hco.ntt.co.jp_1.
Could you test that patch too, please? Using your nomenclature, that
patch skips wal_buffers and does:


  clients -> wal segments (PMEM DAX)

He got good results with that with DAX, but otherwise it performed
worse. And then we discussed why that might be, and the page fault
hypothesis was brought up.

I think 0002-Use-WAL-segments-as-WAL-buffers.patch is the most promising
approach here. But because it's slower without DAX, we need to keep the
current code for non-DAX systems. Unfortunately it means that we need to
maintain both implementations, selectable with a GUC or some DAX
detection magic. The question then is whether the code complexity is
worth the performance gain on DAX-enabled systems.
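
As a rough illustration of what "selectable with a GUC or some DAX
detection magic" could look like, here is a small Python sketch. The setting
values ("buffered", "mmap", "auto") and all names are hypothetical, not
existing PostgreSQL settings, and the DAX check is passed in as a plain
boolean because how to detect DAX is exactly the open question.

```python
from enum import Enum


class WalMethod(Enum):
    BUFFERED = "buffered"  # current code path: wal_buffers + write/fsync
    MMAP = "mmap"          # mmap() the WAL segments and insert into them directly


def choose_wal_method(setting, segment_dir_is_dax):
    """Pick an implementation from a hypothetical setting value.

    `segment_dir_is_dax` is the result of some DAX detection step that is
    deliberately left out of this sketch.
    """
    if setting == "auto":
        # Only use the mapped path where it was reported to help, i.e. on DAX.
        return WalMethod.MMAP if segment_dir_is_dax else WalMethod.BUFFERED
    if setting == "mmap":
        return WalMethod.MMAP
    if setting == "buffered":
        return WalMethod.BUFFERED
    raise ValueError(f"unknown setting: {setting!r}")


if __name__ == "__main__":
    print(choose_wal_method("auto", segment_dir_is_dax=True))
    print(choose_wal_method("auto", segment_dir_is_dax=False))
```

Either way, both code paths stay in the tree, which is the maintenance cost
in question.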

Andres was not excited about mmapping the WAL segments because of
performance reasons. I'm not sure how much of his critique applies if we
keep supporting both methods and only use mmap() if so configured.


- Heikki
