On 26/11/2020 21:27, Tomas Vondra wrote:
Hi,

Here's the "simple patch" that I'm currently experimenting with. It
essentially replaces open/close/write/fsync with pmem calls
(map/unmap/memcpy/persist variants), and it's by no means committable.
But it works well enough for experiments / measurements, etc.

The numbers (5-minute pgbench runs on scale 500) look like this:

clients   master/btt    master/dax           ntt        simple
    -----------------------------------------------------------
      1         5469          7402          7977          6746
     16        48222         80869        107025         82343
     32        73974        158189        214718        158348
     64        85921        154540        225715        164248
     96       150602        221159        237008        217253

A chart illustrating these results is attached. The four columns are
showing unpatched master with WAL on a pmem device, in BTT or DAX modes,
"ntt" is the patch submitted to this thread, and "simple" is the patch
I've hacked together.

As expected, the BTT case performs poorly (compared to the rest).

The "master/dax" and "simple" perform about the same. There are some
differences, but those may be attributed to noise. The NTT patch does
outperform these cases by ~20-40% in some cases.

The question is why. I recall suggestions this is due to page faults
when writing data into the WAL, but I did experiment with various
settings that I think should prevent that (e.g. disabling WAL reuse
and/or disabling zeroing the segments) but that made no measurable
difference.

The page faults are only a problem when mmap() is used *without* DAX.

Takashi tried a patch earlier to mmap() WAL segments and insert WAL to them directly. See 0002-Use-WAL-segments-as-WAL-buffers.patch at https://www.postgresql.org/message-id/000001d5dff4%24995ed180%24cc1c7480%24%40hco.ntt.co.jp_1. Could you test that patch too, please? Using your nomenclature, that patch skips wal_buffers and does:

  clients -> wal segments (PMEM DAX)

He got good results with it on DAX, but without DAX it performed worse. We then discussed why that might be, and that is where the page fault hypothesis came up.
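To make the "clients -> wal segments" path concrete, here is a rough sketch of inserting into a mapped segment using plain POSIX mmap()/memcpy()/msync(). This is not code from Takashi's patch; the helper names segment_map() and segment_insert() are made up for illustration, and a real PMEM implementation would more likely use libpmem (pmem_map_file(), pmem_memcpy_persist()) instead of msync().

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: map a segment file of seg_size bytes, creating and
 * extending it if needed.  Returns NULL on failure.
 */
static char *
segment_map(const char *path, size_t seg_size)
{
    int   fd = open(path, O_RDWR | O_CREAT, 0600);
    void *base;

    if (fd < 0)
        return NULL;
    if (ftruncate(fd, (off_t) seg_size) != 0)
    {
        close(fd);
        return NULL;
    }
    base = mmap(NULL, seg_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                  /* the mapping stays valid after close() */
    return (base == MAP_FAILED) ? NULL : (char *) base;
}

/*
 * Hypothetical helper: copy one record straight into the mapped segment and
 * flush it.  The first store to an untouched page is where a page fault
 * would be taken.  Returns 0 on success, -1 on error.
 */
static int
segment_insert(char *base, size_t seg_size, size_t offset,
               const void *rec, size_t len)
{
    size_t page = (size_t) sysconf(_SC_PAGESIZE);
    size_t start;

    assert(base != NULL);
    if (offset > seg_size || len > seg_size - offset)
        return -1;              /* record would not fit in the segment */

    memcpy(base + offset, rec, len);

    /* msync() needs a page-aligned start address, so round down. */
    start = offset - (offset % page);
    return msync(base + start, (offset - start) + len, MS_SYNC);
}
```

There is no intermediate buffer and no write() call here: the record lands in the segment with a single memcpy(), and the flush is the only remaining durability step.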

I think 0002-Use-WAL-segments-as-WAL-buffers.patch is the most promising approach here. But because it's slower without DAX, we need to keep the current code for non-DAX systems. Unfortunately, that means we need to maintain both implementations, selectable with a GUC or some DAX detection magic. The question then is whether the code complexity is worth the performance gain on DAX-enabled systems.
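As a sketch of what "a GUC or some DAX detection magic" could look like: on Linux, one possible probe is to try mapping the file with MAP_SHARED_VALIDATE | MAP_SYNC, which as far as I know only succeeds for files that support DAX. The setting enum and both function names below are hypothetical, not an existing GUC or an API from any of the patches.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical three-state setting, in the style of an enum GUC. */
typedef enum
{
    WAL_MAP_OFF,                /* always use the current write()/fsync() code */
    WAL_MAP_ON,                 /* always mmap() the WAL segments */
    WAL_MAP_AUTO                /* mmap() only if DAX is detected */
} WalMapSetting;

/*
 * Hypothetical probe: returns true if the file at 'path' can be mapped with
 * MAP_SYNC.  Any failure, including a platform without MAP_SYNC, is treated
 * as "no DAX".
 */
static bool
file_supports_dax(const char *path, size_t len)
{
    assert(path != NULL && len > 0);
#if defined(MAP_SYNC) && defined(MAP_SHARED_VALIDATE)
    {
        int   fd = open(path, O_RDWR);
        void *p;

        if (fd < 0)
            return false;
        p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                 MAP_SHARED_VALIDATE | MAP_SYNC, fd, 0);
        close(fd);
        if (p == MAP_FAILED)
            return false;       /* mapping refused, assume no DAX */
        munmap(p, len);
        return true;
    }
#else
    (void) path;
    (void) len;
    return false;               /* MAP_SYNC is not available on this platform */
#endif
}

/* Decide which of the two implementations to use. */
static bool
use_mapped_wal(WalMapSetting setting, bool dax_available)
{
    switch (setting)
    {
        case WAL_MAP_ON:
            return true;
        case WAL_MAP_OFF:
            return false;
        case WAL_MAP_AUTO:
            return dax_available;
    }
    return false;
}
```

With something like this, the "auto" setting would keep non-DAX systems on the current code path, while an explicit "on" or "off" would still allow overriding the detection for testing.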

Andres was not excited about mmapping the WAL segments, for performance reasons. I'm not sure how much of his critique applies if we keep supporting both methods and only use mmap() if so configured.

- Heikki

