On 27/03/18 21:02, Tom Lane wrote:
> Robert Haas <robertmh...@gmail.com> writes:
>> On Tue, Mar 27, 2018 at 11:41 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>>> This is ignoring the possibility of damaged data in between, ie
>>> A ... B ... CHKPT ... C ...  a few zeroed pages ... D ... CHKPT ... E ... F
>>
>> It's hard for me to believe that this case matters very much.  If
>> you're trying to run pg_rewind on a system where the WAL segments
>> contain a few zeroed pages, you're probably going to be hosed anyway,
>> if not by this particular thing then by something else.
>
> Well, the point of checkpoints is that WAL data before the last one
> should no longer matter anymore, isn't it?
>
> But really this is just one illustration of the point, which is that
> changing the WAL header definition as proposed *does* cost us reliability.
> We can argue about whether better concurrency in WAL insertion is
> worth that price, but claiming that the price is zero is flat wrong.
>
> For me, the more important issue is that checking xl_prev not only
> validates that the record is where it is supposed to be, but that you
> arrived at it correctly.  Replacing it with xl_curr would keep the
> guarantee of the former (in fact likely make it stronger); but we'd
> completely lose the latter.

Yeah, removing xl_prev would certainly remove a useful cross-check from the format. Maybe it would still be worth it; I don't think it's that critical. And adding some extra information in the page or segment headers might alleviate that.
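To make the distinction concrete, here's a minimal sketch of the two checks (this is not the actual xlogreader.c validation code, and the names are mine), assuming we just finished reading the record that starts at PrevRecPtr and are now looking at a header found at RecPtr:

#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;

/*
 * xl_prev: the record must point back at the record we arrived from.
 * This validates both that the record is where it should be and that
 * the chain we followed to get here is intact.
 */
static bool
check_xl_prev(XLogRecPtr xl_prev, XLogRecPtr PrevRecPtr)
{
    return xl_prev == PrevRecPtr;
}

/*
 * Hypothetical xl_curr: the record only has to claim the address it was
 * read from.  That proves it is where it says it is, but says nothing
 * about whether the previous record correctly led us here.
 */
static bool
check_xl_curr(XLogRecPtr xl_curr, XLogRecPtr RecPtr)
{
    return xl_curr == RecPtr;
}

With xl_prev, a bogus jump (say, following a corrupted length field into the middle of otherwise valid-looking WAL) fails the check because the record found there doesn't point back to where we came from; with only a self-address, any well-formed record at that position would pass.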

But let's go back to why we're considering this. The idea was to optimize this block:

        /*
         * The duration the spinlock needs to be held is minimized by minimizing
         * the calculations that have to be done while holding the lock. The
         * current tip of reserved WAL is kept in CurrBytePos, as a byte position
         * that only counts "usable" bytes in WAL, that is, it excludes all WAL
         * page headers. The mapping between "usable" byte positions and physical
         * positions (XLogRecPtrs) can be done outside the locked region, and
         * because the usable byte position doesn't include any headers, reserving
         * X bytes from WAL is almost as simple as "CurrBytePos += X".
         */
        SpinLockAcquire(&Insert->insertpos_lck);

        startbytepos = Insert->CurrBytePos;
        endbytepos = startbytepos + size;
        prevbytepos = Insert->PrevBytePos;
        Insert->CurrBytePos = endbytepos;
        Insert->PrevBytePos = startbytepos;

        SpinLockRelease(&Insert->insertpos_lck);

One trick that we could do is to replace that with a 128-bit atomic compare-and-swap instruction. Modern 64-bit Intel systems have one; it's called CMPXCHG16B. I don't know about other architectures. An atomic fetch-and-add, as envisioned in the comment above, would presumably be better, but I suspect that a compare-and-swap would be good enough to move the bottleneck elsewhere again.
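Roughly, I'm thinking of something like the standalone sketch below (not a patch; the struct and function names are made up, and it assumes GCC/Clang-style __atomic_compare_exchange on a 16-byte type, which on x86-64 typically compiles to CMPXCHG16B when built with -mcx16):

#include <stdint.h>

typedef struct
{
    uint64_t    CurrBytePos;    /* next usable byte position to hand out */
    uint64_t    PrevBytePos;    /* start of the previously reserved record */
} InsertPos;

/* must be 16-byte aligned for a 128-bit compare-and-swap */
static _Alignas(16) InsertPos insertpos;

static void
ReserveBytesCAS(uint64_t size,
                uint64_t *startbytepos,
                uint64_t *endbytepos,
                uint64_t *prevbytepos)
{
    InsertPos   old,
                new;

    /* snapshot; a relaxed load is enough, the CAS revalidates it */
    __atomic_load(&insertpos, &old, __ATOMIC_RELAXED);

    do
    {
        new.CurrBytePos = old.CurrBytePos + size;
        new.PrevBytePos = old.CurrBytePos;

        /* on failure, "old" is refreshed with the current value */
    } while (!__atomic_compare_exchange(&insertpos, &old, &new,
                                        false,  /* strong CAS */
                                        __ATOMIC_SEQ_CST,
                                        __ATOMIC_RELAXED));

    *startbytepos = old.CurrBytePos;
    *endbytepos = new.CurrBytePos;
    *prevbytepos = old.PrevBytePos;
}

The reason for packing both positions into one 16-byte word is that it keeps CurrBytePos and PrevBytePos consistent with each other, so xl_prev could stay in the format unchanged; a plain 64-bit fetch-and-add on CurrBytePos alone only becomes possible if we no longer need to track the previous record's position at all.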

- Heikki
