On Wed, May 01, 2019 at 08:53:15AM -0700, Andres Freund wrote:
Hi,

On 2019-05-01 02:28:45 +0200, Tomas Vondra wrote:
The reason for that is pretty simple - the walsenders are doing logical
decoding, and during shutdown they're waiting for WalSndCaughtUp=true.
But per XLogSendLogical() this only happens if

   if (logical_decoding_ctx->reader->EndRecPtr >= GetFlushRecPtr())
   {
       WalSndCaughtUp = true;
       ...
   }

That is, we need to get beyong GetFlushRecPtr(). But that may never
happen, because there may be incomplete (and unflushed) page in WAL
buffers, not forced out by any transaction. So if there's a WAL record
overflowing to the unflushed page, the walsender will never catch up.

Now, this situation is apparently expected, because WalSndWaitForWal()
does this:

   /*
    * If we're shutting down, trigger pending WAL to be written out,
    * otherwise we'd possibly end up waiting for WAL that never gets
    * written, because walwriter has shut down already.
    */
   if (got_STOPPING)
       XLogBackgroundFlush();

but unfortunately that does not actually do anything, because right at
the very beginning XLogBackgroundFlush() does this:

   /* back off to last completed page boundary */
   WriteRqst.Write -= WriteRqst.Write % XLOG_BLCKSZ;

That is, it intentionally ignores the incomplete page, which means the
walsender can't read the record and reach GetFlushRecPtr().

XLogBackgroundFlush() does this since (at least) 2007, so how come we
never had issues with this before?

I assume that's because of the following logic:
        /* if we have already flushed that far, consider async commit records */
        if (WriteRqst.Write <= LogwrtResult.Flush)
        {
                SpinLockAcquire(&XLogCtl->info_lck);
                WriteRqst.Write = XLogCtl->asyncXactLSN;
                SpinLockRelease(&XLogCtl->info_lck);
                flexible = false;               /* ensure it all gets written */
        }

and various pieces of the code doing XLogSetAsyncXactLSN() to force
flushing.  I wonder if the issue is that we're better at avoiding
unnecessary WAL to be written due to
6ef2eba3f57f17960b7cd4958e18aa79e357de2f


I don't think so, because (a) there are no async commits involved, and (b)
we originally ran into the issue on 9.6 and 6ef2eba3f57f1 is only in 10+.


But I don't think we're safe without the failover slots patch, because
any output plugin can do something similar - say, LogLogicalMessage() or
something like that. I'm not aware of a plugin doing such things, but I
don't think it's illegal / prohibited either. (Of course, plugins that
generate WAL won't be useful for decoding on standby in the future.)

FWIW, I'd consider such an output plugin outright broken.


Why? Is that prohibited somewhere, either explicitly or implicitly? I do
see obvious issues with generating WAL from plugin (infinite cycle and so
on), but I suppose that's more a regular coding issue than something that
would make all plugins doing that broken.

FWIW I don't see any particular need to generate WAL from output plugins,
I mentioned it mostly jsut as a convenient way to trigger the issue. I
suppose there are other ways to generate a bit of WAL during shutdown.


So what I think we should do is to tweak XLogBackgroundFlush() so that
during shutdown it skips the back-off to page boundary, and flushes even
the last piece of WAL. There are only two callers anyway, so something
like XLogBackgroundFlush(bool) would be simple enough.

I think it then just ought to be a normal XLogFlush(). I.e. something
along the lines of:

                /*
                 * If we're shutting down, trigger pending WAL to be written 
out,
                 * otherwise we'd possibly end up waiting for WAL that never 
gets
                 * written, because walwriter has shut down already.
                 */
                if (got_STOPPING && !RecoveryInProgress())
                        XLogFlush(GetXLogInsertRecPtr());

                /* Update our idea of the currently flushed position. */
                if (!RecoveryInProgress())
                        RecentFlushPtr = GetFlushRecPtr();
                else
                        RecentFlushPtr = GetXLogReplayRecPtr(NULL);


Perhaps. That would work too, I guess.


regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Reply via email to