On Wed, May 01, 2019 at 02:13:10PM +0200, Alexander Kukushkin wrote:
Hi Tomas,

On Wed, 1 May 2019 at 02:28, Tomas Vondra <tomas.von...@2ndquadrant.com> wrote:

I see there's an ongoing discussion about race conditions in walreceiver
blocking shutdown, so let me start a new thread about a race condition in
walsender during shutdown.

The symptoms are rather simple - 'pg_ctl -m fast shutdown' gets stuck,
waiting indefinitely for the walsender processes to catch up and
terminate.

I can confirm this - over the past couple of years we have observed
such a problem a few times, and it is really annoying.

The reason for that is pretty simple - the walsenders are doing logical
decoding, and during shutdown they're waiting for WalSndCaughtUp=true.
But per XLogSendLogical() this only happens if

    if (logical_decoding_ctx->reader->EndRecPtr >= GetFlushRecPtr())
    {
        WalSndCaughtUp = true;
        ...
    }

After a couple of days of investigating and debugging I came to a
slightly different conclusion: WalSndCaughtUp is set to true in my
case.
Since I am mostly a Python person, I decided to use psycopg2 for my
investigation. I took the example from
http://initd.org/psycopg/docs/advanced.html#logical-replication-quick-start
as a starting point, created a slot and started the script.
I wasn't running any transactions, so DemoConsumer.__call__ was never
executed and cursor.send_feedback(flush_lsn=msg.data_start) was never
called either. Basically, the only thing the psycopg2 internals were
doing was periodically sending keepalive messages, or replying to
keepalives sent by postgres.
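
For reference, the relevant part of that quick-start boils down to
roughly the following (a sketch rather than the exact script; the DSN
and the slot name are placeholders):

    import psycopg2
    import psycopg2.extras

    # DSN and slot name are placeholders
    conn = psycopg2.connect('dbname=test',
                            connection_factory=psycopg2.extras.LogicalReplicationConnection)
    cur = conn.cursor()

    class DemoConsumer(object):
        def __call__(self, msg):
            print(msg.payload)
            # the flush position is only confirmed here, i.e. only when
            # a data message actually arrives
            msg.cursor.send_feedback(flush_lsn=msg.data_start)

    cur.create_replication_slot('pytest', output_plugin='test_decoding')
    cur.start_replication(slot_name='pytest', decode=True)

    # consume_stream() answers keepalives internally, but only with the
    # positions previously reported via send_feedback()
    cur.consume_stream(DemoConsumer())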

OK, so that seems like a separate issue, somewhat unrelated to the one I
reported. And I'm not sure it's a walsender issue - it seems it might be a
psycopg2 issue of not reporting the flush position properly, no?

Actually, the same problem will happen even when the consumer script
does receive some messages, just not very intensively - it is only a
bit harder to reproduce.

If you attach to the walsender with gdb, you'll see the following picture:
(gdb) bt
#0  0x00007fd6623d296a in __libc_send (fd=8, buf=0x55cb958dca08,
len=94, flags=0) at ../sysdeps/unix/sysv/linux/send.c:28
#1  0x000055cb93aa7ce9 in secure_raw_write (port=0x55cb958d71e0,
ptr=0x55cb958dca08, len=94) at be-secure.c:318
#2  0x000055cb93aa7b87 in secure_write (port=0x55cb958d71e0,
ptr=0x55cb958dca08, len=94) at be-secure.c:265
#3  0x000055cb93ab6bf9 in internal_flush () at pqcomm.c:1433
#4  0x000055cb93ab6b89 in socket_flush () at pqcomm.c:1409
#5  0x000055cb93dac30b in send_message_to_frontend
(edata=0x55cb942b4380 <errordata>) at elog.c:3317
#6  0x000055cb93da8973 in EmitErrorReport () at elog.c:1481
#7  0x000055cb93da5abf in errfinish (dummy=0) at elog.c:481
#8  0x000055cb93da852d in elog_finish (elevel=13, fmt=0x55cb93f32de3
"sending replication keepalive") at elog.c:1376
#9  0x000055cb93bcae71 in WalSndKeepalive (requestReply=true) at
walsender.c:3358

Is it stuck in the send() call forever, or did you just happen to grab
this backtrace?


All the text above probably looks like a brain dump, but I don't think
it conflicts with Tomas's findings - rather, it complements them.
I am very glad that I now know how to mitigate the problem on the
client side, but IMHO it is also very important to fix the server
behavior, if that is at all possible.
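
(A client-side mitigation along those lines - confirming a flush
position even while the stream is idle - could look roughly like the
following sketch; it assumes psycopg2 >= 2.8, where the replication
cursor exposes wal_end, and the DSN and slot name are placeholders:)

    import select

    import psycopg2
    import psycopg2.extras

    conn = psycopg2.connect('dbname=test',
                            connection_factory=psycopg2.extras.LogicalReplicationConnection)
    cur = conn.cursor()
    cur.start_replication(slot_name='pytest', decode=True)

    while True:
        msg = cur.read_message()        # non-blocking
        if msg:
            # process msg.payload here, then confirm it
            cur.send_feedback(flush_lsn=msg.data_start)
        else:
            # nothing is pending, so it is safe to confirm the position
            # reported by the last (keepalive) message from the server
            if cur.wal_end:
                cur.send_feedback(flush_lsn=cur.wal_end)
            select.select([cur], [], [], 10.0)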


I think having a report of an issue, with a way to reproduce it, is a first
(and quite important) step. So thanks for doing that.

That being said, I think these are two separate issues, with different
causes and likely different fixes. I don't think fixing the xlog flush
issue will resolve your issue, and vice versa.

FWIW attached is a patch that I used to reliably trigger the xlog flush
issue - it simply adds a LogLogicalMessage() call to the commit handler in
the built-in output plugin. So all you need to do is create a subscription,
start generating commits, and trigger a restart. The message is 8kB, so it's
definitely long enough to overflow onto the next xlog page.
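
The driving part is trivial - something along these lines should be
enough (again, just a sketch; it assumes a publication/subscription
pair between two local clusters is already set up, and the DSN, table
name and data directory are placeholders):

    import subprocess
    import threading
    import time

    import psycopg2

    PUBLISHER_DSN = 'dbname=test port=5432'
    PUBLISHER_PGDATA = '/path/to/publisher/data'

    def generate_commits():
        conn = psycopg2.connect(PUBLISHER_DSN)
        conn.autocommit = True
        cur = conn.cursor()
        try:
            # with the patch, every commit is followed by an 8kB
            # logical message in the WAL
            while True:
                cur.execute("INSERT INTO t (v) VALUES (1)")
        except psycopg2.Error:
            pass    # the connection goes away once shutdown starts

    threading.Thread(target=generate_commits, daemon=True).start()
    time.sleep(2)

    # with the race present, the fast shutdown never finishes, waiting
    # for the logical walsender to report itself as caught up
    subprocess.run(['pg_ctl', '-D', PUBLISHER_PGDATA, '-m', 'fast', 'stop'])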

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

diff --git a/src/backend/replication/pgoutput/pgoutput.c b/src/backend/replication/pgoutput/pgoutput.c
index bf64c8e4a4..a18864666e 100644
--- a/src/backend/replication/pgoutput/pgoutput.c
+++ b/src/backend/replication/pgoutput/pgoutput.c
@@ -16,6 +16,7 @@
 
 #include "replication/logical.h"
 #include "replication/logicalproto.h"
+#include "replication/message.h"
 #include "replication/origin.h"
 #include "replication/pgoutput.h"
 
@@ -247,11 +248,19 @@ static void
 pgoutput_commit_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
                                        XLogRecPtr commit_lsn)
 {
+       char    message[8192];
+
        OutputPluginUpdateProgress(ctx);
 
        OutputPluginPrepareWrite(ctx, true);
        logicalrep_write_commit(ctx->out, txn, commit_lsn);
        OutputPluginWrite(ctx, true);
+
+       /* a simple string */
+       memset(message, 'a', 8191);
+       message[8191] = '\0';
+
+       LogLogicalMessage("TEST", message, sizeof(message), false);
 }
 
 /*
