I wrote:
> This looks to me like it's probably a low-probability timing problem in
> the test script itself.
> ...
> We could maybe hack around that with a short "sleep" in this script, but
> a less cruddy solution would be to change PostgresNode::reload so that
> it somehow waits for the server to process the reload signal.
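For concreteness, one hypothetical way to do that --- at the test-script
level rather than inside PostgresNode::reload itself --- would be to poll
after the reload until a fresh connection reports the new value, which
shows the postmaster has re-read the config file.  A rough sketch, using
hot_standby_feedback = on as the example change and the node names from
001_stream_rep.pl:

    # Hypothetical sketch: reload standby_2, then wait until a new
    # connection actually sees the changed setting, proving the SIGHUP
    # has been processed.
    $node_standby_2->reload;
    $node_standby_2->poll_query_until('postgres',
        "SELECT setting = 'on' FROM pg_settings WHERE name = 'hot_standby_feedback'")
      or die "timed out waiting for standby_2 to absorb the reload";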
On further reflection, I doubt that would help.  The issue isn't whether
the standby_2 postmaster has processed the SIGHUP, but whether its
walreceiver process has sent a feedback message.  The loop in
WalReceiverMain does things in this order:

1. Check for SIGTERM, exit if seen.
2. Check for SIGHUP, process that (and possibly send a feedback message)
   if seen.
3. Read and process streaming data till no more is available.

We've seen just a tiny number of failures at this point in the test ---
scraping the buildfarm database, I find these:

   sysname    |      snapshot       |     stage     |                                       l
--------------+---------------------+---------------+-------------------------------------------------------------------------------
 longfin      | 2017-09-26 14:39:05 | recoveryCheck | Timed out waiting for slot xmins to advance at t/001_stream_rep.pl line 155.
 flaviventris | 2018-04-07 17:02:01 | recoveryCheck | Timed out waiting for slot xmins to advance at t/001_stream_rep.pl line 155.
 serinus      | 2018-05-18 20:14:01 | recoveryCheck | Timed out waiting for slot xmins to advance at t/001_stream_rep.pl line 155.
 idiacanthus  | 2018-06-23 12:46:01 | recoveryCheck | Timed out waiting for slot xmins to advance at t/001_stream_rep.pl line 157.
 francolin    | 2018-06-29 13:30:01 | recoveryCheck | Timed out waiting for slot xmins to advance at t/001_stream_rep.pl line 157.
 flaviventris | 2018-07-12 07:36:01 | recoveryCheck | Timed out waiting for slot xmins to advance at t/001_stream_rep.pl line 157.
(6 rows)

So it's clearly a very low-probability race condition.  My hypothesis is
that the SIGHUP arrives, but the walreceiver process is already in the
read-and-process-data part of its loop, and the timing is such that it
manages to stay there until we issue the standby_2 shutdown command.
When it finally notices that it has no more data available, it iterates
the outermost loop, and then shuts down without ever having sent a
feedback message.  The "replay_check()" call in the test script doesn't
ensure any synchronization here, since it only verifies that WAL data has
been applied, not that any feedback has happened.

I can reproduce the failure pretty reliably with a hack like the one
attached, which makes it unlikely that the walreceiver will send a
feedback message instantly when it gets the SIGHUP.

So the issue boils down to this: the test script is, effectively,
assuming that it's guaranteed that the walreceiver will send a feedback
message before it shuts down; but there is no such guarantee.

Is this a bug in the test script, or a bug in the walreceiver logic?
I can see the value of having such a guarantee, but I also think it would
be nigh impossible to make it a bulletproof guarantee.  We could perhaps
add "XLogWalRcvSendHSFeedback(true)" somewhere closer to process exit,
but that might add more failure modes than it removes.  Or we could
change the test script to wait for feedback before it issues the
shutdown, but then I think the test is a bit pointless.

Comments?

			regards, tom lane
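PS: to make the "wait for feedback before it issues the shutdown" option
concrete, I think it would amount to something like the rough sketch
below, inserted just before the standby_2 shutdown.  This reuses the node
and slot names from 001_stream_rep.pl, and assumes the cascading setup
where standby_2 streams from standby_1 using $slotname_2 (per the
attached test hunk), so that "feedback has arrived" means the slot's xmin
on standby_1 has been set:

    # Hypothetical sketch: don't shut standby_2 down until its
    # walreceiver's feedback has actually shown up in the slot it
    # streams from on standby_1.
    $node_standby_1->poll_query_until('postgres',
        "SELECT xmin IS NOT NULL FROM pg_replication_slots
         WHERE slot_name = '$slotname_2'")
      or die "timed out waiting for feedback from standby_2";
    $node_standby_2->stop;

But as I said above, once the script waits for that explicitly, I'm not
sure the test proves much anymore.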
diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index 987bb84..d7bb1b4 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -447,7 +447,7 @@ WalReceiverMain(void)
 			{
 				got_SIGHUP = false;
 				ProcessConfigFile(PGC_SIGHUP);
-				XLogWalRcvSendHSFeedback(true);
+				XLogWalRcvSendHSFeedback(false);
 			}
 
 			/* See if we can read data immediately */
diff --git a/src/test/recovery/t/001_stream_rep.pl b/src/test/recovery/t/001_stream_rep.pl
index a0d3e8f..8f8dbc5 100644
--- a/src/test/recovery/t/001_stream_rep.pl
+++ b/src/test/recovery/t/001_stream_rep.pl
@@ -145,7 +145,7 @@ is( $node_standby_1->psql(
 $node_standby_2->append_conf('recovery.conf',
 	"primary_slot_name = $slotname_2");
 $node_standby_2->append_conf('postgresql.conf',
-	"wal_receiver_status_interval = 1");
+	"wal_receiver_status_interval = 10");
 $node_standby_2->restart;
 
 # Fetch xmin columns from slot's pg_replication_slots row, after waiting for