On Mon, Nov 14, 2022 at 12:11 PM Thomas Munro <thomas.mu...@gmail.com> wrote:
> On Mon, Nov 14, 2022 at 11:26 AM Nathan Bossart
> <nathandboss...@gmail.com> wrote:
> > On Sun, Nov 13, 2022 at 05:08:04PM -0500, Tom Lane wrote:
> > > There is something very seriously wrong with this patch.
> > >
> > > On my machine, running "make -j10 check-world" (with compilation
> > > already done) has been taking right about 2 minutes for some time.
> > > Since this patch, it's taking around 2:45 --- I did a bisect run
> > > to confirm that this patch is where it changed.
> >
> > I've been looking into this. I wrote a similar patch for logical/worker.c
> > before noticing that check-world was taking much longer. The problem in
> > that case seems to be that process_syncing_tables() isn't called as often.
> > It wouldn't surprise me if there's also something in walreceiver.c that
> > depends upon the frequent wakeups. I suspect this will require a revert.
>
> In the case of "meson test pg_basebackup/020_pg_receivewal" it looks
> like wait_for_catchup hangs around for 10 seconds waiting for HS
> feedback. I'm wondering if we need to go back to high frequency
> wakeups until it's caught up, or (probably better) arrange for a
> proper event for progress. Digging...
Maybe there is a better way to code this (I mean, who likes global variables?) and I need to test some more, but I suspect the attached is approximately what we missed.
diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index 8bd2ba37dd..fed2cc6e6f 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -1080,6 +1080,9 @@ XLogWalRcvClose(XLogRecPtr recptr, TimeLineID tli)
 	recvFile = -1;
 }
 
+static XLogRecPtr writePtr = 0;
+static XLogRecPtr flushPtr = 0;
+
 /*
  * Send reply message to primary, indicating our current WAL locations, oldest
  * xmin and the current time.
@@ -1096,8 +1099,6 @@ XLogWalRcvClose(XLogRecPtr recptr, TimeLineID tli)
 static void
 XLogWalRcvSendReply(bool force, bool requestReply)
 {
-	static XLogRecPtr writePtr = 0;
-	static XLogRecPtr flushPtr = 0;
 	XLogRecPtr	applyPtr;
 	TimestampTz now;
 
@@ -1334,6 +1335,9 @@ WalRcvComputeNextWakeup(WalRcvWakeupReason reason, TimestampTz now)
 		case WALRCV_WAKEUP_REPLY:
 			if (wal_receiver_status_interval <= 0)
 				wakeup[reason] = PG_INT64_MAX;
+			else if (writePtr != LogstreamResult.Write ||
+					 flushPtr != LogstreamResult.Flush)
+				wakeup[reason] = now + 100000; /* frequent replies, not yet caught up */
 			else
 				wakeup[reason] = now + wal_receiver_status_interval * INT64CONST(1000000);
 			break;
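
For anyone who wants to poke at the REPLY wakeup rule from the last hunk without building the tree, here is a rough standalone sketch of it. This is not walreceiver.c code: the type names, the function name sketch_next_reply_wakeup() and the macro are all made up for illustration, and the "last reported" and "current" positions are passed in as plain arguments instead of being read from the file-level variables and LogstreamResult. Only the three-way decision and the 100000 microsecond delay are taken from the patch.

```c
#include <stdint.h>

/* Hypothetical stand-ins for XLogRecPtr and TimestampTz (microseconds). */
typedef uint64_t SketchLsn;
typedef int64_t SketchTimestamp;

/* 100 ms, the same constant the patch uses while not yet caught up. */
#define SKETCH_CATCHUP_DELAY_US 100000

/*
 * Compute when the next reply should be sent.
 *
 * reported_* are the positions included in the last reply,
 * current_* are the positions the receiver has reached by now.
 */
static SketchTimestamp
sketch_next_reply_wakeup(SketchTimestamp now, int status_interval_secs,
                         SketchLsn reported_write, SketchLsn reported_flush,
                         SketchLsn current_write, SketchLsn current_flush)
{
    /* Status replies disabled: never wake up for this reason. */
    if (status_interval_secs <= 0)
        return INT64_MAX;

    /* Progress since the last reply: report again soon. */
    if (reported_write != current_write || reported_flush != current_flush)
        return now + SKETCH_CATCHUP_DELAY_US;

    /* Nothing new to report: fall back to the configured interval. */
    return now + (int64_t) status_interval_secs * INT64_C(1000000);
}
```

With a 10 second interval, a receiver whose write or flush position has moved since its last reply would be scheduled 100 ms out rather than 10 s out, which is the behaviour the tests appear to have been depending on.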