On Fri, Sep 9, 2022 at 2:00 PM Kyotaro Horiguchi <horikyota....@gmail.com> wrote:
>
> Hello.
>
> While I played with some patch, I met an assertion failure.
>
> #2  0x0000000000b350e0 in ExceptionalCondition (
>     conditionName=0xbd8970 "!IsInstallXLogFileSegmentActive()",
>     errorType=0xbd6e11 "FailedAssertion", fileName=0xbd6f28 "xlogrecovery.c",
>     lineNumber=4190) at assert.c:69
> #3  0x0000000000586f9c in XLogFileRead (segno=61, emode=13, tli=1,
>     source=XLOG_FROM_ARCHIVE, notfoundOk=true) at xlogrecovery.c:4190
> #4  0x00000000005871d2 in XLogFileReadAnyTLI (segno=61, emode=13,
>     source=XLOG_FROM_ANY) at xlogrecovery.c:4296
> #5  0x000000000058656f in WaitForWALToBecomeAvailable (RecPtr=1023410360,
>     randAccess=false, fetching_ckpt=false, tliRecPtr=1023410336, replayTLI=1,
>     replayLSN=1023410336, nonblocking=false) at xlogrecovery.c:3727
>
> This is replayable by the following steps.
>
> 1. insert a sleep(1) in WaitForWALToBecomeAvailable().
>
> >        * WAL that we restore from archive.
> >        */
> > +     sleep(1);
> >       if (WalRcvStreaming())
> >           XLogShutdownWalRcv();
>
> 2. create a primary with archiving enabled.
>
> 3. create a standby with recovering from the primary's archive and
>    unconnectable primary_conninfo.
>
> 4. start the primary.
>
> 5. switch wal on the primary.
>
> 6. Kaboom.
>
> This is because WaitForWALToBecomeAvailable doesn't call
> XLogShutdownWalRcv() when walreceiver has been stopped before we reach
> the WalRcvStreaming() call cited above. But we need to set
> InstallXLogFileSegmentActive to false even in that case, since no one
> other than startup process does that.
>
> Unconditionally calling XLogShutdownWalRcv() fixes it. I feel we might
> need to correct the dependencies between the flag and walreceiver
> state, but it is not mandatory because XLogShutdownWalRcv() is designed
> so that it can be called even after walreceiver is stopped. I don't
> have a clear memory about why we do that at the time, though, but
> recovery check runs successfully with this.
>
> This code was introduced at PG12.

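
To make the sequence described above easier to follow, here is a toy model in C. It is only a sketch and not PostgreSQL source: the two static variables and every toy_* function are hypothetical stand-ins for the walreceiver state and the InstallXLogFileSegmentActive flag, and it assumes (as the report says) that only the startup process clears the flag, via the shutdown call.

```c
#include <stdbool.h>

/* Toy stand-ins for shared state; NOT the real PostgreSQL structures. */
static bool toy_walrcv_streaming = false;       /* models WalRcvStreaming() */
static bool toy_install_segment_active = false; /* models the flag */

/* Startup process requests streaming: flag set, walreceiver launched. */
static void toy_request_streaming(void)
{
    toy_install_segment_active = true;
    toy_walrcv_streaming = true;
}

/* Walreceiver stops by itself (e.g. it cannot connect). Flag is untouched. */
static void toy_walreceiver_dies(void)
{
    toy_walrcv_streaming = false;
}

/* Models XLogShutdownWalRcv(): stop walreceiver and clear the flag. */
static void toy_shutdown_walrcv(void)
{
    toy_walrcv_streaming = false;
    toy_install_segment_active = false;
}

/*
 * Models the switch to reading from the archive. Returns true when the
 * asserted condition (flag is not active) holds before the archive read.
 */
static bool toy_switch_to_archive(bool unconditional)
{
    if (unconditional || toy_walrcv_streaming)
        toy_shutdown_walrcv();
    return !toy_install_segment_active;
}

/* Full scenario: start streaming, walreceiver dies early, then switch. */
static bool toy_scenario(bool unconditional)
{
    toy_request_streaming();
    toy_walreceiver_dies();
    return toy_switch_to_archive(unconditional);
}
```

In this model, toy_scenario(false) returns false (the flag is still set when the archive read starts, which corresponds to the failed assertion), while toy_scenario(true) returns true.
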
I think it is a duplicate of [1]. I have tested the above use-case
with the patch at [1] and it fixes the issue.

[1] https://www.postgresql.org/message-id/CALj2ACXPn_xePphnh88qmoQqqW%2BE2KEOdxGL%2BD-o9o7_XNGkkw%40mail.gmail.com

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com