On Fri, Oct 1, 2021 at 2:29 AM Robert Haas <robertmh...@gmail.com> wrote:
> On Thu, Sep 30, 2021 at 7:59 AM Amul Sul <sula...@gmail.com> wrote: > > To find the value of InRecovery after we clear it, patch still uses > > ControlFile's DBState, but now the check condition changed to a more > > specific one which is less confusing. > > > > In casual off-list discussion, the point was made to check > > SharedRecoveryState to find out the InRecovery value afterward, and > > check that using RecoveryInProgress(). But we can't depend on > > SharedRecoveryState because at the start it gets initialized to > > RECOVERY_STATE_CRASH irrespective of InRecovery that happens later. > > Therefore, we can't use RecoveryInProgress() which always returns > > true if SharedRecoveryState != RECOVERY_STATE_DONE. > > Uh, this change has crept into 0002, but it should be in 0004 with the > rest of the changes to remove dependencies on variables specific to > the startup process. Like I said before, we should really be trying to > separate code movement from functional changes. Also, 0002 doesn't > actually apply for me. Did you generate these patches with 'git > format-patch'? > > [rhaas pgsql]$ patch -p1 < > ~/Downloads/v36-0001-Refactor-some-end-of-recovery-code-out-of-Startu.patch > patching file src/backend/access/transam/xlog.c > Hunk #1 succeeded at 889 (offset 9 lines). > Hunk #2 succeeded at 939 (offset 12 lines). > Hunk #3 succeeded at 5734 (offset 37 lines). > Hunk #4 succeeded at 8038 (offset 70 lines). > Hunk #5 succeeded at 8248 (offset 70 lines). > [rhaas pgsql]$ patch -p1 < > ~/Downloads/v36-0002-Postpone-some-end-of-recovery-operations-relatin.patch > patching file src/backend/access/transam/xlog.c > Reversed (or previously applied) patch detected! Assume -R? [n] > Apply anyway? [n] y > Hunk #1 FAILED at 7954. > Hunk #2 succeeded at 8079 (offset 70 lines). > 1 out of 2 hunks FAILED -- saving rejects to file > src/backend/access/transam/xlog.c.rej > [rhaas pgsql]$ git reset --hard > HEAD is now at b484ddf4d2 Treat ETIMEDOUT as indicating a > non-recoverable connection failure. > [rhaas pgsql]$ patch -p1 < > ~/Downloads/v36-0002-Postpone-some-end-of-recovery-operations-relatin.patch > patching file src/backend/access/transam/xlog.c > Reversed (or previously applied) patch detected! Assume -R? [n] > Apply anyway? [n] > Skipping patch. > 2 out of 2 hunks ignored -- saving rejects to file > src/backend/access/transam/xlog.c.rej > > I tried to apply the patch on the master branch head and it's failing with conflicts. Later applied patch on below commit and it got applied cleanly: commit 7d1aa6bf1c27bf7438179db446f7d1e72ae093d0 Author: Tom Lane <t...@sss.pgh.pa.us> Date: Mon Sep 27 18:48:01 2021 -0400 Re-enable contrib/bloom's TAP tests. rushabh@rushabh:postgresql$ git apply v36-0001-Refactor-some-end-of-recovery-code-out-of-Startu.patch rushabh@rushabh:postgresql$ git apply v36-0002-Postpone-some-end-of-recovery-operations-relatin.patch rushabh@rushabh:postgresql$ git apply v36-0003-Create-XLogAcceptWrites-function-with-code-from-.patch v36-0003-Create-XLogAcceptWrites-function-with-code-from-.patch:34: space before tab in indent. /* v36-0003-Create-XLogAcceptWrites-function-with-code-from-.patch:38: space before tab in indent. */ v36-0003-Create-XLogAcceptWrites-function-with-code-from-.patch:39: space before tab in indent. Insert->fullPageWrites = lastFullPageWrites; warning: 3 lines add whitespace errors. rushabh@rushabh:postgresql$ git apply v36-0004-Remove-dependencies-on-startup-process-specifica.patch There are whitespace errors on patch 0003. > It seems to me that the approach you're pursuing here can't work, > because the long-term goal is to get to a place where, if the system > starts up read-only, XLogAcceptWrites() might not be called until > later, after StartupXLOG() has exited. But in that case the control > file state would be DB_IN_PRODUCTION. But my idea of using > RecoveryInProgress() won't work either, because we set > RECOVERY_STATE_DONE just after we set DB_IN_PRODUCTION. Put > differently, the question we want to answer is not "are we in recovery > now?" but "did we perform recovery?". After studying the code a bit, I > think a good test might be > !XLogRecPtrIsInvalid(XLogCtl->lastReplayedEndRecPtr). If InRecovery > gets set to true, then we're certain to enter the if (InRecovery) > block that contains the main redo loop. And that block unconditionally > does XLogCtl->lastReplayedEndRecPtr = XLogCtl->replayEndRecPtr. I > think that replayEndRecPtr can't be 0 because it's supposed to > represent the record we're pretending to have last replayed, as > explained by the comments. And while lastReplayedEndRecPtr will get > updated later as we replay more records, I think it will never be set > back to 0. It's only going to increase, as we replay more records. On > the other hand if InRecovery = false then we'll never change it, and > it seems that it starts out as 0. > > I was hoping to have more time today to comment on 0004, but the day > seems to have gotten away from me. One quick thought is that it looks > a bit strange to be getting EndOfLog from GetLastSegSwitchData() which > returns lastSegSwitchLSN while getting EndOfLogTLI from replayEndTLI > ... because there's also replayEndRecPtr, which seems to go with > replayEndTLI. It feels like we should use a source for the TLI that > clearly matches the source for the corresponding LSN, unless there's > some super-good reason to do otherwise. > > -- > Robert Haas > EDB: http://www.enterprisedb.com > > > -- Rushabh Lathia