Thanks for the new version. At Sun, 8 Dec 2019 04:03:01 +0300, Grigory Smolkin <g.smol...@postgrespro.ru> wrote in > On 11/21/19 1:46 PM, Peter Eisentraut wrote: > > On 2019-11-08 05:00, Grigory Smolkin wrote: > I`ve tested it and have some thoughts/concerns: > > 1. Recovery should report the exact reason why it has been forced to > stop. In case of recovering to the end of WAL, standby promotion > request received during recovery could be mistaken for reaching the > end of WAL and reported as such. To avoid this, I think that > reachedEndOfWal variable should be introduced. > > In attached patch it is added as a global variable, but maybe > something more clever may be devised. I was not sure that > reachedEndOfWal could be placed in XLogPageReadPrivate. Because we > need to access it at the higher level than ReadRecord(), and I was > under impression placing it in XLogPageReadPrivate could violate > abstraction level of XLogPageReadPrivate.
CheckForStandbyTrigger() always returns true once the trigger is pulled. We don't care whether end-of-WAL is reached if promote is already triggered. Thus, we can tell the promote case by asking CheckForStandbyTrigger() when we exited the redo main loop with recoveryTarget = RECOVERY_TARGET_LATEST. Is this works as you expect? > 2. During the testing, I`ve stumbled upon assertion failure in case of > recovering in standby mode to the the end of WAL coupled with > recovery_target_action as "promote", caused by the WAL source in state > machine not been changed after reaching the recovery target (script to > reproduce is attached): ... > TRAP: FailedAssertion("StandbyMode", File: "xlog.c", Line: 12032) ... > #2 0x0000000000a88b82 in ExceptionalCondition (conditionName=0xb24acc > #"StandbyMode", errorType=0xb208a7 "FailedAssertion", > fileName=0xb208a0 "xlog.c", lineNumber=12032) at assert.c:67 > #3 0x0000000000573417 in WaitForWALToBecomeAvailable > #(RecPtr=151003136, randAccess=true, fetching_ckpt=false, > #tliRecPtr=167757424, > return_on_eow=true) at xlog.c:12032 ... > #7 0x00000000005651f8 in ReadRecord (xlogreader=0xf08ed8, > #RecPtr=167757424, emode=22, fetching_ckpt=false) at xlog.c:4271 .. ReadRecord is called with currentSource=STREAM after StandbyMode was turned off. I suppose the fix means the "currentSource = XLOG_FROM_PG_WAL" line but I don't think it not the right way. Streaming timeout means failure when return_on_eow is true. Thus the right thing to do there is setting lastSourceFailed to true. The first half of WaitForWALToBecomeAvailable handles failure of the current source thus source transition happens only there. The second half just reports failure to the first half. > Both issues are fixed in the new patch version. > Any review and thoughts on the matters would be much appreciated. > > > > > > I think The doc needs to exiplain on the difference between default > > and latest. > Sure, I will work on it. regards. -- Kyotaro Horiguchi NTT Open Source Software Center