On Fri, Jan 17, 2020 at 11:08 AM Michael Paquier <mich...@paquier.xyz> wrote:
>
> On Fri, Jan 17, 2020 at 09:34:05AM +0530, Asim R P wrote:
> >
> > 0001 - TAP test to demonstrate the problem.
>
> There is no real need for debug_replay_delay because we have already
> recovery_min_apply_delay, no? That would count only after consistency
> has been reached, and only for COMMIT records, but your test would be
> enough with that.
>
Indeed, we didn't know about recovery_min_apply_delay. Thank you for the suggestion; the updated test is attached.

>
> > This is a POC, we are looking for early feedback on whether the
> > problem is worth solving and if it makes sense to solve if along this
> > route.
>
> You are not the first person interested in this problem, we have a
> patch registered in this CF to control the timing when a WAL receiver
> is started at recovery:
> https://commitfest.postgresql.org/26/1995/
> https://www.postgresql.org/message-id/b271715f-f945-35b0-d1f5-c9de3e56f...@postgrespro.ru
>

Great to know about this patch and the discussion. The test case, and the part of our patch that saves the next start point in the control file, can be combined with Konstantin's patch to solve this problem. Let me work on that.

> I am pretty sure that we should not change the default behavior to
> start the WAL receiver after replaying everything from the archives to
> avoid copying some WAL segments for nothing, so being able to use a
> GUC switch should be the way to go, and Konstantin's latest patch was
> using this approach. Your patch 0002 adds visibly a third mode: start
> immediately on top of the two ones already proposed:
> - Start after replaying all WAL available locally and in the
> archives.
> - Start after reaching a consistent point.

A consistent point should be reached fairly quickly, even with a large replay lag. The minimum recovery point is updated during XLOG flush, and that happens when a commit record is replayed. Commits should occur frequently in the WAL stream. So I do not see much value in starting the WAL receiver immediately as compared to starting it after reaching a consistent point. Does that make sense?

That said, is there anything obviously wrong with starting the WAL receiver immediately, even before reaching a consistent state? One consequence is that the WAL receiver may overwrite a WAL segment while the startup process is reading and replaying WAL from it. But that doesn't appear to be a problem, because the segment should be overwritten with content identical to what was there before.

Asim
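To make the three start modes discussed in this thread concrete, here is a minimal sketch of the decision as a small C function. It is a hypothetical illustration only: WalRcvStartMode, its values, and should_start_walreceiver() are invented names, not the GUC or code from Konstantin's patch or from 0002, and the two boolean inputs stand in for whatever state the startup process would actually consult.

```c
#include <stdbool.h>

/*
 * Hypothetical start modes for the WAL receiver (names are illustrative,
 * not taken from any patch or from PostgreSQL itself).
 */
typedef enum
{
    WALRCV_START_AFTER_ARCHIVE,     /* after replaying all local and archived WAL (current default) */
    WALRCV_START_AFTER_CONSISTENCY, /* once recovery has reached a consistent state */
    WALRCV_START_IMMEDIATELY        /* as soon as recovery begins (the mode 0002 adds) */
} WalRcvStartMode;

/*
 * Decide whether the WAL receiver should be started now.
 * reached_consistency: replay has passed the minimum recovery point.
 * local_and_archive_exhausted: no more WAL is available locally or in the archive.
 */
static bool
should_start_walreceiver(WalRcvStartMode mode,
                         bool reached_consistency,
                         bool local_and_archive_exhausted)
{
    switch (mode)
    {
        case WALRCV_START_IMMEDIATELY:
            return true;
        case WALRCV_START_AFTER_CONSISTENCY:
            return reached_consistency;
        case WALRCV_START_AFTER_ARCHIVE:
            return local_and_archive_exhausted;
    }
    return false;               /* unreachable for valid modes */
}
```

Keeping WALRCV_START_AFTER_ARCHIVE as the default would preserve today's behaviour, in line with the point that the default should not change; the other two modes would only apply when explicitly selected through a GUC.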
v1-0001-Test-that-replay-of-WAL-logs-on-standby-does-not-.patch
v1-0003-Start-WAL-receiver-when-it-is-found-not-running.patch
v1-0002-Start-WAL-receiver-before-startup-process-replays.patch