On Mon, Mar 18, 2024 at 11:38 AM Michael Paquier <mich...@paquier.xyz> wrote: > > On Sun, Mar 17, 2024 at 11:37:58AM +0530, Bharath Rupireddy wrote: > > Rebase needed after 071e3ad59d6fd2d6d1277b2bd9579397d10ded28 due to a > > conflict in meson.build. Please see the attached v23 patch. > > I've been reading this patch, and this is a very tricky one. Please > be *very* cautious.
Thanks for looking into this. > +#streaming_replication_retry_interval = 0 # time after which standby > + # attempts to switch WAL source from archive to > + # streaming replication in seconds; 0 disables > > This stuff allows a minimal retry interval of 1s. Could it be useful > to have more responsiveness here and allow lower values than that? > Why not switching the units to be milliseconds? Nathan had a different view on this to have it on the order of seconds - https://www.postgresql.org/message-id/20240305020452.GA3373526%40nathanxps13. If set to a too low value, the frequency of standby trying to connect to primary increases. IMO, the order of seconds seems fine. > + if (streaming_replication_retry_interval <= 0 || > + !StandbyMode || > + currentSource != XLOG_FROM_ARCHIVE) > + return SWITCH_TO_STREAMING_NONE; > > Hmm. Perhaps this should mention why we don't care about the > consistent point. Are you asking why we don't care whether the standby reached a consistent point when switching to streaming mode due to this new parameter? If this is the ask, the same applies when a standby typically switches to streaming replication (get WAL from primary) today, that is when receive from WAL archive finishes (no more WAL left there) or fails for any reason. The standby doesn't care about the consistent point even today, it just trusts the WAL source and makes the switch. > + /* See if we can switch WAL source to streaming */ > + if (wal_source_switch_state == SWITCH_TO_STREAMING_NONE) > + wal_source_switch_state = SwitchWALSourceToPrimary(); > > Rather than a routine that returns as result the value to use for the > GUC, I'd suggest to let this routine set the GUC as there is only one > caller of SwitchWALSourceToPrimary(). This can also include a check > on SWITCH_TO_STREAMING_NONE, based on what I'm reading that. Firstly, wal_source_switch_state is not a GUC, it's a static variable to be used across WaitForWALToBecomeAvailable calls. And, if you are suggesting to turn SwitchWALSourceToPrimary so that it sets wal_source_switch_state directly, I'd not do that because when wal_source_switch_state is not SWITCH_TO_STREAMING_NONE, the function gets called unnecessarily. > - if (lastSourceFailed) > + if (lastSourceFailed || > + wal_source_switch_state == SWITCH_TO_STREAMING) > > Hmm. This one may be tricky. I'd recommend a separation between the > failure in reading from a source and the switch to a new "forced" > source. Separation would just add duplicate code. Moreover, the code wrapped within if (lastSourceFailed) doesn't do any error handling or such, it just resets a few stuff from the previous source and sets the next source. FWIW, please check [1] (and the discussion thereon) for how the lastSourceFailed flag is being used to consume all the streamed WAL in pg_wal directly upon detecting promotion trigger file. Therefore, I see no problem with the way it is right now for this new feature. [1] /* * Data not here yet. Check for trigger, then wait for * walreceiver to wake us up when new WAL arrives. */ if (CheckForStandbyTrigger()) { /* * Note that we don't return XLREAD_FAIL immediately * here. After being triggered, we still want to * replay all the WAL that was already streamed. It's * in pg_wal now, so we just treat this as a failure, * and the state machine will move on to replay the * streamed WAL from pg_wal, and then recheck the * trigger and exit replay. */ lastSourceFailed = true; > + if (wal_source_switch_state == SWITCH_TO_STREAMING_PENDING) > + readFrom = XLOG_FROM_PG_WAL; > + else > + readFrom = currentSource == XLOG_FROM_ARCHIVE ? > + XLOG_FROM_ANY : currentSource; > > WALSourceSwitchState looks confusing here, and are you sure that this > is actualy correct? Shouldn't we still try a READ_FROM_ANY or a read > from the archives even with a streaming pending. Please see the discussion starting from https://www.postgresql.org/message-id/20221008215221.GA894639%40nathanxps13. We wanted to keep the existing behaviour the same when we intentionally switch source to streaming from archive due to the timeout. The existing behaviour is to exhaust WAL in pg_wal before switching the source to streaming after failure to fetch from archive. When wal_source_switch_state is SWITCH_TO_STREAMING_PENDING, the currentSource is already XLOG_FROM_ARCHIVE (To clear the dust off here, I've added an assert now in the attached new v24 patch). And, we don't want to pass XLOG_FROM_ANY to XLogFileReadAnyTLI to again fetch from the archive. Hence, we choose readFrom = XLOG_FROM_PG_WAL to specifically tell XLogFileReadAnyTLI read from pg_wal directly. > By the way, I am not convinced that what you have is the best > interface ever. This assumes that we'd always want to switch to > streaming more aggressively. Could there be a point in also > controlling if we should switch to pg_wal/ or just to archiving more > aggressively as well, aka be able to do the opposite switch of WAL > source? This design looks somewhat limited to me. The origin of the > issue is that we don't have a way to control the order of the sources > consumed by WAL replay. Perhaps something like a replay_source_order > that uses a list would be better, with elements settable to archive, > pg_wal and streaming? Intention of this feature is to provide a way for the streaming standby to quickly detect when the primary is up and running without having to wait until either all the WAL in the archive is over or a failure to fetch from archive happens. Advantages of this feature are: 1) it can make the recovery a bit faster (if fetching from archive adds up costs with different storage types, IO costs and network delays), thus can reduce the replication lag 2) primary (if using replication slot based streaming replication setup) doesn't have to keep the required WAL for the standby for longer durations, thus reducing the risk of no space left on disk issues. IMHO, it makes sense to have something like replay_source_order if there's any use case that arises in future requiring the standby to intentionally switch to pg_wal or archive. But not as part of this feature. Please see the attached v24 patch. I've added an assertion that the current source is archive before calling XLogFileReadAnyTLI if wal_source_switch_state is SWITCH_TO_STREAMING_PENDING. I've also added the new enum WALSourceSwitchState to typedefs.list to make pgindent adjust it correctly. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
v24-0001-Allow-standby-to-switch-WAL-source-from-archive-.patch
Description: Binary data