On Fri, Apr 8, 2022 at 10:22 PM SATYANARAYANA NARLAPURAM
<satyanarlapu...@gmail.com> wrote:
>
>> > <bharath.rupireddyforpostg...@gmail.com> wrote:
>> > >
>> > > Hi,
>> > >
>> > > I'm thinking if there's a way in core postgres to achieve $subject. In
>> > > reality, the sync/async standbys can either be closer/farther (which
>> > > means sync/async standbys can receive WAL at different times) to
>> > > primary, especially in cloud HA environments with primary in one
>> > > Availability Zone(AZ)/Region and standbys in different AZs/Regions.
>> > > $subject may not be possible on dev systems (say, for testing some HA
>> > > features) unless we can inject a delay in WAL senders before sending
>> > > WAL.
>
> Simulation will be helpful even for end customers to simulate faults in the
> production environments during availability zone/disaster recovery drills.

Right.

>> > > How about having two developer-only GUCs {async,
>> > > sync}_wal_sender_delay? When set, the async and sync WAL senders will
>> > > delay sending WAL by {async, sync}_wal_sender_delay
>> > > milliseconds/seconds? Although, I can't think of any immediate use, it
>> > > will be useful someday IMO, say for features like [1], if it gets in.
>> > > With this set of GUCs, one can even add core regression tests for HA
>> > > features.
>
> I would suggest doing this at the slot level, instead of two GUCs that
> control the behavior of all the slots (physical/logical). Something like
> "pg_suspend_replication_slot and pg_Resume_replication_slot"?

Having the control at the replication slot level, instead of at the WAL
sender level, seems reasonable. As there can be many slots on the
primary, we must have a way to specify which slots need to be delayed
and by how much time before sending WAL. If we used GUCs, they would
have to be of list type, and I'm not sure that would come out well.

Instead, we could have two functions (superuser-only, or for users with
the replication role) such as
pg_replication_slot_set_delay(slot_name, delay_in_milliseconds) and
pg_replication_slot_unset_delay(slot_name).
pg_replication_slot_set_delay will set ReplicationSlot->delay, and the
WAL sender checks MyReplicationSlot->delay > 0 and waits before sending
WAL. pg_replication_slot_unset_delay will set ReplicationSlot->delay to
0. Alternatively, instead of pg_replication_slot_unset_delay,
pg_replication_slot_set_delay(slot_name, 0) can be used, so that only a
single function is needed. If users want a standby to receive WAL with
a delay, they can call pg_replication_slot_set_delay after creating the
replication slot.

Thoughts?

> Alternatively a GUC on the standby side instead of primary so that the wal
> receiver stops responding to the wal sender?

I think we have the wal_receiver_status_interval GUC on the WAL receiver
that achieves the above, i.e. not responding to the primary at all; one
can set wal_receiver_status_interval to, say, 1 day.

[1]
{
    {"wal_receiver_status_interval", PGC_SIGHUP, REPLICATION_STANDBY,
        gettext_noop("Sets the maximum interval between WAL receiver status reports to the sending server."),
        NULL,
        GUC_UNIT_S
    },
    &wal_receiver_status_interval,
    10, 0, INT_MAX / 1000,
    NULL, NULL, NULL
},

Regards,
Bharath Rupireddy.