On Sat, Apr 9, 2022 at 6:38 PM Julien Rouhaud <rjuju...@gmail.com> wrote:
>
> On Sat, Apr 09, 2022 at 02:38:50PM +0530, Bharath Rupireddy wrote:
> > On Fri, Apr 8, 2022 at 10:22 PM SATYANARAYANA NARLAPURAM
> > <satyanarlapu...@gmail.com> wrote:
> > >
> > >> > <bharath.rupireddyforpostg...@gmail.com> wrote:
> > >> > >
> > >> > > Hi,
> > >> > >
> > >> > > I'm thinking if there's a way in core postgres to achieve $subject.
> > >> > > In reality, the sync/async standbys can either be closer/farther
> > >> > > (which means sync/async standbys can receive WAL at different
> > >> > > times) to primary, especially in cloud HA environments with
> > >> > > primary in one Availability Zone(AZ)/Region and standbys in
> > >> > > different AZs/Regions. $subject may not be possible on dev systems
> > >> > > (say, for testing some HA features) unless we can inject a delay
> > >> > > in WAL senders before sending WAL.
> > >
> > > Simulation will be helpful even for end customers to simulate faults in
> > > the production environments during availability zone/disaster recovery
> > > drills.
> >
> > Right.
>
> I'm not sure that's actually helpful. If you want to do some realistic
> testing you need to fully simulate various network incidents and only
> delaying postgres replication is never going to be close to that. You
> should instead rely on tool like tc, which can do much more than what
> $subject could ever do, and do that for all your HA stack. At the very
> least you don't want to validate that your setup is working as expected by
> just simulating a faulty postgres replication connection but still having
> all your clients and HA agent not having any network issue at all.

Agreed, external networking tools and commands can be used for this. IMHO,
though, not everyone is familiar with those tools, and they may not be
portable or reliable everywhere. Developers may also be unable to use them
to test HA-related features (which may require sync and async standbys to
be closer to or farther from the primary) that I or other postgres HA
solution providers may develop. Having a reliable way to do this within
core would actually help.

Upon thinking further, how about we add hooks in the WAL sender code
(perhaps passing info about the replication slot it manages, plus some
other info), so that one can implement an extension of their choice,
similar to how auth_delay uses ClientAuthentication_hook?

Regards,
Bharath Rupireddy.
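
For illustration only, the hook idea can be sketched as a small standalone C
model of the pattern auth_delay follows: core exposes a function pointer, and
an extension saves the previous value and installs its own callback. Nothing
below is existing PostgreSQL API. The names WalSndHookContext,
WalSndBeforeSend_hook, wal_sender_send and install_delay_extension, the slot
name "async_far" and the 20 ms delay are all assumptions made up for this
sketch, and the code does not link against PostgreSQL at all.

```c
#define _POSIX_C_SOURCE 199309L   /* for nanosleep() */

#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* HYPOTHETICAL: information a WAL sender could hand to a hook. */
typedef struct WalSndHookContext
{
    const char *slot_name;   /* replication slot served by this sender */
    size_t      nbytes;      /* size of the WAL chunk about to be sent */
} WalSndHookContext;

/* HYPOTHETICAL hook type, modelled on ClientAuthentication_hook_type. */
typedef void (*WalSndBeforeSend_hook_type) (const WalSndHookContext *ctx);

/* "Core" side: NULL means no extension has installed a hook. */
static WalSndBeforeSend_hook_type WalSndBeforeSend_hook = NULL;

/* ---- "Extension" side (what a delay module might look like) ---- */

static WalSndBeforeSend_hook_type prev_hook = NULL;
static const char *delayed_slot = "async_far";  /* assumed slot name */
static long        delay_ms = 20;               /* assumed delay */
static int         delays_applied = 0;          /* counter for inspection */

/* Sleep for the given number of milliseconds. */
static void
sleep_ms(long ms)
{
    struct timespec ts;

    ts.tv_sec = ms / 1000;
    ts.tv_nsec = (ms % 1000) * 1000000L;
    nanosleep(&ts, NULL);
}

/* Hook body: call any earlier hook first, then delay only the chosen slot. */
static void
delay_before_send(const WalSndHookContext *ctx)
{
    assert(ctx != NULL);

    if (prev_hook)
        prev_hook(ctx);

    if (strcmp(ctx->slot_name, delayed_slot) == 0)
    {
        sleep_ms(delay_ms);
        delays_applied++;
    }
}

/* Plays the role of _PG_init(): save the old hook, install ours. */
static void
install_delay_extension(void)
{
    prev_hook = WalSndBeforeSend_hook;
    WalSndBeforeSend_hook = delay_before_send;
}

/* ---- "Core" side: stand-in for the WAL sender's send path ---- */

/* Invokes the hook (if any) and then pretends to send nbytes of WAL. */
static size_t
wal_sender_send(const char *slot_name, size_t nbytes)
{
    WalSndHookContext ctx;

    ctx.slot_name = slot_name;
    ctx.nbytes = nbytes;

    if (WalSndBeforeSend_hook)
        WalSndBeforeSend_hook(&ctx);

    return nbytes;
}
```

Calling install_delay_extension() once and then wal_sender_send("async_far",
8192) pauses before the simulated send, while a call for any other slot name
goes through without delay. In a real module the slot name and delay would
presumably be GUCs, as auth_delay does with its own setting.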