On Tue, Jun 16, 2026 at 03:01:15PM +0700, Sergey Tatarintsev wrote:
> I found that after commit 7185eddf0522b3146ed1ff6e063e8e129e77c706 we got
> little omission
> in TAP test 004_timeline_switch:
> ...
> my $node_standby_1 = PostgreSQL::Test::Cluster->new('standby_1');
> ...
> $node_primary->stop;

7185eddf0522 rings a bell.

> There is no guarantee that standby_1 and standby_2 was successfully
> connected to primary and start
> streaming before primary stopped.

Indeed.  I assume that adding a conditional sleep that prevents the
startup process of standby1 or standby2 from connecting to their primary
once they have reached a consistent state, before they are able to
replay the inserts of tab_int and before the primary is stopped, would
be enough to make the test go rogue, with one or more standbys not
getting the records we want.

If standby2 gets ahead of standby1, we would fail the initial
poll_query_until() done after standby2 attempts to reconnect to
standby1, failing the test on timeout.  If standby1 gets ahead of
standby2, things would work; there is a wait step for standby2 to catch
up with standby1.  So only the first pattern is problematic, not the
second.

It does not seem like the buildfarm has complained about this one
(failures in the last 30 days for recoveryCheck report 026 and 035),
neither does the CI:
https://cfbot.cputube.org/highlights/all.html
--
Michael
