Race conditions in 019_replslot_limit.pl

Heikki Linnakangas Tue, 15 Feb 2022 13:29:36 -0800

While looking at recent failures in the new 028_pitr_timelines.pl recovery test, I noticed that there have been a few failures in the buildfarm in the recoveryCheck phase even before that, in the 019_replslot_limit.pl test.


For example:


https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=desmoxytes&dt=2022-02-14%2006%3A30%3A04

[07:42:23] t/018_wal_optimize.pl ................ ok 12403 ms ( 0.00usr 0.00 sys + 1.40 cusr 0.63 csys = 2.03 CPU)

# poll_query_until timed out executing this query:
# SELECT wal_status FROM pg_replication_slots WHERE slot_name = 'rep3'
# expecting this output:
# lost
# last actual query output:
# unreserved

and:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=serinus&dt=2022-02-15%2011%3A00%3A08

#   Failed test 'have walsender pid 3682154
# 3682136'
#   at t/019_replslot_limit.pl line 335.
#                   '3682154
# 3682136'
#     doesn't match '(?^:^[0-9]+$)'

The latter looks like there are two walsenders active, which confuses the test. Not sure what's happening in the first case, but looks like some kind of a race condition at a quick glance.
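
To illustrate why the second failure trips the check: the pattern in the failure output, (?^:^[0-9]+$), is not multiline, so it only accepts output that is a single run of digits. The sketch below reproduces that in Python rather than the test's Perl. The helper name looks_like_single_pid is made up for the example, and the two-line string is the output copied from the serinus log above.

```python
import re

# Same pattern as in the failure output, (?^:^[0-9]+$): no multiline flag,
# so ^ and $ anchor to the whole string rather than to each line.
PID_PATTERN = re.compile(r"^[0-9]+$")


def looks_like_single_pid(query_output: str) -> bool:
    """Return True only if the output is exactly one run of digits.

    Hypothetical helper for illustration; not part of the actual test.
    """
    return PID_PATTERN.search(query_output) is not None


one_walsender = "3682154"
two_walsenders = "3682154\n3682136"  # output from the serinus failure

print(looks_like_single_pid(one_walsender))
print(looks_like_single_pid(two_walsenders))
```

With one walsender the output is a single PID and the pattern matches. With two rows returned, the embedded newline keeps [0-9]+ from spanning the whole string, so the match fails the way the log shows.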


Has anyone looked into these yet?

- Heikki
