On Sat, Aug 10, 2024 at 6:58 PM Alexander Korotkov <aekorot...@gmail.com> wrote:
> On Tue, Aug 6, 2024 at 8:36 AM Michael Paquier <mich...@paquier.xyz> wrote:
> > On Tue, Aug 06, 2024 at 05:17:10AM +0300, Alexander Korotkov wrote:
> > > The 0001 patch is intended to improve this situation. Actually, it's
> > > not right to just put RecoveryInProgress() after
> > > GetXLogReplayRecPtr(), because more wal could be replayed between
> > > these calls. Instead we need to recheck GetXLogReplayRecPtr() after
> > > getting negative result of RecoveryInProgress() because WAL replay
> > > position couldn't get updated after.
> > > 0002 patch comprises fix for the header comment of WaitLSNSetLatches()
> > > function
> > > 0003 patch comprises tests for pg_wal_replay_wait() errors.
> >
> > Before adding more tests, could it be possible to stabilize what's in
> > the tree? drongo has reported one failure with the recovery test
> > 043_wal_replay_wait.pl introduced recently by 3c5db1d6b016:
> > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=drongo&dt=2024-08-05%2004%3A24%3A54
>
> I'm currently running a 043_wal_replay_wait test in a loop of drongo.
> No failures during more than 10 hours. As I pointed in [1] it seems
> that test stuck somewhere on launching BackgroundPsql. Given that
> drongo have some strange failures from time to time (for instance [2]
> or [3]), I doubt there is something specifically wrong in
> 043_wal_replay_wait test that caused the subject failure.
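The recheck ordering described for the 0001 patch can be illustrated with a small single-threaded C sketch. It is not PostgreSQL code. get_replay_rec_ptr(), recovery_in_progress(), the WaitResult enum and the LSN values are hypothetical stand-ins for GetXLogReplayRecPtr(), RecoveryInProgress() and the shared state behind them. The concurrent startup process is simulated by a hook that fires right after the first read of the replay position, which forces the interleaving the patch guards against.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins, not the real PostgreSQL internals. */
typedef uint64_t XLogRecPtr;

static XLogRecPtr replay_lsn;          /* simulated WAL replay position */
static bool in_recovery;               /* simulated recovery state */
static void (*concurrent_step)(void);  /* simulated startup-process progress */

/* Returns the replay position, then lets the simulated startup process run
 * once, so its progress lands between this read and the next call. */
static XLogRecPtr get_replay_rec_ptr(void)
{
    XLogRecPtr lsn = replay_lsn;

    if (concurrent_step != NULL)
    {
        void (*step)(void) = concurrent_step;

        concurrent_step = NULL;
        step();
    }
    return lsn;
}

static bool recovery_in_progress(void)
{
    return in_recovery;
}

typedef enum
{
    WAIT_REACHED,
    WAIT_KEEP_WAITING,
    WAIT_NOT_IN_RECOVERY
} WaitResult;

/* Naive ordering: a negative recovery check is trusted as-is. */
static WaitResult check_naive(XLogRecPtr target)
{
    if (get_replay_rec_ptr() >= target)
        return WAIT_REACHED;
    if (!recovery_in_progress())
        return WAIT_NOT_IN_RECOVERY;
    return WAIT_KEEP_WAITING;
}

/* Recheck ordering: after seeing that recovery has ended, re-read the replay
 * position. It can no longer advance, so this second read is final. */
static WaitResult check_with_recheck(XLogRecPtr target)
{
    if (get_replay_rec_ptr() >= target)
        return WAIT_REACHED;
    if (!recovery_in_progress())
        return get_replay_rec_ptr() >= target ? WAIT_REACHED
                                              : WAIT_NOT_IN_RECOVERY;
    return WAIT_KEEP_WAITING;
}

/* The simulated startup process replays up to LSN 200, then leaves recovery. */
static void finish_replay(void)
{
    replay_lsn = 200;
    in_recovery = false;
}

/* Runs one check with replay at LSN 100 and recovery ending right after the
 * first read of the replay position. */
static WaitResult run_scenario(WaitResult (*check)(XLogRecPtr), XLogRecPtr target)
{
    replay_lsn = 100;
    in_recovery = true;
    concurrent_step = finish_replay;
    return check(target);
}

/* Replays the same interleaving for both orderings and checks the outcomes. */
static void demo_race(void)
{
    assert(run_scenario(check_naive, 150) == WAIT_NOT_IN_RECOVERY);
    assert(run_scenario(check_with_recheck, 150) == WAIT_REACHED);
}
```

With a target of LSN 150, the naive ordering reports that recovery is over even though LSN 200 was replayed before recovery ended. The recheck ordering reports the target as reached. Trusting the second read is only safe because, once recovery has ended, the replay position cannot move any further.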
With the help of Andrew Dunstan, I've run 043_wal_replay_wait.pl in a loop for two days, and then the whole test suite for another two days. I haven't seen any failures. I don't see the point in running more experiments, because Andrew needs to bring drongo back online as a buildfarm member. It's possible that something exceptional happened on drongo (such as an inability to launch a new process). For now, I think the reasonable strategy is to wait and see whether something similar recurs on the buildfarm.

------
Regards,
Alexander Korotkov
Supabase