On Tue, Sep 17, 2024 at 9:00 AM Alexander Lakhin <exclus...@gmail.com> wrote: > 16.09.2024 21:55, Alexander Korotkov wrote: > > Please find two patches attached. The first one does minor cleanup > > including misuse of words you've pointed. The second one adds missing > > wait_for_catchup(). That should fix the test failure you've spotted. > > Please, check if it fixes an issue for you. > > Thank you for looking at that! > > Unfortunately, the issue is still here — the test failed for me 6 out of > 10 runs, as below: > [05:14:02.807](0.135s) ok 8 - got error after standby promote > error running SQL: 'psql:<stdin>:1: ERROR: recovery is not in progress > HINT: Waiting for LSN can only be executed during recovery.' > while running 'psql -XAtq -d port=12734 host=/tmp/04hQ75NuXf > dbname='postgres' -f - -v ON_ERROR_STOP=1' with sql 'CALL > pg_wal_replay_wait('0/300F248');' at > .../src/test/recovery/../../../src/test/perl/PostgreSQL/Test/Cluster.pm line > 2140. > > 043_wal_replay_wait_standby.log: > 2024-09-17 05:14:02.714 UTC [1817258] 043_wal_replay_wait.pl ERROR: recovery > is not in progress > 2024-09-17 05:14:02.714 UTC [1817258] 043_wal_replay_wait.pl DETAIL: > Recovery ended before replaying target LSN > 2/570CD648; last replay LSN 0/300F210. > 2024-09-17 05:14:02.714 UTC [1817258] 043_wal_replay_wait.pl STATEMENT: CALL > pg_wal_replay_wait('2/570CD648'); > 2024-09-17 05:14:02.714 UTC [1817155] LOG: checkpoint starting: force > 2024-09-17 05:14:02.714 UTC [1817154] LOG: database system is ready to > accept connections > 2024-09-17 05:14:02.811 UTC [1817270] 043_wal_replay_wait.pl LOG: statement: > CALL pg_wal_replay_wait('0/300F248'); > 2024-09-17 05:14:02.811 UTC [1817270] 043_wal_replay_wait.pl ERROR: recovery > is not in progress > > pg_waldump -p .../t_043_wal_replay_wait_primary_data/pgdata/pg_wal/ > 000000010000000000000003 > rmgr: Transaction len (rec/tot): 34/ 34, tx: 748, lsn: > 0/0300F1E8, prev 0/0300F1A8, desc: COMMIT > 2024-09-17 05:14:01.654874 UTC > rmgr: Standby len (rec/tot): 50/ 50, tx: 0, lsn: > 0/0300F210, prev 0/0300F1E8, desc: RUNNING_XACTS > nextXid 749 latestCompletedXid 748 oldestRunningXid 749 > > I wonder, can you reproduce this with that bgwriter's modification?
Yes, now I did reproduce. I got that the problem could be that insert LSN is not yet written at primary, thus wait_for_catchup doesn't wait for it. I've workarounded that using pg_switch_wal(). The revised patchset is attached. > I've also found two more "achievements" coined by 3c5db1d6b: > doc/src/sgml/func.sgml: It may also happen that target > <acronym>lsn</acronym> is not achieved > src/backend/access/transam/xlog.c- * recovery was ended before > achieving the target LSN. Fixed this at well in 0001. ------ Regards, Alexander Korotkov Supabase
v2-0001-Minor-cleanup-related-to-pg_wal_replay_wait-proce.patch
Description: Binary data
v2-0002-Ensure-standby-promotion-point-in-043_wal_replay_.patch
Description: Binary data