On Wed, 10 Jan 2024 at 18:37, vignesh C <vignes...@gmail.com> wrote:
>
> On Wed, 10 Jan 2024 at 14:08, Bharath Rupireddy
> <bharath.rupireddyforpostg...@gmail.com> wrote:
> >
> > Hi,
> >
> > I've been observing a failure in t/038_save_logical_slots_shutdown.pl
> > of late on my developer system:
> >
> > t/038_save_logical_slots_shutdown.pl .. 1/?
> > # Failed test 'Check that the slot's confirmed_flush LSN is the same
> > as the latest_checkpoint location'
> > # at t/038_save_logical_slots_shutdown.pl line 35.
> > # Looks like you failed 1 test of 2.
> > t/038_save_logical_slots_shutdown.pl .. Dubious, test returned 1
> > (wstat 256, 0x100)
> > Failed 1/2 subtests
> >
> > I did a quick analysis of the failure and commit
> > https://github.com/postgres/postgres/commit/e0b2eed047df9045664da6f724cb42c10f8b12f0
> > that introduced this test. I think the issue is that the slot's
> > confirmed_flush LSN (0/1508000) and shutdown checkpoint LSN
> > (0/1508018) are not the same:
> >
> > tmp_check/log/038_save_logical_slots_shutdown_pub.log:
> >
> > 2024-01-10 07:55:44.539 UTC [57621] sub LOG: starting logical
> > decoding for slot "sub"
> > 2024-01-10 07:55:44.539 UTC [57621] sub DETAIL: Streaming
> > transactions committing after 0/1508000, reading WAL from 0/1507FC8.
> > 2024-01-10 07:55:44.539 UTC [57621] sub STATEMENT: START_REPLICATION
> > SLOT "sub" LOGICAL 0/0 (proto_version '4', origin 'any',
> > publication_names '"pub"')
> >
> > ubuntu:~/postgres$ pg17/bin/pg_controldata -D
> > src/test/recovery/tmp_check/t_038_save_logical_slots_shutdown_pub_data/pgdata/
> > Database cluster state:               in production
> > pg_control last modified:             Wed Jan 10 07:55:44 2024
> > Latest checkpoint location:           0/1508018
> > Latest checkpoint's REDO location:    0/1508018
> >
> > But the tests added by t/038_save_logical_slots_shutdown.pl expects
> > both LSNs to be same:
> >
> > sub compare_confirmed_flush
> > {
> >     # Is it same as the value read from log?
> >     ok( $latest_checkpoint eq $confirmed_flush_from_log,
> >         "Check that the slot's confirmed_flush LSN is the same as the
> > latest_checkpoint location"
> >     );
> >
> > I suspect that it's quite not right to expect the slot's
> > confirmed_flush and latest checkpoint location to be same in the test.
> > This is because the shutdown checkpoint gets an LSN that's greater
> > than the slot's confirmed_flush LSN - see the shutdown checkpoint
> > record getting inserted into WAL after the slot is marked dirty in
> > CheckPointReplicationSlots().
> >
> > With this analysis in mind, I think the tests need to do something
> > like the following:
> >
> > diff --git a/src/test/recovery/t/038_save_logical_slots_shutdown.pl
> > b/src/test/recovery/t/038_save_logical_slots_shutdown.pl
> > index 5a4f5dc1d4..d49e6014fc 100644
> > --- a/src/test/recovery/t/038_save_logical_slots_shutdown.pl
> > +++ b/src/test/recovery/t/038_save_logical_slots_shutdown.pl
> > @@ -32,7 +32,7 @@ sub compare_confirmed_flush
> >               unless defined($latest_checkpoint);
> >
> >       # Is it same as the value read from log?
> > -     ok( $latest_checkpoint eq $confirmed_flush_from_log,
> > +     ok( $latest_checkpoint ge $confirmed_flush_from_log,
> >               "Check that the slot's confirmed_flush LSN is the same
> > as the latest_checkpoint location"
> >       );
> >
> > Thoughts?
>
> I got the log files from Bharath offline. Thanks Bharath for sharing
> the log files offline.
> The WAL record sequence is exactly the same in the failing test and
> tests which are passing.
> One observation in our case the confirmed flush lsn points exactly to
> shutdown checkpoint, but in the failing test the lsn pointed is
> invalid, pg_waldump says that address is invalid and skips about 24
> bytes and then sees a valid record
>
> Passing case confirm flush lsn(0/150D158) from my machine:
> pg_waldump 000000010000000000000001 -s 0/150D158
> rmgr: XLOG        len (rec/tot):    114/   114, tx:          0, lsn:
> 0/0150D158, prev 0/0150D120, desc: CHECKPOINT_SHUTDOWN redo 0/150D158;
> tli 1; prev tli 1; fpw true; xid 0:739; oid 16388; multi 1; offset 0;
> oldest xid 728 in DB 1; oldest multi 1 in DB 1; oldest/newest commit
> timestamp xid: 0/0; oldest running xid 0; shutdown
>
> Failing case confirm flush lsn( 0/1508000) from failing tests log file:
> pg_waldump 000000010000000000000001 -s 0/1508000
> pg_waldump: first record is after 0/1508000, at 0/1508018, skipping
> over 24 bytes
> rmgr: XLOG        len (rec/tot):    114/   114, tx:          0, lsn:
> 0/01508018, prev 0/01507FC8, desc: CHECKPOINT_SHUTDOWN redo 0/1508018;
> tli 1; prev tli 1; fpw true; xid 0:739; oid 16388; multi 1; offset 0;
> oldest xid 728 in DB 1; oldest multi 1 in DB 1; oldest/newest commit
> timestamp xid: 0/0; oldest running xid 0; shutdown
>
> I'm still not sure why in this case, it is not exactly pointing to a
> valid WAL record, it has to skip 24 bytes to find the valid checkpoint
> shutdown record.
> I will investigate this further and share the analysis.
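Incidentally, the 24 bytes that pg_waldump skips over in the failing case are exactly the size of the short WAL page header (SizeOfXLogShortPHD), which already hints that the shutdown checkpoint record landed at the start of a new WAL page. The gap can be seen with a purely illustrative pg_lsn calculation using the LSNs from the failing run above (not part of the test; it just reuses the test's $node_publisher handle):

# Illustrative only: the distance between the slot's confirmed_flush LSN and
# the CHECKPOINT_SHUTDOWN record in the failing run is exactly 24 bytes, the
# size of the short WAL page header (SizeOfXLogShortPHD).
my $gap = $node_publisher->safe_psql('postgres',
	"SELECT '0/1508018'::pg_lsn - '0/1508000'::pg_lsn");
is($gap, 24, 'confirmed_flush trails the shutdown checkpoint by one page header');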
On further analysis, it was found that in the failing test the CHECKPOINT_SHUTDOWN record started on a new WAL page, so a WAL page header was present just before it: the slot's confirmed_flush LSN pointed at the page header while the checkpoint record itself started 24 bytes later, which is what caused the failure. The issue can also be reproduced consistently by switching the WAL file just before restarting the server, as in the attached test change patch.

There are a couple of ways to fix this issue:
a) Switch the WAL file before inserting the records, so that the CHECKPOINT_SHUTDOWN record does not end up as the first record on a new page, as done in the attached test_fix.patch.
b) Use pg_walinspect to check that the WAL record at the slot's confirmed_flush LSN is the CHECKPOINT_SHUTDOWN record. I have yet to try this approach; a rough sketch is included below.

Thanks to Bharath and Kuroda-san for the offline discussions and for helping to get to the root cause.

Thoughts?

Regards,
Vignesh
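To make approach b) concrete, here is a rough, untested sketch of how the check could look in the test. It assumes that pg_walinspect's pg_get_wal_record_info() reports the record that starts at (or immediately after) the given LSN, and it reuses the $node_publisher handle and the $confirmed_flush_from_log value that the test already extracts from the server log:

# Rough sketch of approach b): instead of comparing the confirmed_flush LSN
# with the "Latest checkpoint location" from pg_controldata, ask pg_walinspect
# which record the confirmed_flush LSN refers to and verify that it is the
# shutdown checkpoint record. This assumes pg_get_wal_record_info() skips over
# a WAL page header when the given LSN does not point at a record start.
$node_publisher->safe_psql('postgres',
	"CREATE EXTENSION IF NOT EXISTS pg_walinspect");
my $record_type = $node_publisher->safe_psql('postgres',
	"SELECT record_type FROM pg_get_wal_record_info('$confirmed_flush_from_log')");
is($record_type, 'CHECKPOINT_SHUTDOWN',
	"Check that the slot's confirmed_flush LSN points to the shutdown checkpoint record");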
diff --git a/src/test/recovery/t/038_save_logical_slots_shutdown.pl b/src/test/recovery/t/038_save_logical_slots_shutdown.pl
index 5a4f5dc1d4..1fddab78b1 100644
--- a/src/test/recovery/t/038_save_logical_slots_shutdown.pl
+++ b/src/test/recovery/t/038_save_logical_slots_shutdown.pl
@@ -81,6 +81,8 @@ is($result, qq(5), "check initial copy was done");
 
 my $offset = -s $node_publisher->logfile;
 
+$node_publisher->safe_psql('postgres', "SELECT pg_switch_wal()");
+
 # Restart the publisher to ensure that the slot will be flushed if required
 $node_publisher->restart();
diff --git a/src/test/recovery/t/038_save_logical_slots_shutdown.pl b/src/test/recovery/t/038_save_logical_slots_shutdown.pl
index 5a4f5dc1d4..bcef53187e 100644
--- a/src/test/recovery/t/038_save_logical_slots_shutdown.pl
+++ b/src/test/recovery/t/038_save_logical_slots_shutdown.pl
@@ -60,6 +60,14 @@ $node_subscriber->start;
 $node_publisher->safe_psql('postgres', "CREATE TABLE test_tbl (id int)");
 $node_subscriber->safe_psql('postgres', "CREATE TABLE test_tbl (id int)");
 
+# Switch the WAL file so that the CHECKPOINT_SHUTDOWN WAL record will not be
+# inserted as the first record of a new WAL page. If the CHECKPOINT_SHUTDOWN
+# record is the first record on a new page, it is preceded by the WAL page
+# header, which makes the test fail: the slot's confirmed_flush LSN points to
+# the WAL page header while the CHECKPOINT_SHUTDOWN record itself starts just
+# after the page header.
+$node_publisher->safe_psql('postgres', "SELECT pg_switch_wal()");
+
 # Insert some data
 $node_publisher->safe_psql('postgres',
 	"INSERT INTO test_tbl VALUES (generate_series(1, 5));");