On Wed, 10 Jan 2024 at 18:37, vignesh C <vignes...@gmail.com> wrote:
>
> On Wed, 10 Jan 2024 at 14:08, Bharath Rupireddy
> <bharath.rupireddyforpostg...@gmail.com> wrote:
> >
> > Hi,
> >
> > I've been observing a failure in t/038_save_logical_slots_shutdown.pl
> > of late on my developer system:
> >
> > t/038_save_logical_slots_shutdown.pl .. 1/?
> > # Failed test 'Check that the slot's confirmed_flush LSN is the same
> > as the latest_checkpoint location'
> > # at t/038_save_logical_slots_shutdown.pl line 35.
> > # Looks like you failed 1 test of 2.
> > t/038_save_logical_slots_shutdown.pl .. Dubious, test returned 1
> > (wstat 256, 0x100)
> > Failed 1/2 subtests
> >
> > I did a quick analysis of the failure and commit
> > https://github.com/postgres/postgres/commit/e0b2eed047df9045664da6f724cb42c10f8b12f0
> > that introduced this test. I think the issue is that the slot's
> > confirmed_flush LSN (0/1508000) and shutdown checkpoint LSN
> > (0/1508018) are not the same:
> >
> > tmp_check/log/038_save_logical_slots_shutdown_pub.log:
> >
> > 2024-01-10 07:55:44.539 UTC [57621] sub LOG: starting logical
> > decoding for slot "sub"
> > 2024-01-10 07:55:44.539 UTC [57621] sub DETAIL: Streaming
> > transactions committing after 0/1508000, reading WAL from 0/1507FC8.
> > 2024-01-10 07:55:44.539 UTC [57621] sub STATEMENT: START_REPLICATION
> > SLOT "sub" LOGICAL 0/0 (proto_version '4', origin 'any',
> > publication_names '"pub"')
> >
> > ubuntu:~/postgres$ pg17/bin/pg_controldata -D
> > src/test/recovery/tmp_check/t_038_save_logical_slots_shutdown_pub_data/pgdata/
> > Database cluster state:               in production
> > pg_control last modified:             Wed Jan 10 07:55:44 2024
> > Latest checkpoint location:           0/1508018
> > Latest checkpoint's REDO location:    0/1508018
> >
> > But the tests added by t/038_save_logical_slots_shutdown.pl expects
> > both LSNs to be same:
> >
> > sub compare_confirmed_flush
> > {
> >     # Is it same as the value read from log?
> >     ok( $latest_checkpoint eq $confirmed_flush_from_log,
> >         "Check that the slot's confirmed_flush LSN is the same as the
> > latest_checkpoint location"
> >     );
> >
> > I suspect that it's quite not right to expect the slot's
> > confirmed_flush and latest checkpoint location to be same in the test.
> > This is because the shutdown checkpoint gets an LSN that's greater
> > than the slot's confirmed_flush LSN - see the shutdown checkpoint
> > record getting inserted into WAL after the slot is marked dirty in
> > CheckPointReplicationSlots().
> >
> > With this analysis in mind, I think the tests need to do something
> > like the following:
> >
> > diff --git a/src/test/recovery/t/038_save_logical_slots_shutdown.pl
> > b/src/test/recovery/t/038_save_logical_slots_shutdown.pl
> > index 5a4f5dc1d4..d49e6014fc 100644
> > --- a/src/test/recovery/t/038_save_logical_slots_shutdown.pl
> > +++ b/src/test/recovery/t/038_save_logical_slots_shutdown.pl
> > @@ -32,7 +32,7 @@ sub compare_confirmed_flush
> >               unless defined($latest_checkpoint);
> >
> >       # Is it same as the value read from log?
> > -     ok( $latest_checkpoint eq $confirmed_flush_from_log,
> > +     ok( $latest_checkpoint ge $confirmed_flush_from_log,
> >               "Check that the slot's confirmed_flush LSN is the same
> > as the latest_checkpoint location"
> >       );
> >
> > Thoughts?
>
> I got the log files from Bharath offline. Thanks Bharath for sharing
> the log files offline.
> The WAL record sequence is exactly the same in the failing test and
> tests which are passing.
> One observation in our case the confirmed flush lsn points exactly to
> shutdown checkpoint, but in the failing test the lsn pointed is
> invalid, pg_waldump says that address is invalid and skips about 24
> bytes and then sees a valid record
>
> Passing case confirm flush lsn(0/150D158) from my machine:
> pg_waldump 000000010000000000000001 -s 0/150D158
> rmgr: XLOG        len (rec/tot):    114/   114, tx:          0, lsn:
> 0/0150D158, prev 0/0150D120, desc: CHECKPOINT_SHUTDOWN redo 0/150D158;
> tli 1; prev tli 1; fpw true; xid 0:739; oid 16388; multi 1; offset 0;
> oldest xid 728 in DB 1; oldest multi 1 in DB 1; oldest/newest commit
> timestamp xid: 0/0; oldest running xid 0; shutdown
>
> Failing case confirm flush lsn( 0/1508000) from failing tests log file:
> pg_waldump 000000010000000000000001 -s 0/1508000
> pg_waldump: first record is after 0/1508000, at 0/1508018, skipping
> over 24 bytes
> rmgr: XLOG        len (rec/tot):    114/   114, tx:          0, lsn:
> 0/01508018, prev 0/01507FC8, desc: CHECKPOINT_SHUTDOWN redo 0/1508018;
> tli 1; prev tli 1; fpw true; xid 0:739; oid 16388; multi 1; offset 0;
> oldest xid 728 in DB 1; oldest multi 1 in DB 1; oldest/newest commit
> timestamp xid: 0/0; oldest running xid 0; shutdown
>
> I'm still not sure why in this case, it is not exactly pointing to a
> valid WAL record, it has to skip 24 bytes to find the valid checkpoint
> shutdown record.
> I will investigate this further and share the analysis.
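Incidentally, the 24 bytes that pg_waldump skips over in the failing case are exactly the size of the short WAL page header (SizeOfXLogShortPHD), which already hints that the shutdown checkpoint record landed at the start of a new WAL page. The gap can be seen with a purely illustrative pg_lsn calculation using the LSNs from the failing run above (not part of the test; it just reuses the test's $node_publisher handle):

# Illustrative only: the distance between the slot's confirmed_flush LSN and
# the CHECKPOINT_SHUTDOWN record in the failing run is exactly 24 bytes, the
# size of the short WAL page header (SizeOfXLogShortPHD).
my $gap = $node_publisher->safe_psql('postgres',
	"SELECT '0/1508018'::pg_lsn - '0/1508000'::pg_lsn");
is($gap, 24, 'confirmed_flush trails the shutdown checkpoint by one page header');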
On further analysis, it was found that in the failing test the CHECKPOINT_SHUTDOWN record started on a new WAL page, so a WAL page header was present just before it: the slot's confirmed_flush LSN pointed at the page header while the checkpoint record itself started 24 bytes later, which is what caused the failure. The issue can also be reproduced consistently by switching the WAL file just before restarting the server, as in the attached test change patch.

There are a couple of ways to fix this issue:
a) Switch the WAL file before inserting the records, so that the CHECKPOINT_SHUTDOWN record does not end up as the first record on a new page, as done in the attached test_fix.patch.
b) Use pg_walinspect to check that the WAL record at the slot's confirmed_flush LSN is the CHECKPOINT_SHUTDOWN record. I have yet to try this approach; a rough sketch is included below.

Thanks to Bharath and Kuroda-san for the offline discussions and for helping to get to the root cause.

Thoughts?

Regards,
Vignesh
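To make approach b) concrete, here is a rough, untested sketch of how the check could look in the test. It assumes that pg_walinspect's pg_get_wal_record_info() reports the record that starts at (or immediately after) the given LSN, and it reuses the $node_publisher handle and the $confirmed_flush_from_log value that the test already extracts from the server log:

# Rough sketch of approach b): instead of comparing the confirmed_flush LSN
# with the "Latest checkpoint location" from pg_controldata, ask pg_walinspect
# which record the confirmed_flush LSN refers to and verify that it is the
# shutdown checkpoint record. This assumes pg_get_wal_record_info() skips over
# a WAL page header when the given LSN does not point at a record start.
$node_publisher->safe_psql('postgres',
	"CREATE EXTENSION IF NOT EXISTS pg_walinspect");
my $record_type = $node_publisher->safe_psql('postgres',
	"SELECT record_type FROM pg_get_wal_record_info('$confirmed_flush_from_log')");
is($record_type, 'CHECKPOINT_SHUTDOWN',
	"Check that the slot's confirmed_flush LSN points to the shutdown checkpoint record");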
diff --git a/src/test/recovery/t/038_save_logical_slots_shutdown.pl b/src/test/recovery/t/038_save_logical_slots_shutdown.pl
index 5a4f5dc1d4..1fddab78b1 100644
--- a/src/test/recovery/t/038_save_logical_slots_shutdown.pl
+++ b/src/test/recovery/t/038_save_logical_slots_shutdown.pl
@@ -81,6 +81,8 @@ is($result, qq(5), "check initial copy was done");
 
 my $offset = -s $node_publisher->logfile;
 
+$node_publisher->safe_psql('postgres', "SELECT pg_switch_wal()");
+
 # Restart the publisher to ensure that the slot will be flushed if required
 $node_publisher->restart();
diff --git a/src/test/recovery/t/038_save_logical_slots_shutdown.pl b/src/test/recovery/t/038_save_logical_slots_shutdown.pl
index 5a4f5dc1d4..bcef53187e 100644
--- a/src/test/recovery/t/038_save_logical_slots_shutdown.pl
+++ b/src/test/recovery/t/038_save_logical_slots_shutdown.pl
@@ -60,6 +60,14 @@ $node_subscriber->start;
 $node_publisher->safe_psql('postgres', "CREATE TABLE test_tbl (id int)");
 $node_subscriber->safe_psql('postgres', "CREATE TABLE test_tbl (id int)");
 
+# Switch the WAL file so that the CHECKPOINT_SHUTDOWN WAL record will not be
+# inserted as the first record of a new WAL page. If the CHECKPOINT_SHUTDOWN
+# record is the first record on a new page, it is preceded by the WAL page
+# header, which makes the test fail: the slot's confirmed_flush LSN points to
+# the WAL page header while the CHECKPOINT_SHUTDOWN record itself starts just
+# after the page header.
+$node_publisher->safe_psql('postgres', "SELECT pg_switch_wal()");
+
 # Insert some data
 $node_publisher->safe_psql('postgres',
 	"INSERT INTO test_tbl VALUES (generate_series(1, 5));");