Re: A failure in t/038_save_logical_slots_shutdown.pl

Amit Kapila Wed, 10 Jan 2024 21:52:55 -0800

On Wed, Jan 10, 2024 at 2:08 PM Bharath Rupireddy
<bharath.rupireddyforpostg...@gmail.com> wrote:
>
> I've been observing a failure in t/038_save_logical_slots_shutdown.pl
> of late on my developer system:
>
> t/038_save_logical_slots_shutdown.pl .. 1/?
> #   Failed test 'Check that the slot's confirmed_flush LSN is the same
> as the latest_checkpoint location'
> #   at t/038_save_logical_slots_shutdown.pl line 35.
> # Looks like you failed 1 test of 2.
> t/038_save_logical_slots_shutdown.pl .. Dubious, test returned 1
> (wstat 256, 0x100)
> Failed 1/2 subtests
>
> I did a quick analysis of the failure and commit
> https://github.com/postgres/postgres/commit/e0b2eed047df9045664da6f724cb42c10f8b12f0
> that introduced this test. I think the issue is that the slot's
> confirmed_flush LSN (0/1508000) and shutdown checkpoint LSN
> (0/1508018) are not the same:
>
> tmp_check/log/038_save_logical_slots_shutdown_pub.log:
>
> 2024-01-10 07:55:44.539 UTC [57621] sub LOG:  starting logical
> decoding for slot "sub"
> 2024-01-10 07:55:44.539 UTC [57621] sub DETAIL:  Streaming
> transactions committing after 0/1508000, reading WAL from 0/1507FC8.
> 2024-01-10 07:55:44.539 UTC [57621] sub STATEMENT:  START_REPLICATION
> SLOT "sub" LOGICAL 0/0 (proto_version '4', origin 'any',
> publication_names '"pub"')
>
> ubuntu:~/postgres$ pg17/bin/pg_controldata -D
> src/test/recovery/tmp_check/t_038_save_logical_slots_shutdown_pub_data/pgdata/
> Database cluster state:               in production
> pg_control last modified:             Wed Jan 10 07:55:44 2024
> Latest checkpoint location:           0/1508018
> Latest checkpoint's REDO location:    0/1508018
>
> But the tests added by t/038_save_logical_slots_shutdown.pl expects
> both LSNs to be same:
>
> sub compare_confirmed_flush
> {
>     # Is it same as the value read from log?
>     ok( $latest_checkpoint eq $confirmed_flush_from_log,
>         "Check that the slot's confirmed_flush LSN is the same as the
> latest_checkpoint location"
>     );
>
> I suspect that it's quite not right to expect the slot's
> confirmed_flush and latest checkpoint location to be same in the test.
>


As per my understanding, the reason we expect them to be the same is
because we ensure that during shutdown, the walsender sends all the
WAL just before shutdown_checkpoint and the confirm_flush also points
to the end of WAL record before shutodwn_checkpoint. So, the next
starting location should be of shutdown_checkpoint record which should
ideally be the same. Do you see this failure consistently?

-- 
With Regards,
Amit Kapila.

Re: A failure in t/038_save_logical_slots_shutdown.pl

Reply via email to