Hi,

The buildfarm identified an issue at [1] where a partial WAL record spanning two pages, left behind by an immediate shutdown, was not detected.

Consider a scenario where a WAL record is split across multiple WAL pages. If the server crashes before the entire record is written, the recovery process detects the incomplete (broken) record. In such cases, a special flag, XLP_FIRST_IS_OVERWRITE_CONTRECORD, is set in the page header to indicate this condition.

When we retrieve changes through a logical replication slot, the system reads WAL data based on the record header. In the failing instance, a total of 8133 bytes were required to reconstruct the WAL record, of which 277 bytes were expected to be on the subsequent WAL page. However, only 248 bytes were available there, because no new WAL records had been generated: no operations were performed on the system after the crash. As a result, pg_logical_slot_get_changes() keeps checking whether the current WAL flush position has reached 0x29004115 (the offset covering the required 277 bytes on the next page). Since no new WAL is being generated, the flush position stays at 0x290040F8 (the 248-byte offset), and the call waits indefinitely in the read_local_xlog_page_guts function.
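For context, the flag lives in the WAL page header (XLogPageHeaderData.xlp_info, declared in access/xlog_internal.h). A minimal sketch of the check, written for illustration only and assuming the buffer holds at least the short page header:

#include "postgres.h"

#include "access/xlog_internal.h"

/*
 * Illustration only: report whether the page starting at "page" was flagged
 * as overwriting an aborted partial continuation record.  When the flag is
 * set, the tail of the record that appeared to continue onto this page was
 * never written and will never arrive, so a reader must not wait for it.
 */
static bool
page_overwrites_contrecord(char *page)
{
	XLogPageHeader hdr = (XLogPageHeader) page;

	return (hdr->xlp_info & XLP_FIRST_IS_OVERWRITE_CONTRECORD) != 0;
}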
Currently, the logic attempts to read the complete WAL record based on the size obtained before the crash, even though only a partial record was written. The page header is checked for the XLP_FIRST_IS_OVERWRITE_CONTRECORD flag only after the complete record has been read in XLogDecodeNextRecord, but since that much WAL data was never written, we never get a chance to perform that check.

To address this, a more robust approach is to read the page header first, check whether the XLP_FIRST_IS_OVERWRITE_CONTRECORD flag is set, and only then proceed to read the rest of the WAL record if it is not marked as an overwritten partial record. This prevents the system from waiting for WAL data that will never arrive. The attached partial_wal_record_fix.patch implements this.

I don't have a consistent way to reproduce this issue; currently it can be reproduced by running the 046_checkpoint_logical_slot test about 50 times. That test was reverted recently after it caused a few buildfarm failures, discussed at [2]; it is attached here as reverted_test_046_checkpoint_logical_slot.patch.

I have added the members who were involved in the discussion of this issue.

Thoughts?

[1] - https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=flaviventris&dt=2025-06-19+23%3A47%3A08
[2] - https://www.postgresql.org/message-id/CAPpHfdsO9s5he3xHWNFtwvXtvsftu3nNz%3DPV9fdN32UOh-Z3tA%40mail.gmail.com

Regards,
Vignesh
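In plain terms, the decision the fix makes before requesting more data looks roughly like the following hypothetical helper, written for illustration only; it is not the patched code, which is in the xlogreader.c diff attached below (partial_wal_record_fix.patch), followed by the reverted test:

#include "postgres.h"

#include "access/xlog_internal.h"

/*
 * Illustration only: decide how many bytes of the next page to request when
 * reassembling a record that continues across a page boundary.  The page
 * header is inspected first; if XLP_FIRST_IS_OVERWRITE_CONTRECORD is set,
 * the missing tail of the record will never be written, so return 0 and let
 * the caller restart decoding at this page instead of waiting.
 */
static uint32
contrecord_bytes_to_request(char *page, uint32 total_len, uint32 gotlen)
{
	XLogPageHeader hdr = (XLogPageHeader) page;

	if (hdr->xlp_info & XLP_FIRST_IS_OVERWRITE_CONTRECORD)
		return 0;

	/* Otherwise ask for the rest of the record, capped at one WAL page. */
	return Min(total_len - gotlen + SizeOfXLogShortPHD, XLOG_BLCKSZ);
}

The actual patch keeps the existing header checks in XLogDecodeNextRecord and simply defers the full-page read until after they have run.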
diff --git a/src/backend/access/transam/xlogreader.c b/src/backend/access/transam/xlogreader.c
index 2790ade1f91..54d6c67a713 100644
--- a/src/backend/access/transam/xlogreader.c
+++ b/src/backend/access/transam/xlogreader.c
@@ -723,11 +723,14 @@ restart:
 			/* Calculate pointer to beginning of next page */
 			targetPagePtr += XLOG_BLCKSZ;
 
-			/* Wait for the next page to become available */
-			readOff = ReadPageInternal(state, targetPagePtr,
-									   Min(total_len - gotlen + SizeOfXLogShortPHD,
-										   XLOG_BLCKSZ));
-
+			/*
+			 * Read the page header before processing the WAL record data. This
+			 * is necessary to correctly handle cases where the previous record
+			 * ended as a partial record. Attempting to read the full
+			 * record without checking the header may result in waiting
+			 * indefinitely for data that doesn't exist.
+			 */
+			readOff = ReadPageInternal(state, targetPagePtr, SizeOfXLogShortPHD);
 			if (readOff == XLREAD_WOULDBLOCK)
 				return XLREAD_WOULDBLOCK;
 			else if (readOff < 0)
@@ -761,6 +764,16 @@ restart:
 				goto err;
 			}
 
+			/* Wait for the next page to become available */
+			readOff = ReadPageInternal(state, targetPagePtr,
+									   Min(total_len - gotlen + SizeOfXLogShortPHD,
+										   XLOG_BLCKSZ));
+
+			if (readOff == XLREAD_WOULDBLOCK)
+				return XLREAD_WOULDBLOCK;
+			else if (readOff < 0)
+				goto err;
+
 			/*
 			 * Cross-check that xlp_rem_len agrees with how much of the record
 			 * we expect there to be left.
diff --git a/src/test/recovery/t/046_checkpoint_logical_slot.pl b/src/test/recovery/t/046_checkpoint_logical_slot.pl
new file mode 100644
index 00000000000..d67c5108d78
--- /dev/null
+++ b/src/test/recovery/t/046_checkpoint_logical_slot.pl
@@ -0,0 +1,136 @@
+# Copyright (c) 2025, PostgreSQL Global Development Group
+#
+# This test verifies the case when the logical slot is advanced during
+# checkpoint. The test checks that the logical slot's restart_lsn still refers
+# to an existed WAL segment after immediate restart.
+#
+use strict;
+use warnings FATAL => 'all';
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+
+use Test::More;
+
+if ($ENV{enable_injection_points} ne 'yes')
+{
+	plan skip_all => 'Injection points not supported by this build';
+}
+
+my ($node, $result);
+
+$node = PostgreSQL::Test::Cluster->new('mike');
+$node->init;
+$node->append_conf('postgresql.conf', "wal_level = 'logical'");
+$node->start;
+
+# Check if the extension injection_points is available, as it may be
+# possible that this script is run with installcheck, where the module
+# would not be installed by default.
+if (!$node->check_extension('injection_points'))
+{
+	plan skip_all => 'Extension injection_points not installed';
+}
+
+$node->safe_psql('postgres', q(CREATE EXTENSION injection_points));
+
+# Create the two slots we'll need.
+$node->safe_psql('postgres',
+	q{select pg_create_logical_replication_slot('slot_logical', 'test_decoding')}
+);
+$node->safe_psql('postgres',
+	q{select pg_create_physical_replication_slot('slot_physical', true)});
+
+# Advance both slots to the current position just to have everything "valid".
+$node->safe_psql('postgres',
+	q{select count(*) from pg_logical_slot_get_changes('slot_logical', null, null)}
+);
+$node->safe_psql('postgres',
+	q{select pg_replication_slot_advance('slot_physical', pg_current_wal_lsn())}
+);
+
+# Run checkpoint to flush current state to disk and set a baseline.
+$node->safe_psql('postgres', q{checkpoint});
+
+# Generate some transactions to get RUNNING_XACTS.
+my $xacts = $node->background_psql('postgres');
+$xacts->query_until(
+	qr/run_xacts/,
+	q(\echo run_xacts
+SELECT 1 \watch 0.1
+\q
+));
+
+$node->advance_wal(20);
+
+# Run another checkpoint to set a new restore LSN.
+$node->safe_psql('postgres', q{checkpoint});
+
+$node->advance_wal(20);
+
+# Run another checkpoint, this time in the background, and make it wait
+# on the injection point) so that the checkpoint stops right before
+# removing old WAL segments.
+note('starting checkpoint');
+
+my $checkpoint = $node->background_psql('postgres');
+$checkpoint->query_safe(
+	q(select injection_points_attach('checkpoint-before-old-wal-removal','wait'))
+);
+$checkpoint->query_until(
+	qr/starting_checkpoint/,
+	q(\echo starting_checkpoint
+checkpoint;
+\q
+));
+
+# Wait until the checkpoint stops right before removing WAL segments.
+note('waiting for injection_point');
+$node->wait_for_event('checkpointer', 'checkpoint-before-old-wal-removal');
+note('injection_point is reached');
+
+# Try to advance the logical slot, but make it stop when it moves to the next
+# WAL segment (this has to happen in the background, too).
+my $logical = $node->background_psql('postgres');
+$logical->query_safe(
+	q{select injection_points_attach('logical-replication-slot-advance-segment','wait');}
+);
+$logical->query_until(
+	qr/get_changes/,
+	q(
+\echo get_changes
+select count(*) from pg_logical_slot_get_changes('slot_logical', null, null) \watch 1
+\q
+));
+
+# Wait until the slot's restart_lsn points to the next WAL segment.
+note('waiting for injection_point');
+$node->wait_for_event('client backend',
+	'logical-replication-slot-advance-segment');
+note('injection_point is reached');
+
+# OK, we're in the right situation: time to advance the physical slot, which
+# recalculates the required LSN, and then unblock the checkpoint, which
+# removes the WAL still needed by the logical slot.
+$node->safe_psql('postgres',
+	q{select pg_replication_slot_advance('slot_physical', pg_current_wal_lsn())}
+);
+
+# Continue the checkpoint.
+$node->safe_psql('postgres',
+	q{select injection_points_wakeup('checkpoint-before-old-wal-removal')});
+
+# Abruptly stop the server (1 second should be enough for the checkpoint
+# to finish; it would be better).
+$node->stop('immediate');
+
+$node->start;
+
+eval {
+	$node->safe_psql('postgres',
+		q{select count(*) from pg_logical_slot_get_changes('slot_logical', null, null);}
+	);
+};
+is($@, '', "Logical slot still valid");
+
+done_testing();