Re: pg_waldump: support decoding of WAL inside tarfile

Andrew Dunstan Tue, 03 Mar 2026 16:38:17 -0800


On 2026-03-02 Mo 8:00 AM, Amul Sul wrote:

On Wed, Feb 18, 2026 at 12:28 PM Amul Sul <[email protected]> wrote:

On Tue, Feb 10, 2026 at 3:06 PM Amul Sul <[email protected]> wrote:

On Wed, Feb 4, 2026 at 6:39 PM Amul Sul <[email protected]> wrote:

On Wed, Jan 28, 2026 at 2:41 AM Robert Haas <[email protected]> wrote:

On Tue, Jan 27, 2026 at 7:07 AM Amul Sul <[email protected]> wrote:

In the attached version, I am using the WAL segment name as the hash
key, which is much more straightforward. I have rewritten
read_archive_wal_page(), and it looks much cleaner than before. The
logic to discard irrelevant WAL files is still within
get_archive_wal_entry. I added an explanation for setting cur_wal to
NULL, which is now handled in the separate function I mentioned
previously.


Kindly have a look at the attached version; let me know if you are
still not happy with the current approach for filtering/discarding
irrelevant WAL segments. It isn't much different from the previous
version, but I have tried to keep it in a separate routine for better
code readability, with comments to make it easier to understand. I
also added a comment for ArchivedWALFile.

I feel like the division of labor between get_archive_wal_entry() and
read_archive_wal_page() is odd. I noticed this in the last version,
too, and it still seems to be the case. get_archive_wal_entry() first
calls ArchivedWAL_lookup(). If that finds an entry, it just returns.
If it doesn't, it loops until an entry for the requested file shows up
and then returns it. Then control returns to read_archive_wal_page()
which loops some more until we have all the data we need for the
requested file. But it seems odd to me to have two separate loops
here. I think that the first loop is going to call read_archive_file()
until we find the beginning of the file that we care about and then
the second one is going to call read_archive_file() some more until we
have read enough of it to satisfy the request. It feels odd to me to
do it that way, as if we told somebody to first wait until 9 o'clock
and then wait another 30 minutes, instead of just telling them to wait
until 9:30. I realize it's not quite the same thing, because apart
from calling read_archive_file(), the two loops do different things,
but I still think it looks odd.

+ /*
+ * Ignore if the timeline is different or the current segment is not
+ * the desired one.
+ */
+ XLogFromFileName(entry->fname, &curSegTimeline, &curSegNo, WalSegSz);
+ if (privateInfo->timeline != curSegTimeline ||
+ privateInfo->startSegNo > curSegNo ||
+ privateInfo->endSegNo < curSegNo ||
+ segno > curSegNo)
+ {
+ free_archive_wal_entry(entry->fname, privateInfo);
+ continue;
+ }

The comment doesn't match the code. If it did, the test would be
(privateInfo->timeline != curSegTimeline || segno != curSegno). But
instead the segno test is > rather than !=, and the checks against
startSegNo and endSegNo aren't explained at all. I think I understand
why the segno test uses > rather than !=, but it's the point of the
comment to explain things like that, rather than leaving the reader to
guess. And I don't know why we also need to test startSegNo and
endSegNo.

I also wonder what the point is of doing XLogFromFileName() on the
fname provided by the caller and then again on entry->fname. Couldn't
you just compare the strings?

Again, the division of labor is really odd here. It's the job of
astreamer_waldump_content() to skip things that aren't WAL files at
all, but it's the job of get_archive_wal_entry() to skip things that
are WAL files but not the one we want. I disagree with putting those
checks in completely separate parts of the code.

Keeping the timeline and segment start-end range checks inside the
archive streamer creates a circular dependency that cannot be resolved
without a 'dirty hack'. We must read the first available WAL file page
to determine the wal_segment_size before it can calculate the target
segment range. Moving the checks inside the streamer would make it
impossible to process that initial file, as the necessary filtering
parameters --  would still be unknown which would need to be skipped
for the first read somehow. What if later we realized that the first
WAL file which was allowed to be streamed by skipping that check is
irrelevant and doesn't fall under the start-end segment range?

Please have a look at the attached version, specifically patch 0005.
In astreamer_waldump_content(), I have moved the WAL file filtration
check from get_archive_wal_entry(). This check will be skipped during
the initial read in init_archive_reader(), which instead performs it
explicitly once it determines the WAL segment size and the start/end
segments.

To access the WAL segment size inside astreamer_waldump_content(), I
have moved the WAL segment size variable into the XLogDumpPrivate
structure in the separate 0004 patch.

Attached is an updated version including the aforesaid changes. It
includes a new refactoring patch (0001) that moves the logic for
identifying tar archives and their compression types from
pg_basebackup and pg_verifybackup into a separate-reusable function,
per a suggestion from Euler [1].  Additionally, I have added a test
for the contrecord decoding to the main patch (now 0006).

1] http://postgr.es/m/[email protected]

Rebased against the latest master, fixed typos in code comments, and
replaced palloc0 with palloc0_object.


Hi Amul.


I think this looks in pretty good shape.

Attached are patches for a few things I think could be fixed. They aremostly self-explanatory. The TAP test fix is the only sane way I couldcome up with stopping the skip code you had from reporting a wildlyinaccurate number of tests skipped. The sane way to do this from aTest::More perspective is a subtest, but unfortunately meson does notlike subtest output, which is why we don't use it elsewhere, so the onlyway I could come up with was to split this out into a separate test. Ofcourse, we might just say we don't care about the misreport, in whichcase we could just live with things as they are.



cheers


andrew



--
Andrew Dunstan
EDB: https://www.enterprisedb.com

From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Andrew Dunstan <[email protected]>
Date: Tue, 3 Mar 2026 00:00:00 +0000
Subject: [PATCH] Add pg_verifybackup test for tar-format WAL verification

The new tar-format WAL verification in pg_verifybackup had no test
coverage for the case where pg_basebackup produces a separate
pg_wal.tar (--format=tar --wal-method=stream).  Add a test that takes
a tar-format backup and verifies it.
---
 src/bin/pg_verifybackup/t/007_wal.pl | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/src/bin/pg_verifybackup/t/007_wal.pl 
b/src/bin/pg_verifybackup/t/007_wal.pl
index 8ad2234453d..0e0377bfacc 100644
--- a/src/bin/pg_verifybackup/t/007_wal.pl
+++ b/src/bin/pg_verifybackup/t/007_wal.pl
@@ -90,4 +90,20 @@ command_ok(
        [ 'pg_verifybackup', $backup_path2 ],
        'valid base backup with timeline > 1');
 
+# Test WAL verification for a tar-format backup with a separate pg_wal.tar,
+# as produced by pg_basebackup --format=tar --wal-method=stream.
+my $backup_path3 = $primary->backup_dir . '/test_tar_wal';
+$primary->command_ok(
+       [
+               'pg_basebackup',
+               '--pgdata' => $backup_path3,
+               '--no-sync',
+               '--format' => 'tar',
+               '--checkpoint' => 'fast'
+       ],
+       "tar backup with separate pg_wal.tar");
+command_ok(
+       [ 'pg_verifybackup', $backup_path3 ],
+       'WAL verification succeeds with separate pg_wal.tar');
+
 done_testing();

From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Andrew Dunstan <[email protected]>
Date: Tue, 3 Mar 2026 00:00:00 +0000
Subject: [PATCH] Split pg_waldump TAP tests into directory and archive files

The original 001_basic.pl mixed directory and tar archive tests in a
single SKIP loop with a hardcoded skip count of 3, but each scenario
actually runs ~19 assertions.  When tar is unavailable the skip count
was wrong, and the directory scenario was also wrongly guarded by the
tar-availability check.

Move all archive-related tests (tar, tar.gz) into a new
003_archive.pl that uses plan skip_all when tar is unavailable,
cleanly skipping the entire file.  001_basic.pl retains only
directory-based tests with no SKIP blocks needed.
---
 src/bin/pg_waldump/meson.build       |   1 +
 src/bin/pg_waldump/t/001_basic.pl    | 221 ++++++++++-----------------
 src/bin/pg_waldump/t/003_archive.pl  | 320 +++++++++++++++++++++++++++++++++++
 3 files changed, 396 insertions(+), 146 deletions(-)
 create mode 100644 src/bin/pg_waldump/t/003_archive.pl

diff --git a/src/bin/pg_waldump/meson.build b/src/bin/pg_waldump/meson.build
index 5296f21b82c..d2b4bd0c048 100644
--- a/src/bin/pg_waldump/meson.build
+++ b/src/bin/pg_waldump/meson.build
@@ -34,6 +34,7 @@ tests += {
     'tests': [
       't/001_basic.pl',
       't/002_save_fullpage.pl',
+      't/003_archive.pl',
     ],
   },
 }
diff --git a/src/bin/pg_waldump/t/001_basic.pl 
b/src/bin/pg_waldump/t/001_basic.pl
index 9854c939007..282c9a37221 100644
--- a/src/bin/pg_waldump/t/001_basic.pl
+++ b/src/bin/pg_waldump/t/001_basic.pl
@@ -3,13 +3,9 @@
 
 use strict;
 use warnings FATAL => 'all';
-use Cwd;
 use PostgreSQL::Test::Cluster;
 use PostgreSQL::Test::Utils;
 use Test::More;
-use List::Util qw(shuffle);
-
-my $tar = $ENV{TAR};
 
 program_help_ok('pg_waldump');
 program_version_ok('pg_waldump');
@@ -195,8 +191,8 @@ END
 $$;
 });
 
-my $contrecord_lsn = $node->safe_psql('postgres',
-       'SELECT pg_current_wal_insert_lsn()');
+my $contrecord_lsn =
+  $node->safe_psql('postgres', 'SELECT pg_current_wal_insert_lsn()');
 # Generate contrecord record
 $node->safe_psql('postgres',
        qq{SELECT pg_logical_emit_message(true, 'test 026', repeat('xyzxz', 
123456))}
@@ -299,145 +295,78 @@ sub test_pg_waldump
        return @lines;
 }
 
-# Create a tar archive, sorting the file order
-sub generate_archive
-{
-       my ($archive, $directory, $compression_flags) = @_;
-
-       my @files;
-       opendir my $dh, $directory or die "opendir: $!";
-       while (my $entry = readdir $dh) {
-               # Skip '.' and '..'
-               next if $entry eq '.' || $entry eq '..';
-               push @files, $entry;
-       }
-       closedir $dh;
-
-       @files = shuffle @files;
-
-       # move into the WAL directory before archiving files
-       my $cwd = getcwd;
-       chdir($directory) || die "chdir: $!";
-       command_ok([$tar, $compression_flags, $archive, @files]);
-       chdir($cwd) || die "chdir: $!";
-}
-
-my $tmp_dir = PostgreSQL::Test::Utils::tempdir_short();
-
-my @scenarios = (
-       {
-               'path' => $node->data_dir,
-               'is_archive' => 0,
-               'enabled' => 1
-       },
-       {
-               'path' => "$tmp_dir/pg_wal.tar",
-               'compression_method' => 'none',
-               'compression_flags' => '-cf',
-               'is_archive' => 1,
-               'enabled' => 1
-       },
-       {
-               'path' => "$tmp_dir/pg_wal.tar.gz",
-               'compression_method' => 'gzip',
-               'compression_flags' => '-czf',
-               'is_archive' => 1,
-               'enabled' => check_pg_config("#define HAVE_LIBZ 1")
-       });
-
-for my $scenario (@scenarios)
-{
-       my $path = $scenario->{'path'};
-
-       SKIP:
-       {
-               skip "tar command is not available", 3
-                 if !defined $tar;
-               skip "$scenario->{'compression_method'} compression not 
supported by this build", 3
-                 if !$scenario->{'enabled'} && $scenario->{'is_archive'};
-
-                 # create pg_wal archive
-                 if ($scenario->{'is_archive'})
-                 {
-                         generate_archive($path,
-                                 $node->data_dir . '/pg_wal',
-                                 $scenario->{'compression_flags'});
-                 }
-
-               command_fails_like(
-                       [ 'pg_waldump', '--path' => $path ],
-                       qr/error: no start WAL location given/,
-                       'path option requires start location');
-               command_like(
-                       [
-                               'pg_waldump',
-                               '--path' => $path,
-                               '--start' => $start_lsn,
-                               '--end' => $end_lsn,
-                       ],
-                       qr/./,
-                       'runs with path option and start and end locations');
-               command_fails_like(
-                       [
-                               'pg_waldump',
-                               '--path' => $path,
-                               '--start' => $start_lsn,
-                       ],
-                       qr/error: error in WAL record at/,
-                       'falling off the end of the WAL results in an error');
-
-               command_fails_like(
-                       [
-                               'pg_waldump', '--quiet',
-                               '--path' => $path,
-                               '--start' => $start_lsn
-                       ],
-                       qr/error: error in WAL record at/,
-                       'errors are shown with --quiet');
-
-               test_pg_waldump_skip_bytes($path, $start_lsn, $end_lsn);
-
-               my @lines = test_pg_waldump($path, $start_lsn, $end_lsn);
-               is(grep(!/^rmgr: \w/, @lines), 0, 'all output lines are rmgr 
lines');
-
-               @lines = test_pg_waldump($path, $contrecord_lsn, $end_lsn);
-               is(grep(!/^rmgr: \w/, @lines), 0, 'all output lines are rmgr 
lines');
-
-               test_pg_waldump_skip_bytes($path, $contrecord_lsn, $end_lsn);
-
-               @lines = test_pg_waldump($path, $start_lsn, $end_lsn, '--limit' 
=> 6);
-               is(@lines, 6, 'limit option observed');
-
-               @lines = test_pg_waldump($path, $start_lsn, $end_lsn, 
'--fullpage');
-               is(grep(!/^rmgr:.*\bFPW\b/, @lines), 0, 'all output lines are 
FPW');
-
-               @lines = test_pg_waldump($path, $start_lsn, $end_lsn, 
'--stats');
-               like($lines[0], qr/WAL statistics/, "statistics on stdout");
-               is(grep(/^rmgr:/, @lines), 0, 'no rmgr lines output');
-
-               @lines = test_pg_waldump($path, $start_lsn, $end_lsn, 
'--stats=record');
-               like($lines[0], qr/WAL statistics/, "statistics on stdout");
-               is(grep(/^rmgr:/, @lines), 0, 'no rmgr lines output');
-
-               @lines = test_pg_waldump($path, $start_lsn, $end_lsn, '--rmgr' 
=> 'Btree');
-               is(grep(!/^rmgr: Btree/, @lines), 0, 'only Btree lines');
-
-               @lines = test_pg_waldump($path, $start_lsn, $end_lsn, '--fork' 
=> 'init');
-               is(grep(!/fork init/, @lines), 0, 'only init fork lines');
-
-               @lines = test_pg_waldump($path, $start_lsn, $end_lsn,
-                       '--relation' => 
"$default_ts_oid/$postgres_db_oid/$rel_t1_oid");
-               is(grep(!/rel $default_ts_oid\/$postgres_db_oid\/$rel_t1_oid/, 
@lines),
-                       0, 'only lines for selected relation');
-
-               @lines = test_pg_waldump($path, $start_lsn, $end_lsn,
-                       '--relation' => 
"$default_ts_oid/$postgres_db_oid/$rel_i1a_oid",
-                       '--block' => 1);
-               is(grep(!/\bblk 1\b/, @lines), 0, 'only lines for selected 
block');
-
-               # Cleanup.
-               unlink $path if $scenario->{'is_archive'};
-       }
-}
+my $path = $node->data_dir;
+
+command_fails_like(
+       [ 'pg_waldump', '--path' => $path ],
+       qr/error: no start WAL location given/,
+       'path option requires start location');
+command_like(
+       [
+               'pg_waldump',
+               '--path' => $path,
+               '--start' => $start_lsn,
+               '--end' => $end_lsn,
+       ],
+       qr/./,
+       'runs with path option and start and end locations');
+command_fails_like(
+       [
+               'pg_waldump',
+               '--path' => $path,
+               '--start' => $start_lsn,
+       ],
+       qr/error: error in WAL record at/,
+       'falling off the end of the WAL results in an error');
+
+command_fails_like(
+       [
+               'pg_waldump', '--quiet',
+               '--path' => $path,
+               '--start' => $start_lsn
+       ],
+       qr/error: error in WAL record at/,
+       'errors are shown with --quiet');
+
+test_pg_waldump_skip_bytes($path, $start_lsn, $end_lsn);
+
+my @lines = test_pg_waldump($path, $start_lsn, $end_lsn);
+is(grep(!/^rmgr: \w/, @lines), 0, 'all output lines are rmgr lines');
+
+@lines = test_pg_waldump($path, $contrecord_lsn, $end_lsn);
+is(grep(!/^rmgr: \w/, @lines), 0, 'all output lines are rmgr lines');
+
+test_pg_waldump_skip_bytes($path, $contrecord_lsn, $end_lsn);
+
+@lines = test_pg_waldump($path, $start_lsn, $end_lsn, '--limit' => 6);
+is(@lines, 6, 'limit option observed');
+
+@lines = test_pg_waldump($path, $start_lsn, $end_lsn, '--fullpage');
+is(grep(!/^rmgr:.*\bFPW\b/, @lines), 0, 'all output lines are FPW');
+
+@lines = test_pg_waldump($path, $start_lsn, $end_lsn, '--stats');
+like($lines[0], qr/WAL statistics/, "statistics on stdout");
+is(grep(/^rmgr:/, @lines), 0, 'no rmgr lines output');
+
+@lines = test_pg_waldump($path, $start_lsn, $end_lsn, '--stats=record');
+like($lines[0], qr/WAL statistics/, "statistics on stdout");
+is(grep(/^rmgr:/, @lines), 0, 'no rmgr lines output');
+
+@lines = test_pg_waldump($path, $start_lsn, $end_lsn, '--rmgr' => 'Btree');
+is(grep(!/^rmgr: Btree/, @lines), 0, 'only Btree lines');
+
+@lines = test_pg_waldump($path, $start_lsn, $end_lsn, '--fork' => 'init');
+is(grep(!/fork init/, @lines), 0, 'only init fork lines');
+
+@lines = test_pg_waldump($path, $start_lsn, $end_lsn,
+       '--relation' => "$default_ts_oid/$postgres_db_oid/$rel_t1_oid");
+is(grep(!/rel $default_ts_oid\/$postgres_db_oid\/$rel_t1_oid/, @lines),
+       0, 'only lines for selected relation');
+
+@lines = test_pg_waldump(
+       $path, $start_lsn, $end_lsn,
+       '--relation' => "$default_ts_oid/$postgres_db_oid/$rel_i1a_oid",
+       '--block' => 1);
+is(grep(!/\bblk 1\b/, @lines), 0, 'only lines for selected block');
 
 done_testing();
new file mode 100644
index 00000000000..c615713efd4
--- /dev/null
+++ b/src/bin/pg_waldump/t/003_archive.pl
@@ -0,0 +1,320 @@
+
+# Copyright (c) 2021-2026, PostgreSQL Global Development Group
+
+# Test pg_waldump's ability to read WAL from tar archives.
+
+use strict;
+use warnings FATAL => 'all';
+use Cwd;
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+use List::Util qw(shuffle);
+
+my $tar = $ENV{TAR};
+
+if (!defined $tar)
+{
+       plan skip_all => 'tar command is not available';
+}
+
+my $node = PostgreSQL::Test::Cluster->new('main');
+$node->init;
+$node->append_conf(
+       'postgresql.conf', q{
+autovacuum = off
+checkpoint_timeout = 1h
+
+# for standbydesc
+archive_mode=on
+archive_command=''
+
+# for XLOG_HEAP_TRUNCATE
+wal_level=logical
+});
+$node->start;
+
+my ($start_lsn, $start_walfile) = split /\|/,
+  $node->safe_psql('postgres',
+       q{SELECT pg_current_wal_insert_lsn(), 
pg_walfile_name(pg_current_wal_insert_lsn())}
+  );
+
+$node->safe_psql(
+       'postgres', q{
+-- heap, btree, hash, sequence
+CREATE TABLE t1 (a int GENERATED ALWAYS AS IDENTITY, b text);
+CREATE INDEX i1a ON t1 USING btree (a);
+CREATE INDEX i1b ON t1 USING hash (b);
+INSERT INTO t1 VALUES (default, 'one'), (default, 'two');
+DELETE FROM t1 WHERE b = 'one';
+TRUNCATE t1;
+
+-- abort
+START TRANSACTION;
+INSERT INTO t1 VALUES (default, 'three');
+ROLLBACK;
+
+-- unlogged/init fork
+CREATE UNLOGGED TABLE t2 (x int);
+CREATE INDEX i2 ON t2 USING btree (x);
+INSERT INTO t2 SELECT generate_series(1, 10);
+
+-- gin
+CREATE TABLE gin_idx_tbl (id bigserial PRIMARY KEY, data jsonb);
+CREATE INDEX gin_idx ON gin_idx_tbl USING gin (data);
+INSERT INTO gin_idx_tbl
+    WITH random_json AS (
+        SELECT json_object_agg(key, trunc(random() * 10)) as json_data
+            FROM unnest(array['a', 'b', 'c']) as u(key))
+          SELECT generate_series(1,500), json_data FROM random_json;
+
+-- gist, spgist
+CREATE TABLE gist_idx_tbl (p point);
+CREATE INDEX gist_idx ON gist_idx_tbl USING gist (p);
+CREATE INDEX spgist_idx ON gist_idx_tbl USING spgist (p);
+INSERT INTO gist_idx_tbl (p) VALUES (point '(1, 1)'), (point '(3, 2)'), (point 
'(6, 3)');
+
+-- brin
+CREATE TABLE brin_idx_tbl (col1 int, col2 text, col3 text );
+CREATE INDEX brin_idx ON brin_idx_tbl USING brin (col1, col2, col3) WITH 
(autosummarize=on);
+INSERT INTO brin_idx_tbl SELECT generate_series(1, 10000), 'dummy', 'dummy';
+UPDATE brin_idx_tbl SET col2 = 'updated' WHERE col1 BETWEEN 1 AND 5000;
+SELECT brin_summarize_range('brin_idx', 0);
+SELECT brin_desummarize_range('brin_idx', 0);
+
+VACUUM;
+
+-- logical message
+SELECT pg_logical_emit_message(true, 'foo', 'bar');
+
+-- relmap
+VACUUM FULL pg_authid;
+
+-- database
+CREATE DATABASE d1;
+DROP DATABASE d1;
+});
+
+my $tblspc_path = PostgreSQL::Test::Utils::tempdir_short();
+
+$node->safe_psql(
+       'postgres', qq{
+CREATE TABLESPACE ts1 LOCATION '$tblspc_path';
+DROP TABLESPACE ts1;
+});
+
+# Consume all remaining room in the current WAL segment, leaving space enough
+# only for the start of a largish record, to test contrecord decoding.
+$node->safe_psql(
+       'postgres', q{
+DO $$
+DECLARE
+    wal_segsize int := setting::int FROM pg_settings WHERE name = 
'wal_segment_size';
+    remain int;
+    iters  int := 0;
+BEGIN
+    LOOP
+        INSERT into t1(b)
+        select repeat(encode(sha256(g::text::bytea), 'hex'), (random() * 15 + 
1)::int)
+        from generate_series(1, 10) g;
+
+        remain := wal_segsize - (pg_current_wal_insert_lsn() - '0/0') % 
wal_segsize;
+        IF remain < 2 * setting::int from pg_settings where name = 
'block_size' THEN
+            RAISE log 'exiting after % iterations, % bytes to end of WAL 
segment', iters, remain;
+            EXIT;
+        END IF;
+        iters := iters + 1;
+    END LOOP;
+END
+$$;
+});
+
+my $contrecord_lsn =
+  $node->safe_psql('postgres', 'SELECT pg_current_wal_insert_lsn()');
+$node->safe_psql('postgres',
+       qq{SELECT pg_logical_emit_message(true, 'test 026', repeat('xyzxz', 
123456))}
+);
+
+my ($end_lsn, $end_walfile) = split /\|/,
+  $node->safe_psql('postgres',
+       q{SELECT pg_current_wal_insert_lsn(), 
pg_walfile_name(pg_current_wal_insert_lsn())}
+  );
+
+$node->stop;
+
+
+sub test_pg_waldump_skip_bytes
+{
+       my ($path, $startlsn, $endlsn) = @_;
+
+       my ($part1, $part2) = split qr{/}, $startlsn;
+       my $lsn2 = hex $part2;
+       $lsn2++;
+       my $new_start = sprintf("%s/%X", $part1, $lsn2);
+
+       my ($stdout, $stderr);
+
+       my $result = IPC::Run::run [
+               'pg_waldump',
+               '--start' => $new_start,
+               '--end' => $endlsn,
+               '--path' => $path,
+         ],
+         '>' => \$stdout,
+         '2>' => \$stderr;
+       ok($result, "runs with start segment and start LSN specified");
+       like($stderr, qr/first record is after/, 'info message printed');
+}
+
+sub test_pg_waldump
+{
+       local $Test::Builder::Level = $Test::Builder::Level + 1;
+       my ($path, $startlsn, $endlsn, @opts) = @_;
+
+       my ($stdout, $stderr);
+
+       my $result = IPC::Run::run [
+               'pg_waldump',
+               '--start' => $startlsn,
+               '--end' => $endlsn,
+               '--path' => $path,
+               @opts
+         ],
+         '>' => \$stdout,
+         '2>' => \$stderr;
+       ok($result, "pg_waldump @opts: runs ok");
+       is($stderr, '', "pg_waldump @opts: no stderr");
+       my @lines = split /\n/, $stdout;
+       ok(@lines > 0, "pg_waldump @opts: some lines are output");
+       return @lines;
+}
+
+sub generate_archive
+{
+       my ($archive, $directory, $compression_flags) = @_;
+
+       my @files;
+       opendir my $dh, $directory or die "opendir: $!";
+       while (my $entry = readdir $dh)
+       {
+               next if $entry eq '.' || $entry eq '..';
+               push @files, $entry;
+       }
+       closedir $dh;
+
+       @files = shuffle @files;
+
+       my $cwd = getcwd;
+       chdir($directory) || die "chdir: $!";
+       command_ok([ $tar, $compression_flags, $archive, @files ],
+               "create archive $archive");
+       chdir($cwd) || die "chdir: $!";
+}
+
+
+my $tmp_dir = PostgreSQL::Test::Utils::tempdir_short();
+
+my @scenarios = (
+       {
+               'path' => "$tmp_dir/pg_wal.tar",
+               'compression_method' => 'none',
+               'compression_flags' => '-cf',
+               'enabled' => 1,
+       },
+       {
+               'path' => "$tmp_dir/pg_wal.tar.gz",
+               'compression_method' => 'gzip',
+               'compression_flags' => '-czf',
+               'enabled' => check_pg_config("#define HAVE_LIBZ 1"),
+       });
+
+for my $scenario (@scenarios)
+{
+       my $path = $scenario->{'path'};
+       my $method = $scenario->{'compression_method'};
+
+  SKIP:
+       {
+               skip "$method compression not supported by this build", 1
+                 if !$scenario->{'enabled'};
+
+               generate_archive(
+                       $path,
+                       $node->data_dir . '/pg_wal',
+                       $scenario->{'compression_flags'});
+
+               command_fails_like(
+                       [ 'pg_waldump', '--path' => $path ],
+                       qr/error: no start WAL location given/,
+                       "$method: path option requires start location");
+               command_like(
+                       [
+                               'pg_waldump',
+                               '--path' => $path,
+                               '--start' => $start_lsn,
+                               '--end' => $end_lsn,
+                       ],
+                       qr/./,
+                       "$method: runs with path option and start and end 
locations");
+               command_fails_like(
+                       [
+                               'pg_waldump',
+                               '--path' => $path,
+                               '--start' => $start_lsn,
+                       ],
+                       qr/error: error in WAL record at/,
+                       "$method: falling off the end of the WAL results in an 
error");
+
+               command_fails_like(
+                       [
+                               'pg_waldump', '--quiet',
+                               '--path' => $path,
+                               '--start' => $start_lsn
+                       ],
+                       qr/error: error in WAL record at/,
+                       "$method: errors are shown with --quiet");
+
+               test_pg_waldump_skip_bytes($path, $start_lsn, $end_lsn);
+
+               my @lines = test_pg_waldump($path, $start_lsn, $end_lsn);
+               is(grep(!/^rmgr: \w/, @lines),
+                       0, "$method: all output lines are rmgr lines");
+
+               @lines = test_pg_waldump($path, $contrecord_lsn, $end_lsn);
+               is(grep(!/^rmgr: \w/, @lines),
+                       0, "$method: contrecord - all output lines are rmgr 
lines");
+
+               test_pg_waldump_skip_bytes($path, $contrecord_lsn, $end_lsn);
+
+               @lines = test_pg_waldump($path, $start_lsn, $end_lsn, '--limit' 
=> 6);
+               is(@lines, 6, "$method: limit option observed");
+
+               @lines = test_pg_waldump($path, $start_lsn, $end_lsn, 
'--fullpage');
+               is(grep(!/^rmgr:.*\bFPW\b/, @lines),
+                       0, "$method: all output lines are FPW");
+
+               @lines = test_pg_waldump($path, $start_lsn, $end_lsn, 
'--stats');
+               like($lines[0], qr/WAL statistics/, "$method: statistics on 
stdout");
+               is(grep(/^rmgr:/, @lines), 0, "$method: no rmgr lines output");
+
+               @lines =
+                 test_pg_waldump($path, $start_lsn, $end_lsn, 
'--stats=record');
+               like($lines[0], qr/WAL statistics/,
+                       "$method: stats=record on stdout");
+               is(grep(/^rmgr:/, @lines),
+                       0, "$method: no rmgr lines with stats=record");
+
+               @lines =
+                 test_pg_waldump($path, $start_lsn, $end_lsn, '--rmgr' => 
'Btree');
+               is(grep(!/^rmgr: Btree/, @lines), 0, "$method: only Btree 
lines");
+
+               @lines =
+                 test_pg_waldump($path, $start_lsn, $end_lsn, '--fork' => 
'init');
+               is(grep(!/fork init/, @lines), 0, "$method: only init fork 
lines");
+
+               # Cleanup.
+               unlink $path;
+       }
+}
+
+done_testing();

From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Andrew Dunstan <[email protected]>
Date: Tue, 3 Mar 2026 00:00:00 +0000
Subject: [PATCH] Fix documentation for pg_waldump tar archive support

Two documentation issues with the tar archive reading feature:

- pg_waldump.sgml: When reading WAL from a tar archive with
  out-of-order segments, pg_waldump spills to temporary files.  TMPDIR
  controls where those files are created, but this was not documented
  in the Environment section.

- pg_verifybackup.sgml: The --wal-path option description still only
  said "directory" even though it now also accepts tar archives.
---
 doc/src/sgml/ref/pg_verifybackup.sgml |  7 ++++---
 doc/src/sgml/ref/pg_waldump.sgml      | 11 +++++++++++
 2 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/doc/src/sgml/ref/pg_verifybackup.sgml 
b/doc/src/sgml/ref/pg_verifybackup.sgml
index 16b50b5a4df..1695cfe91c8 100644
--- a/doc/src/sgml/ref/pg_verifybackup.sgml
+++ b/doc/src/sgml/ref/pg_verifybackup.sgml
@@ -261,9 +261,10 @@ PostgreSQL documentation
       <term><option>--wal-path=<replaceable 
class="parameter">path</replaceable></option></term>
       <listitem>
        <para>
-        Try to parse WAL files stored in the specified directory, rather than
-        in <literal>pg_wal</literal>. This may be useful if the backup is
-        stored in a separate location from the WAL archive.
+        Try to parse WAL files stored in the specified directory or tar
+        archive, rather than in <literal>pg_wal</literal>. This may be
+        useful if the backup is stored in a separate location from the WAL
+        archive.
        </para>
       </listitem>
      </varlistentry>
diff --git a/doc/src/sgml/ref/pg_waldump.sgml b/doc/src/sgml/ref/pg_waldump.sgml
index b36323dde92..9bbb4bd5772 100644
--- a/doc/src/sgml/ref/pg_waldump.sgml
+++ b/doc/src/sgml/ref/pg_waldump.sgml
@@ -391,6 +391,17 @@ PostgreSQL documentation
      </para>
     </listitem>
    </varlistentry>
+
+   <varlistentry>
+    <term><envar>TMPDIR</envar></term>
+    <listitem>
+     <para>
+      Directory in which to create temporary files when reading WAL from a
+      tar archive with out-of-order segment files. If not set, the temporary
+      directory is created within the same directory as the tar archive.
+     </para>
+    </listitem>
+   </varlistentry>
   </variablelist>
  </refsect1>

From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Andrew Dunstan <[email protected]>
Date: Tue, 3 Mar 2026 00:00:00 +0000
Subject: [PATCH] Fix bugs in pg_waldump tar archive support

Fix several bugs introduced by the pg_waldump archive WAL reading
feature:

- pg_waldump.c: The error path for verify_directory() printed waldir
  (which is NULL when --path is used) instead of walpath.

- archive_waldump.c: The error message for short reads had an operator
  precedence bug: (long long int) count - nbytes cast only count, not
  the subtraction result.  Also reported nbytes (the requested amount)
  instead of count (the total file size) for the "of" portion.

- archive_waldump.c: The "ignoring duplicate WAL" code path leaked
  fname (allocated via pnstrdup/palloc).  Also changed the existing
  free(fname) to pfree(fname) for consistency.

- pg_verifybackup.c: The rename from --wal-directory to --wal-path
  didn't preserve the old spelling as a backward-compatible alias.

- pg_verifybackup.c: Fix double space before "Or" in --wal-path
  error hint message.
---
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c 
b/src/bin/pg_verifybackup/pg_verifybackup.c
index 935ab8fafa8..b0b764913cf 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -131,6 +131,7 @@ main(int argc, char **argv)
                {"quiet", no_argument, NULL, 'q'},
                {"skip-checksums", no_argument, NULL, 's'},
                {"wal-path", required_argument, NULL, 'w'},
+               {"wal-directory", required_argument, NULL, 'w'},
                {NULL, 0, NULL, 0}
        };
 
@@ -376,7 +377,7 @@ main(int argc, char **argv)
                else
                {
                        pg_log_error("WAL archive not found");
-                       pg_log_error_hint("Specify the correct path using the 
option -w/--wal-path.  "
+                       pg_log_error_hint("Specify the correct path using the 
option -w/--wal-path. "
                                                          "Or you must use 
-n/--no-parse-wal when verifying a tar-format backup.");
                        exit(1);
                }
diff --git a/src/bin/pg_waldump/archive_waldump.c 
b/src/bin/pg_waldump/archive_waldump.c
index c5a4485b5b1..1479efe61f5 100644
--- a/src/bin/pg_waldump/archive_waldump.c
+++ b/src/bin/pg_waldump/archive_waldump.c
@@ -344,8 +344,8 @@ read_archive_wal_page(XLogDumpPrivate *privateInfo, 
XLogRecPtr targetPagePtr,
                                read_archive_file(privateInfo, READ_CHUNK_SIZE) 
== 0)
                                pg_fatal("could not read file \"%s\" from 
archive \"%s\": read %lld of %lld",
                                                 fname, 
privateInfo->archive_name,
-                                                (long long int) count - nbytes,
-                                                (long long int) nbytes);
+                                                (long long int) (count - 
nbytes),
+                                                (long long int) count);
                }
        }
 
@@ -664,7 +664,7 @@ astreamer_waldump_content(astreamer *streamer, 
astreamer_member *member,
                                                privateInfo->start_segno > 
segno ||
                                                privateInfo->end_segno < segno)
                                        {
-                                               free(fname);
+                                               pfree(fname);
                                                break;
                                        }
                                }
@@ -680,6 +680,7 @@ astreamer_waldump_content(astreamer *streamer, 
astreamer_member *member,
                                {
                                        pg_log_warning("ignoring duplicate WAL 
\"%s\" found in archive \"%s\"",
                                                                   
member->pathname, privateInfo->archive_name);
+                                       pfree(fname);
                                        break;
                                }
 
diff --git a/src/bin/pg_waldump/pg_waldump.c b/src/bin/pg_waldump/pg_waldump.c
index 114969217d8..4b438b53ead 100644
--- a/src/bin/pg_waldump/pg_waldump.c
+++ b/src/bin/pg_waldump/pg_waldump.c
@@ -1223,7 +1223,7 @@ main(int argc, char **argv)
                /* validate path points to directory */
                else if (!verify_directory(walpath))
                {
-                       pg_log_error("could not open directory \"%s\": %m", 
waldir);
+                       pg_log_error("could not open directory \"%s\": %m", 
walpath);
                        goto bad_argument;
                }
        }

Re: pg_waldump: support decoding of WAL inside tarfile

Reply via email to