Re: pg_waldump: support decoding of WAL inside tarfile

Andrew Dunstan Fri, 06 Mar 2026 14:21:32 -0800


On 2026-03-04 We 4:50 PM, Andrew Dunstan wrote:



On 2026-03-04 We 7:52 AM, Amul Sul wrote:

On Wed, Mar 4, 2026 at 6:07 AM Andrew Dunstan<[email protected]> wrote:

On 2026-03-02 Mo 8:00 AM, Amul Sul wrote:

On Wed, Feb 18, 2026 at 12:28 PM Amul Sul<[email protected]> wrote:

On Tue, Feb 10, 2026 at 3:06 PM Amul Sul<[email protected]> wrote:

On Wed, Feb 4, 2026 at 6:39 PM Amul Sul<[email protected]> wrote:

On Wed, Jan 28, 2026 at 2:41 AM Robert Haas<[email protected]> wrote:

On Tue, Jan 27, 2026 at 7:07 AM Amul Sul<[email protected]> wrote:

In the attached version, I am using the WAL segment name as the hash
key, which is much more straightforward. I have rewritten
read_archive_wal_page(), and it looks much cleaner than before. The
logic to discard irrelevant WAL files is still within
get_archive_wal_entry. I added an explanation for setting cur_wal to
NULL, which is now handled in the separate function I mentioned
previously.

Kindly have a look at the attached version; let me know if you are
still not happy with the current approach for filtering/discarding
irrelevant WAL segments. It isn't much different from the previous
version, but I have tried to keep it in a separate routine for better
code readability, with comments to make it easier to understand. I
also added a comment for ArchivedWALFile.

I feel like the division of labor between get_archive_wal_entry() and
read_archive_wal_page() is odd. I noticed this in the last version,
too, and it still seems to be the case. get_archive_wal_entry() first
calls ArchivedWAL_lookup(). If that finds an entry, it just returns.
If it doesn't, it loops until an entry for the requested file shows up
and then returns it. Then control returns to read_archive_wal_page()
which loops some more until we have all the data we need for the
requested file. But it seems odd to me to have two separate loops
here. I think that the first loop is going to call read_archive_file()
until we find the beginning of the file that we care about and then
the second one is going to call read_archive_file() some more until we
have read enough of it to satisfy the request. It feels odd to me to
do it that way, as if we told somebody to first wait until 9 o'clock
and then wait another 30 minutes, instead of just telling them to wait
until 9:30. I realize it's not quite the same thing, because apart
from calling read_archive_file(), the two loops do different things,
but I still think it looks odd.

+ /*
+ * Ignore if the timeline is different or the current segment is not
+ * the desired one.
+ */
+ XLogFromFileName(entry->fname, &curSegTimeline, &curSegNo, WalSegSz);
+ if (privateInfo->timeline != curSegTimeline ||
+ privateInfo->startSegNo > curSegNo ||
+ privateInfo->endSegNo < curSegNo ||
+ segno > curSegNo)
+ {
+ free_archive_wal_entry(entry->fname, privateInfo);
+ continue;
+ }

The comment doesn't match the code. If it did, the test would be
(privateInfo->timeline != curSegTimeline || segno != curSegno). But
instead the segno test is > rather than !=, and the checks against
startSegNo and endSegNo aren't explained at all. I think I understand
why the segno test uses > rather than !=, but it's the point of the
comment to explain things like that, rather than leaving the reader to
guess. And I don't know why we also need to test startSegNo and
endSegNo.

I also wonder what the point is of doing XLogFromFileName() on the
fname provided by the caller and then again on entry->fname. Couldn't
you just compare the strings?

Again, the division of labor is really odd here. It's the job of
astreamer_waldump_content() to skip things that aren't WAL files at
all, but it's the job of get_archive_wal_entry() to skip things that
are WAL files but not the one we want. I disagree with putting those
checks in completely separate parts of the code.

Keeping the timeline and segment start-end range checks inside the
archive streamer creates a circular dependency that cannot be resolved
without a 'dirty hack'. We must read the first available WAL file page
to determine the wal_segment_size before it can calculate the target
segment range. Moving the checks inside the streamer would make it
impossible to process that initial file, as the necessary filtering
parameters --  would still be unknown which would need to be skipped
for the first read somehow. What if later we realized that the first
WAL file which was allowed to be streamed by skipping that check is
irrelevant and doesn't fall under the start-end segment range?

Please have a look at the attached version, specifically patch 0005.
In astreamer_waldump_content(), I have moved the WAL file filtration
check from get_archive_wal_entry(). This check will be skipped during
the initial read in init_archive_reader(), which instead performs it
explicitly once it determines the WAL segment size and the start/end
segments.

To access the WAL segment size inside astreamer_waldump_content(), I
have moved the WAL segment size variable into the XLogDumpPrivate
structure in the separate 0004 patch.

Attached is an updated version including the aforesaid changes. It
includes a new refactoring patch (0001) that moves the logic for
identifying tar archives and their compression types from
pg_basebackup and pg_verifybackup into a separate-reusable function,
per a suggestion from Euler [1].  Additionally, I have added a test
for the contrecord decoding to the main patch (now 0006).

1]http://postgr.es/m/[email protected]

Rebased against the latest master, fixed typos in code comments, and
replaced palloc0 with palloc0_object.

Hi Amul.


I think this looks in pretty good shape.

Thank you very much for looking at the patch.

Attached are patches for a few things I think could be fixed. They are
mostly self-explanatory. The TAP test fix is the only sane way I could
come up with stopping the skip code you had from reporting a wildly
inaccurate number of tests skipped. The sane way to do this from a
Test::More perspective is a subtest, but unfortunately meson does not
like subtest output, which is why we don't use it elsewhere, so the only
way I could come up with was to split this out into a separate test. Of
course, we might just say we don't care about the misreport, in which
case we could just live with things as they are.

I agree that the reported skip number was incorrect, and I have
corrected it in the attached patch. I haven't applied your patch for
the TAP test improvements yet because I wanted to double-check it
first with you; the patch as it stood created duplicate tests already
present in 001_basic.pl. To avoid this duplication, I have added a
loop that performs tests for both plain and tar WAL directory inputs,
similar to the approach used in pg_verifybackup for different
compression type tests (e.g., 008_untar.pl, 010_client_untar.pl). I
don't have any objection to doing so if you feel the duplication is
acceptable, but I feel that using a loop for the tests in 001_basic.pl
is a bit tidier. Let me know your thoughts.



I will take a look.

I'm ok, with doing it this way. It's just a bit fragile - if we add atest the number will be wrong. But maybe it's not worth worrying about.

Everything else looks fairly good. The attached fixes a few relativelyminor issues in v15. The main one is that it stops allocating/freeing abuffer every time we call read_archive_file() and instead adds areusable buffer. It also adds back wal-directory as an undocumentedalias of wal-path, to avoid breaking legacy scripts unnecessarily, andadds constness to the fname argument of pg_tar_compress_algorithm, aswell as fixing some indentation and grammar issues.


All in all I think we're in good shape.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com

diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c 
b/src/bin/pg_verifybackup/pg_verifybackup.c
index 935ab8fafa8..db79dd39103 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -131,6 +131,7 @@ main(int argc, char **argv)
                {"quiet", no_argument, NULL, 'q'},
                {"skip-checksums", no_argument, NULL, 's'},
                {"wal-path", required_argument, NULL, 'w'},
+               {"wal-directory", required_argument, NULL, 'w'},        /* 
deprecated */
                {NULL, 0, NULL, 0}
        };
 
diff --git a/src/bin/pg_waldump/archive_waldump.c 
b/src/bin/pg_waldump/archive_waldump.c
index 1479efe61f5..f0cc42b18bf 100644
--- a/src/bin/pg_waldump/archive_waldump.c
+++ b/src/bin/pg_waldump/archive_waldump.c
@@ -236,6 +236,13 @@ free_archive_reader(XLogDumpPrivate *privateInfo)
         */
        astreamer_free(privateInfo->archive_streamer);
 
+       /* Free the reusable read buffer. */
+       if (privateInfo->archive_read_buf != NULL)
+       {
+               pg_free(privateInfo->archive_read_buf);
+               privateInfo->archive_read_buf = NULL;
+       }
+
        /* Close the file. */
        if (close(privateInfo->archive_fd) != 0)
                pg_log_error("could not close file \"%s\": %m",
@@ -350,17 +357,17 @@ read_archive_wal_page(XLogDumpPrivate *privateInfo, 
XLogRecPtr targetPagePtr,
        }
 
        /*
-        * Should have either have successfully read all the requested bytes or
-        * reported a failure before this point.
+        * Should have successfully read all the requested bytes or reported a
+        * failure before this point.
         */
        Assert(nbytes == 0);
 
        /*
-        * NB: We return the fixed value provided as input. Although we could
-        * return a boolean since we either successfully read the WAL page or
-        * raise an error, but the caller expects this value to be returned. The
-        * routine that reads WAL pages from the physical WAL file follows the
-        * same convention.
+        * NB: We return the fixed value provided as input. We could return a
+        * boolean since we either successfully read the WAL page or raise an
+        * error, but the caller expects this value to be returned. The routine
+        * that reads WAL pages from the physical WAL file follows the same
+        * convention.
         */
        return count;
 }
@@ -368,8 +375,8 @@ read_archive_wal_page(XLogDumpPrivate *privateInfo, 
XLogRecPtr targetPagePtr,
 /*
  * Clears the buffer of a WAL entry that is being ignored.  This frees up 
memory
  * and prevents the accumulation of irrelevant WAL data.  Additionally,
- * conditionally setting cur_file within privateinfo to NULL ensures the
- * archive streamer skips unnecessary copy operations
+ * conditionally setting cur_file within privateInfo to NULL ensures the
+ * archive streamer skips unnecessary copy operations.
  */
 void
 free_archive_wal_entry(const char *fname, XLogDumpPrivate *privateInfo)
@@ -498,11 +505,18 @@ static int
 read_archive_file(XLogDumpPrivate *privateInfo, Size count)
 {
        int                     rc;
-       char       *buffer;
 
-       buffer = pg_malloc(count * sizeof(uint8));
+       /* Allocate or grow the reusable read buffer as needed */
+       if (privateInfo->archive_read_buf == NULL ||
+               privateInfo->archive_read_buf_size < count)
+       {
+               if (privateInfo->archive_read_buf != NULL)
+                       pg_free(privateInfo->archive_read_buf);
+               privateInfo->archive_read_buf = pg_malloc(count);
+               privateInfo->archive_read_buf_size = count;
+       }
 
-       rc = read(privateInfo->archive_fd, buffer, count);
+       rc = read(privateInfo->archive_fd, privateInfo->archive_read_buf, 
count);
        if (rc < 0)
                pg_fatal("could not read file \"%s\": %m",
                                 privateInfo->archive_name);
@@ -513,8 +527,8 @@ read_archive_file(XLogDumpPrivate *privateInfo, Size count)
         */
        if (rc > 0)
                astreamer_content(privateInfo->archive_streamer, NULL,
-                                                 buffer, rc, 
ASTREAMER_UNKNOWN);
-       pg_free(buffer);
+                                                 
privateInfo->archive_read_buf, rc,
+                                                 ASTREAMER_UNKNOWN);
 
        return rc;
 }
diff --git a/src/bin/pg_waldump/pg_waldump.c b/src/bin/pg_waldump/pg_waldump.c
index 4b438b53ead..e970b007883 100644
--- a/src/bin/pg_waldump/pg_waldump.c
+++ b/src/bin/pg_waldump/pg_waldump.c
@@ -491,7 +491,7 @@ TarWALDumpReadPage(XLogReaderState *state, XLogRecPtr 
targetPagePtr, int reqLen,
        if (!XLByteInSeg(targetPagePtr, curSegNo, WalSegSz))
        {
                char            fname[MAXFNAMELEN];
-               XLogSegNo   nextSegNo;
+               XLogSegNo       nextSegNo;
 
                /*
                 * Calculate the next WAL segment to be decoded from the given 
page
@@ -509,9 +509,10 @@ TarWALDumpReadPage(XLogReaderState *state, XLogRecPtr 
targetPagePtr, int reqLen,
                }
 
                /*
-                * If in pre-reading mode (prior to actual decoding), do not 
delete any
-                * entries that might be requested again once the decoding loop 
starts.
-                * For more details, see the comments in 
read_archive_wal_page().
+                * If in pre-reading mode (prior to actual decoding), do not 
delete
+                * any entries that might be requested again once the decoding 
loop
+                * starts. For more details, see the comments in
+                * read_archive_wal_page().
                 */
                if (private->decoding_started && curSegNo < nextSegNo)
                {
@@ -908,7 +909,7 @@ main(int argc, char **argv)
        char       *waldir = NULL;
        char       *walpath = NULL;
        char       *errormsg;
-       pg_compress_algorithm compression;
+       pg_compress_algorithm compression = PG_COMPRESSION_NONE;
 
        static struct option long_options[] = {
                {"bkp-details", no_argument, NULL, 'b'},
@@ -1317,7 +1318,6 @@ main(int argc, char **argv)
                                segno = endsegno;
                        }
 
-
                        if (!XLByteInSeg(private.endptr, segno, 
private.segsize) &&
                                private.endptr != (segno + 1) * private.segsize)
                        {
diff --git a/src/bin/pg_waldump/pg_waldump.h b/src/bin/pg_waldump/pg_waldump.h
index 6c242b7fcbc..1097390d575 100644
--- a/src/bin/pg_waldump/pg_waldump.h
+++ b/src/bin/pg_waldump/pg_waldump.h
@@ -36,6 +36,8 @@ typedef struct XLogDumpPrivate
        int                     archive_fd;             /* File descriptor for 
the open tar file */
 
        astreamer  *archive_streamer;
+       char       *archive_read_buf;   /* Reusable read buffer for archive I/O 
*/
+       Size            archive_read_buf_size;
 
        /* What the archive streamer is currently reading */
        struct ArchivedWALFile *cur_file;
diff --git a/src/common/compression.c b/src/common/compression.c
index f117e21237f..fb27501d297 100644
--- a/src/common/compression.c
+++ b/src/common/compression.c
@@ -46,7 +46,7 @@ static bool expect_boolean_value(char *keyword, char *value,
  * sets *algorithm if the name is recognized. Otherwise returns false.
  */
 bool
-parse_tar_compress_algorithm(char *fname, pg_compress_algorithm *algorithm)
+parse_tar_compress_algorithm(const char *fname, pg_compress_algorithm 
*algorithm)
 {
        int                     fname_len = strlen(fname);
 
diff --git a/src/include/common/compression.h b/src/include/common/compression.h
index 50f21656b88..f99c747cdd3 100644
--- a/src/include/common/compression.h
+++ b/src/include/common/compression.h
@@ -41,7 +41,7 @@ typedef struct pg_compress_specification
 
 extern void parse_compress_options(const char *option, char **algorithm,
                                                                   char 
**detail);
-extern bool parse_tar_compress_algorithm(char *fname,
+extern bool parse_tar_compress_algorithm(const char *fname,
                                                                                
 pg_compress_algorithm *algorithm);
 extern bool parse_compress_algorithm(char *name, pg_compress_algorithm 
*algorithm);
 extern const char *get_compress_algorithm_name(pg_compress_algorithm 
algorithm);

Re: pg_waldump: support decoding of WAL inside tarfile

Reply via email to