On Wed, Aug 7, 2024 at 11:28 PM Robert Haas <robertmh...@gmail.com> wrote:
>
> On Wed, Aug 7, 2024 at 1:05 PM Amul Sul <sula...@gmail.com> wrote:
> > The main issue I have is computing the total_size of valid files that
> > will be checksummed and that exist in both the manifests and the
> > backup, in the case of a tar backup. This cannot be done in the same
> > way as with a plain backup.
>
> I think you should compute and sum the sizes of the tar files
> themselves. Suppose you readdir(), make a list of files that look
> relevant, and stat() each one. total_size is the sum of the file
> sizes. Then you work your way through the list of files and read each
> one. done_size is the total size of all files you've read completely
> plus the number of bytes you've read from the current file so far.
>

I tried this in the attached version and made a few additional changes
based on Sravan's off-list comments regarding function names and
descriptions.

Now, verification happens in two passes. The first pass simply
verifies the file names, determines their compression types, and
returns a list of valid tar files whose contents need to be verified
in the second pass. The second pass is called at the end of
verify_backup_directory() after all files in that directory have been
scanned. I named the functions for pass 1 and pass 2 as
verify_tar_file_name() and verify_tar_file_contents(), respectively.
The rest of the code flow is similar as in the previous version.

In the attached patch set, I abandoned the changes, touching the
progress reporting code of plain backups by dropping the previous 0009
patch. The new 0009 patch adds missing APIs to simple_list.c to
destroy SimplePtrList. The rest of the patch numbers remain unchanged.

Regards,
Amul
From 98ecaf7d965d44e4c5d1e558b1406230da31a79c Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Wed, 31 Jul 2024 16:22:07 +0530
Subject: [PATCH v9 11/12] pg_verifybackup: Read tar files and verify its
 contents

This patch implements TAR format backup verification.

For progress reporting support, we perform this verification in two
passes: the first pass calculates total_size, and the second pass
updates done_size as verification progresses.

For the verification, in the first pass, we call verify_tar_file(),
which performs basic verification by expecting only base.tar or
<tablespaceoid>.tar files and raises an error for any other files.  It
also determines the compression type of the archive file. All this
information is stored in a newly added tarFile struct, which is
appended to a list that will be used in the second pass (by
verify_tar_content()) for the final verification. In the second pass,
the tar archives are read, decompressed, and the required verification
is carried out.

For reading and decompression, fe_utils/astreamer.h is used. For
verification, a new archive streamer has been added in
astreamer_verify.c to handle TAR member files and their contents; see
astreamer_verify_content() for details. The stack of astreamers will
be set up for each TAR file in verify_tar_content(), depending on its
compression type which is detected in the first pass.

When information about a TAR member file (i.e., ASTREAMER_MEMBER_HEADER)
is received, we first verify its entry against the backup manifest. We
then decide if further checks are needed, such as checksum
verification and control data verification (if it is a pg_control
file), once the member file contents are received. Although this
decision could be made when the contents are received, it is more
efficient to make it earlier since the member file contents are
received in multiple iterations. In short, we process
ASTREAMER_MEMBER_CONTENTS multiple times but only once for other
ASTREAMER_MEMBER_* cases. We maintain this information in the
astreamer_verify structure for each member file, which is reset when
the file ends.
---
 src/bin/pg_verifybackup/Makefile           |   4 +-
 src/bin/pg_verifybackup/astreamer_verify.c | 354 +++++++++++++++++++++
 src/bin/pg_verifybackup/meson.build        |   6 +-
 src/bin/pg_verifybackup/pg_verifybackup.c  | 313 +++++++++++++++++-
 src/bin/pg_verifybackup/pg_verifybackup.h  |   6 +
 src/tools/pgindent/typedefs.list           |   2 +
 6 files changed, 675 insertions(+), 10 deletions(-)
 create mode 100644 src/bin/pg_verifybackup/astreamer_verify.c

diff --git a/src/bin/pg_verifybackup/Makefile b/src/bin/pg_verifybackup/Makefile
index 7c045f142e8..df7aaabd530 100644
--- a/src/bin/pg_verifybackup/Makefile
+++ b/src/bin/pg_verifybackup/Makefile
@@ -17,11 +17,13 @@ top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
 
 # We need libpq only because fe_utils does.
+override CPPFLAGS := -I$(libpq_srcdir) $(CPPFLAGS)
 LDFLAGS_INTERNAL += -L$(top_builddir)/src/fe_utils -lpgfeutils $(libpq_pgport)
 
 OBJS = \
 	$(WIN32RES) \
-	pg_verifybackup.o
+	pg_verifybackup.o \
+	astreamer_verify.o
 
 all: pg_verifybackup
 
diff --git a/src/bin/pg_verifybackup/astreamer_verify.c b/src/bin/pg_verifybackup/astreamer_verify.c
new file mode 100644
index 00000000000..0983dffde8e
--- /dev/null
+++ b/src/bin/pg_verifybackup/astreamer_verify.c
@@ -0,0 +1,354 @@
+/*-------------------------------------------------------------------------
+ *
+ * astreamer_verify.c
+ *
+ * Extend fe_utils/astreamer.h archive streaming facility to verify TAR
+ * backup.
+ *
+ * Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
+ *
+ * src/bin/pg_verifybackup/astreamer_verify.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres_fe.h"
+
+#include "common/logging.h"
+#include "pg_verifybackup.h"
+
+typedef struct astreamer_verify
+{
+	astreamer	base;
+	verifier_context *context;
+	char	   *archive_name;
+	Oid			tblspc_oid;
+	pg_checksum_context *checksum_ctx;
+
+	/* Hold information for a member file verification */
+	manifest_file *mfile;
+	int64		received_bytes;
+	bool		verify_checksum;
+	bool		verify_control_data;
+} astreamer_verify;
+
+static void astreamer_verify_content(astreamer *streamer,
+									 astreamer_member *member,
+									 const char *data, int len,
+									 astreamer_archive_context context);
+static void astreamer_verify_finalize(astreamer *streamer);
+static void astreamer_verify_free(astreamer *streamer);
+
+static void verify_member_header(astreamer *streamer, astreamer_member *member);
+static void verify_member_checksum(astreamer *streamer,
+								   astreamer_member *member,
+								   const char *buffer, int buffer_len);
+static void verify_member_control_data(astreamer *streamer,
+									   astreamer_member *member,
+									   const char *data, int len);
+static void reset_member_info(astreamer *streamer);
+
+static const astreamer_ops astreamer_verify_ops = {
+	.content = astreamer_verify_content,
+	.finalize = astreamer_verify_finalize,
+	.free = astreamer_verify_free
+};
+
+/*
+ * Create a astreamer that can verifies content of a TAR file.
+ */
+astreamer *
+astreamer_verify_content_new(astreamer *next, verifier_context *context,
+							 char *archive_name, Oid tblspc_oid)
+{
+	astreamer_verify *streamer;
+
+	streamer = palloc0(sizeof(astreamer_verify));
+	*((const astreamer_ops **) &streamer->base.bbs_ops) =
+		&astreamer_verify_ops;
+
+	streamer->base.bbs_next = next;
+	streamer->context = context;
+	streamer->archive_name = archive_name;
+	streamer->tblspc_oid = tblspc_oid;
+	initStringInfo(&streamer->base.bbs_buffer);
+
+	if (!context->skip_checksums)
+		streamer->checksum_ctx = pg_malloc(sizeof(pg_checksum_context));
+
+	return &streamer->base;
+}
+
+/*
+ * The main entry point of the archive streamer for verifying tar members.
+ */
+static void
+astreamer_verify_content(astreamer *streamer, astreamer_member *member,
+						 const char *data, int len,
+						 astreamer_archive_context context)
+{
+	astreamer_verify *mystreamer = (astreamer_verify *) streamer;
+
+	Assert(context != ASTREAMER_UNKNOWN);
+
+	switch (context)
+	{
+		case ASTREAMER_MEMBER_HEADER:
+
+			/*
+			 * Perform the initial check and setup verification steps.
+			 */
+			verify_member_header(streamer, member);
+			break;
+
+		case ASTREAMER_MEMBER_CONTENTS:
+
+			/*
+			 * Process the member content according to the flags set by the
+			 * member header processing routine for checksum and control data
+			 * verification.
+			 */
+			if (mystreamer->verify_checksum)
+				verify_member_checksum(streamer, member, data, len);
+
+			if (mystreamer->verify_control_data)
+				verify_member_control_data(streamer, member, data, len);
+			break;
+
+		case ASTREAMER_MEMBER_TRAILER:
+
+			/*
+			 * Reset the temporary information stored for the verification.
+			 */
+			reset_member_info(streamer);
+			break;
+
+		case ASTREAMER_ARCHIVE_TRAILER:
+			break;
+
+		default:
+			/* Shouldn't happen. */
+			pg_fatal("unexpected state while parsing tar archive");
+	}
+}
+
+/*
+ * End-of-stream processing for a astreamer_verify stream.
+ */
+static void
+astreamer_verify_finalize(astreamer *streamer)
+{
+	Assert(streamer->bbs_next == NULL);
+}
+
+/*
+ * Free memory associated with a astreamer_verify stream.
+ */
+static void
+astreamer_verify_free(astreamer *streamer)
+{
+	astreamer_verify *mystreamer = (astreamer_verify *) streamer;
+
+	if (mystreamer->checksum_ctx)
+		pfree(mystreamer->checksum_ctx);
+
+	pfree(streamer->bbs_buffer.data);
+	pfree(streamer);
+}
+
+/*
+ * Verifies the tar member entry if it corresponds to a file in the backup
+ * manifest. If the archive being processed is a tablespace, prepares the
+ * required file path for subsequent operations. Finally, determines if
+ * checksum verification and control data verification need to be performed
+ * during file content processing
+ */
+static void
+verify_member_header(astreamer *streamer, astreamer_member *member)
+{
+	astreamer_verify *mystreamer = (astreamer_verify *) streamer;
+	manifest_file *m;
+
+	/* We are only interested in files that are not in the ignore list. */
+	if (member->is_directory || member->is_link ||
+		should_ignore_relpath(mystreamer->context, member->pathname))
+		return;
+
+	/*
+	 * The backup_manifest stores a relative path to the base directory for
+	 * files belong tablespace, whereas <tablespaceoid>.tar doesn't. Prepare
+	 * the required path, otherwise, the manfiest entry verification will
+	 * fail.
+	 */
+	if (OidIsValid(mystreamer->tblspc_oid))
+	{
+		char		temp[MAXPGPATH];
+
+		/* Copy original name at temporary space */
+		memcpy(temp, member->pathname, MAXPGPATH);
+
+		snprintf(member->pathname, MAXPGPATH, "%s/%d/%s",
+				 "pg_tblspc", mystreamer->tblspc_oid, temp);
+	}
+
+	/* Check the manifest entry */
+	m = verify_manifest_entry(mystreamer->context, member->pathname,
+							  member->size);
+	mystreamer->mfile = (void *) m;
+
+	/*
+	 * Prepare for checksum and control data verification.
+	 *
+	 * We could have these checks while receiving contents. However, since
+	 * contents are received in multiple iterations, this would result in
+	 * these lengthy checks being performed multiple times. Instead, having a
+	 * single flag would be more efficient.
+	 */
+	mystreamer->verify_checksum =
+		(!mystreamer->context->skip_checksums && should_verify_checksum(m));
+	mystreamer->verify_control_data =
+		should_verify_control_data(mystreamer->context->manifest, m);
+
+	/* Initialize the context required for checksum verification. */
+	if (mystreamer->verify_checksum &&
+		pg_checksum_init(mystreamer->checksum_ctx, m->checksum_type) < 0)
+	{
+		report_backup_error(mystreamer->context,
+							"%s: could not initialize checksum of file \"%s\"",
+							mystreamer->archive_name, m->pathname);
+
+		/*
+		 * Checksum verification cannot be performed without proper context
+		 * initialization.
+		 */
+		mystreamer->verify_checksum = false;
+	}
+}
+
+/*
+ * Computes the checksum incrementally for the received file content, and
+ * finally calls the routine for checksum verification, similar to
+ * verify_file_checksum().
+ *
+ * The caller should pass a correctly initialized checksum_ctx, which will be
+ * used for incremental checksum computation. Once the complete file content is
+ * received (tracked using received_bytes), the final checksum verification
+ * happens.
+ */
+static void
+verify_member_checksum(astreamer *streamer, astreamer_member *member,
+					   const char *buffer, int buffer_len)
+{
+	astreamer_verify *mystreamer = (astreamer_verify *) streamer;
+	pg_checksum_context *checksum_ctx = mystreamer->checksum_ctx;
+	verifier_context *context = mystreamer->context;
+	manifest_file *m = mystreamer->mfile;
+	const char *relpath = m->pathname;
+	uint8		checksumbuf[PG_CHECKSUM_MAX_LENGTH];
+
+	/*
+	 * Mark it false to avoid unexpected re-entrance for the same file content
+	 * (e.g. returned in error should not be revisited).
+	 */
+	Assert(mystreamer->verify_checksum);
+	mystreamer->verify_checksum = false;
+
+	/* Should have came for the right file */
+	Assert(strcmp(member->pathname, relpath) == 0);
+
+	/*
+	 * The checksum context should match the type noted in the backup
+	 * manifest.
+	 */
+	Assert(checksum_ctx->type == m->checksum_type);
+
+	/* Update the total count of computed checksum bytes. */
+	mystreamer->received_bytes += buffer_len;
+
+	if (pg_checksum_update(checksum_ctx, (uint8 *) buffer, buffer_len) < 0)
+	{
+		report_backup_error(context, "could not update checksum of file \"%s\"",
+							relpath);
+		return;
+	}
+
+	/* Yet to receive the full content of the file. */
+	if (mystreamer->received_bytes < m->size)
+	{
+		mystreamer->verify_checksum = true;
+		return;
+	}
+
+	/* Do the final computation and verification. */
+	verify_checksum(context, m, checksum_ctx, checksumbuf);
+}
+
+/*
+ * Prepare the control data from the received tar member contents, which are
+ * supposed to be from the pg_control file, including CRC calculation. Then,
+ * call the routines that perform the final verification of the control file
+ * information.
+ */
+static void
+verify_member_control_data(astreamer *streamer, astreamer_member *member,
+						   const char *data, int len)
+{
+	astreamer_verify *mystreamer = (astreamer_verify *) streamer;
+	manifest_data *manifest = mystreamer->context->manifest;
+	ControlFileData control_file;
+	pg_crc32c	crc;
+	bool		crc_ok;
+
+	/* Should be here only for control file */
+	Assert(strcmp(member->pathname, "global/pg_control") == 0);
+	Assert(manifest->version != 1);
+
+	/* Mark it as false to avoid unexpected re-entrance */
+	Assert(mystreamer->verify_control_data);
+	mystreamer->verify_control_data = false;
+
+	/* Should have whole control file data. */
+	if (!astreamer_buffer_until(streamer, &data, &len, sizeof(ControlFileData)))
+	{
+		mystreamer->verify_control_data = true;
+		return;
+	}
+
+	pg_log_debug("%s: reading \"%s\"", mystreamer->archive_name,
+				 member->pathname);
+
+	if (streamer->bbs_buffer.len != sizeof(ControlFileData))
+		report_fatal_error("%s: could not read control file: read %d of %zu",
+						   mystreamer->archive_name, streamer->bbs_buffer.len,
+						   sizeof(ControlFileData));
+
+	memcpy(&control_file, streamer->bbs_buffer.data,
+		   sizeof(ControlFileData));
+
+	/* Check the CRC. */
+	INIT_CRC32C(crc);
+	COMP_CRC32C(crc,
+				(char *) (&control_file),
+				offsetof(ControlFileData, crc));
+	FIN_CRC32C(crc);
+
+	crc_ok = EQ_CRC32C(crc, control_file.crc);
+
+	/* Do the final control data verification. */
+	verify_control_data(&control_file, member->pathname, crc_ok,
+						manifest->system_identifier);
+}
+
+/*
+ * Reset flags and free memory allocations for member file verification.
+ */
+static void
+reset_member_info(astreamer *streamer)
+{
+	astreamer_verify *mystreamer = (astreamer_verify *) streamer;
+
+	mystreamer->mfile = NULL;
+	mystreamer->received_bytes = 0;
+	mystreamer->verify_checksum = false;
+	mystreamer->verify_control_data = false;
+}
diff --git a/src/bin/pg_verifybackup/meson.build b/src/bin/pg_verifybackup/meson.build
index 7c7d31a0350..1e3fcf7ee5a 100644
--- a/src/bin/pg_verifybackup/meson.build
+++ b/src/bin/pg_verifybackup/meson.build
@@ -1,7 +1,8 @@
 # Copyright (c) 2022-2024, PostgreSQL Global Development Group
 
 pg_verifybackup_sources = files(
-  'pg_verifybackup.c'
+  'pg_verifybackup.c',
+  'astreamer_verify.c'
 )
 
 if host_system == 'windows'
@@ -10,9 +11,10 @@ if host_system == 'windows'
     '--FILEDESC', 'pg_verifybackup - verify a backup against using a backup manifest'])
 endif
 
+pg_verifybackup_deps = [frontend_code, libpq, lz4, zlib, zstd]
 pg_verifybackup = executable('pg_verifybackup',
   pg_verifybackup_sources,
-  dependencies: [frontend_code, libpq],
+  dependencies: pg_verifybackup_deps,
   kwargs: default_bin_args,
 )
 bin_targets += pg_verifybackup
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index 88196cca4e0..42b776eda18 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -20,9 +20,11 @@
 
 #include "common/compression.h"
 #include "common/parse_manifest.h"
+#include "common/relpath.h"
 #include "fe_utils/simple_list.h"
 #include "getopt_long.h"
 #include "pg_verifybackup.h"
+#include "pgtar.h"
 #include "pgtime.h"
 
 /*
@@ -39,6 +41,16 @@
  */
 #define ESTIMATED_BYTES_PER_MANIFEST_LINE	100
 
+/*
+ * Tar archive information needed for content verification.
+ */
+typedef struct tarFile
+{
+	char	   *relpath;
+	Oid			tblspc_oid;
+	pg_compress_algorithm compress_algorithm;
+} tarFile;
+
 static manifest_data *parse_manifest_file(char *manifest_path);
 static void verifybackup_version_cb(JsonManifestParseContext *context,
 									int manifest_version);
@@ -61,7 +73,18 @@ static char find_backup_format(verifier_context *context);
 static void verify_backup_directory(verifier_context *context,
 									char *relpath, char *fullpath);
 static void verify_backup_file(verifier_context *context,
-							   char *relpath, char *fullpath);
+							   char *relpath, char *fullpath,
+							   SimplePtrList *tarFiles);
+static void verify_plain_file(verifier_context *context,
+							  char *relpath, char *fullpath,
+							  size_t filesize);
+static void verify_tar_file_name(verifier_context *context, char *relpath,
+								 char *fullpath, int64 filesize,
+								 SimplePtrList *tarFiles);
+static void verify_tar_file_contents(verifier_context *context,
+									 SimplePtrList *tarFiles);
+static void verify_tar_contents(verifier_context *context, char *relpath,
+								char *fullpath, astreamer *streamer);
 static void report_extra_backup_files(verifier_context *context);
 static void verify_backup_checksums(verifier_context *context);
 static void verify_file_checksum(verifier_context *context,
@@ -70,6 +93,10 @@ static void verify_file_checksum(verifier_context *context,
 static void parse_required_wal(verifier_context *context,
 							   char *pg_waldump_path,
 							   char *wal_directory);
+static astreamer *create_archive_verifier(verifier_context *context,
+										  char *archive_name,
+										  Oid tblspc_oid,
+										  pg_compress_algorithm compress_algo);
 
 static void progress_report(bool finished);
 static void usage(void);
@@ -148,6 +175,10 @@ main(int argc, char **argv)
 	 */
 	simple_string_list_append(&context.ignore_list, "backup_manifest");
 	simple_string_list_append(&context.ignore_list, "pg_wal");
+	simple_string_list_append(&context.ignore_list, "pg_wal.tar");
+	simple_string_list_append(&context.ignore_list, "pg_wal.tar.gz");
+	simple_string_list_append(&context.ignore_list, "pg_wal.tar.lz4");
+	simple_string_list_append(&context.ignore_list, "pg_wal.tar.zst");
 	simple_string_list_append(&context.ignore_list, "postgresql.auto.conf");
 	simple_string_list_append(&context.ignore_list, "recovery.signal");
 	simple_string_list_append(&context.ignore_list, "standby.signal");
@@ -556,6 +587,7 @@ verify_backup_directory(verifier_context *context, char *relpath,
 {
 	DIR		   *dir;
 	struct dirent *dirent;
+	SimplePtrList tarFiles = {NULL, NULL};
 
 	dir = opendir(fullpath);
 	if (dir == NULL)
@@ -595,12 +627,17 @@ verify_backup_directory(verifier_context *context, char *relpath,
 			newrelpath = psprintf("%s/%s", relpath, filename);
 
 		if (!should_ignore_relpath(context, newrelpath))
-			verify_backup_file(context, newrelpath, newfullpath);
+			verify_backup_file(context, newrelpath, newfullpath, &tarFiles);
 
 		pfree(newfullpath);
 		pfree(newrelpath);
 	}
 
+	/* Perform the final verification of the tar contents, if any. */
+	Assert(tarFiles.head == NULL || context->format == 't');
+	if (tarFiles.head != NULL)
+		verify_tar_file_contents(context, &tarFiles);
+
 	if (closedir(dir))
 	{
 		report_backup_error(context,
@@ -610,16 +647,18 @@ verify_backup_directory(verifier_context *context, char *relpath,
 }
 
 /*
- * Verify one file (which might actually be a directory or a symlink).
+ * Verify one file (which might actually be a directory, a symlink or a
+ * archive).
  *
  * The arguments to this function have the same meaning as the arguments to
- * verify_backup_directory.
+ * verify_backup_directory. The additional argument outputs the list of tar
+ * archive information if any.
  */
 static void
-verify_backup_file(verifier_context *context, char *relpath, char *fullpath)
+verify_backup_file(verifier_context *context, char *relpath, char *fullpath,
+				   SimplePtrList *tarFiles)
 {
 	struct stat sb;
-	manifest_file *m;
 
 	if (stat(fullpath, &sb) != 0)
 	{
@@ -652,8 +691,36 @@ verify_backup_file(verifier_context *context, char *relpath, char *fullpath)
 		return;
 	}
 
+	/* Do the further verifications */
+	if (context->format == 'p')
+		verify_plain_file(context, relpath, fullpath, sb.st_size);
+	else
+	{
+		/*
+		 * This is preparatory work for the tar format backup verification,
+		 * where we verify only the archive file name and its compression
+		 * type. The final verification will be carried out after listing all
+		 * the archives from the backup directory.
+		 */
+		verify_tar_file_name(context, relpath, fullpath, sb.st_size, tarFiles);
+	}
+}
+
+/*
+ * Verify one plain file or a symlink.
+ *
+ * The arguments to this function are mostly the same as the
+ * verify_backup_directory. The additional argument is the file size for
+ * verifying against manifest entry.
+ */
+static void
+verify_plain_file(verifier_context *context, char *relpath, char *fullpath,
+				  size_t filesize)
+{
+	manifest_file *m;
+
 	/* Check the backup manifest entry for this file. */
-	m = verify_manifest_entry(context, relpath, sb.st_size);
+	m = verify_manifest_entry(context, relpath, filesize);
 
 	/* Validate the manifest system identifier */
 	if (should_verify_control_data(context->manifest, m))
@@ -676,6 +743,202 @@ verify_backup_file(verifier_context *context, char *relpath, char *fullpath)
 		total_size += m->size;
 }
 
+/*
+ * Verify one tar archive file.
+ *
+ * This does not perform a complete verification; it only performs basic
+ * validation of the tar format backup file, detects the compression type, and
+ * appends that information to the tarFiles list. An error will be reported if
+ * the tar archive name or compression type is not as expected.
+ *
+ * The arguments to this function are mostly the same as the
+ * verify_backup_file. The additional argument is the file size for progress
+ * report.
+ */
+static void
+verify_tar_file_name(verifier_context *context, char *relpath, char *fullpath,
+					 int64 filesize, SimplePtrList *tarFiles)
+{
+	Oid			tblspc_oid = InvalidOid;
+	int			file_name_len;
+	int			file_extn_len;
+	pg_compress_algorithm compress_algorithm;
+	tarFile    *tar_file;
+
+	/* Should be tar format backup */
+	Assert(context->format == 't');
+
+	/* Find the compression type of the tar file */
+	if (strstr(relpath, ".tgz") != NULL)
+	{
+		compress_algorithm = PG_COMPRESSION_GZIP;
+		file_extn_len = 4;		/* length of ".tgz" */
+	}
+	else if (strstr(relpath, ".tar.gz") != NULL)
+	{
+		compress_algorithm = PG_COMPRESSION_GZIP;
+		file_extn_len = 7;		/* length of ".tar.gz" */
+	}
+	else if (strstr(relpath, ".tar.lz4") != NULL)
+	{
+		compress_algorithm = PG_COMPRESSION_LZ4;
+		file_extn_len = 8;		/* length of ".tar.lz4" */
+	}
+	else if (strstr(relpath, ".tar.zst") != NULL)
+	{
+		compress_algorithm = PG_COMPRESSION_ZSTD;
+		file_extn_len = 8;		/* length of ".tar.zst" */
+	}
+	else if (strstr(relpath, ".tar") != NULL)
+	{
+		compress_algorithm = PG_COMPRESSION_NONE;
+		file_extn_len = 4;		/* length of ".tar" */
+	}
+	else
+	{
+		report_backup_error(context,
+							"\"%s\" unexpected file in the tar format backup",
+							relpath);
+		return;
+	}
+
+	/*
+	 * We expect tar archive files of backing up the main directory and
+	 * tablespace.
+	 *
+	 * pg_basebackup writes the main data directory to an archive file named
+	 * base.tar and the tablespace directory to <tablespaceoid>.tar, followed
+	 * by a compression type extension such as .gz, .lz4, or .zst.
+	 */
+	file_name_len = strlen(relpath);
+	if (strspn(relpath, "0123456789") == (file_name_len - file_extn_len))
+	{
+		/*
+		 * Since the file matches the <tablespaceoid>.tar format, extract the
+		 * tablespaceoid, which is needed to prepare the paths of the files
+		 * belonging to that tablespace relative to the base directory.
+		 */
+		tblspc_oid = strtoi64(relpath, NULL, 10);
+	}
+	/* Otherwise, it should be a base.tar file; if not, raise an error. */
+	else if (strncmp("base", relpath, file_name_len - file_extn_len) != 0)
+	{
+		report_backup_error(context,
+							"\"%s\" unexpected file in the tar format backup",
+							relpath);
+		return;
+	}
+
+	/*
+	 * Append the information to the list for complete verification at a later
+	 * stage.
+	 */
+	tar_file = pg_malloc(sizeof(tarFile));
+	tar_file->relpath = pstrdup(relpath);
+	tar_file->tblspc_oid = tblspc_oid;
+	tar_file->compress_algorithm = compress_algorithm;
+
+	simple_ptr_list_append(tarFiles, tar_file);
+
+	/* Update statistics for progress report, if necessary */
+	if (show_progress)
+		total_size += filesize;
+}
+
+/*
+ * This is the final part of tar file verification, which prepares the archive
+ * streamer stack according to the tar file compression format for each tar
+ * archive and invokes them for reading, decompressing, and ultimately
+ * verifying the contents.
+ *
+ * The arguments to this function should be a list of valid tar archives to
+ * verify, and the allocation will be freed once the verification is complete.
+ */
+static void
+verify_tar_file_contents(verifier_context *context, SimplePtrList *tarFiles)
+{
+	SimplePtrListCell *cell;
+
+	progress_report(false);
+
+	for (cell = tarFiles->head; cell != NULL; cell = cell->next)
+	{
+		tarFile    *tar_file = (tarFile *) cell->ptr;
+		astreamer  *streamer;
+		char	   *fullpath;
+
+		/* Prepare archive streamer stack */
+		streamer = create_archive_verifier(context,
+										   tar_file->relpath,
+										   tar_file->tblspc_oid,
+										   tar_file->compress_algorithm);
+
+		/* Compute the full pathname to the target file. */
+		fullpath = psprintf("%s/%s", context->backup_directory,
+							tar_file->relpath);
+
+		/* Invoke the streamer for reading, decompressing, and verifying. */
+		verify_tar_contents(context, tar_file->relpath, fullpath, streamer);
+
+		/* Cleanup. */
+		pfree(tar_file->relpath);
+		pfree(tar_file);
+		pfree(fullpath);
+
+		astreamer_finalize(streamer);
+		astreamer_free(streamer);
+	}
+	simple_ptr_list_destroy(tarFiles);
+
+	progress_report(true);
+}
+
+/*
+ * Performs the actual work for tar content verification. It reads a given tar
+ * file in predefined chunks and passes it to the streamer, which initiates
+ * routines for decompression (if necessary) and then verifies each member
+ * within the tar archive.
+ */
+static void
+verify_tar_contents(verifier_context *context, char *relpath, char *fullpath,
+					astreamer *streamer)
+{
+	int			fd;
+	int			rc;
+	char	   *buffer;
+
+	pg_log_debug("reading \"%s\"", fullpath);
+
+	/* Open the target file. */
+	if ((fd = open(fullpath, O_RDONLY | PG_BINARY, 0)) < 0)
+	{
+		report_backup_error(context, "could not open file \"%s\": %m",
+							relpath);
+		return;
+	}
+
+	buffer = pg_malloc(READ_CHUNK_SIZE * sizeof(uint8));
+
+	/* Perform the reads */
+	while ((rc = read(fd, buffer, READ_CHUNK_SIZE)) > 0)
+	{
+		astreamer_content(streamer, NULL, buffer, rc, ASTREAMER_UNKNOWN);
+
+		/* Report progress */
+		done_size += rc;
+		progress_report(false);
+	}
+
+	if (rc < 0)
+		report_backup_error(context, "could not read file \"%s\": %m",
+							relpath);
+
+	/* Close the file. */
+	if (close(fd) != 0)
+		report_backup_error(context, "could not close file \"%s\": %m",
+							relpath);
+}
+
 /*
  * Verify file and its size entry in the manifest.
  */
@@ -1044,6 +1307,42 @@ find_backup_format(verifier_context *context)
 	return result;
 }
 
+/*
+ * Identifies the necessary steps for verifying the contents of the
+ * provided tar file.
+ */
+static astreamer *
+create_archive_verifier(verifier_context *context, char *archive_name,
+						Oid tblspc_oid, pg_compress_algorithm compress_algo)
+{
+	astreamer  *streamer = NULL;
+
+	/* Should be here only for tar backup */
+	Assert(context->format == 't');
+
+	/*
+	 * To verify the contents of the tar file, the initial step is to parse
+	 * its content.
+	 */
+	streamer = astreamer_verify_content_new(streamer, context, archive_name,
+											tblspc_oid);
+	streamer = astreamer_tar_parser_new(streamer);
+
+	/*
+	 * If the tar file is compressed, we must perform the appropriate
+	 * decompression operation before proceeding with the verification of its
+	 * contents.
+	 */
+	if (compress_algo == PG_COMPRESSION_GZIP)
+		streamer = astreamer_gzip_decompressor_new(streamer);
+	else if (compress_algo == PG_COMPRESSION_LZ4)
+		streamer = astreamer_lz4_decompressor_new(streamer);
+	else if (compress_algo == PG_COMPRESSION_ZSTD)
+		streamer = astreamer_zstd_decompressor_new(streamer);
+
+	return streamer;
+}
+
 /*
  * Print a progress report based on the global variables.
  *
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.h b/src/bin/pg_verifybackup/pg_verifybackup.h
index 856e8947c1d..db847a59657 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.h
+++ b/src/bin/pg_verifybackup/pg_verifybackup.h
@@ -18,6 +18,7 @@
 #include "common/hashfn_unstable.h"
 #include "common/logging.h"
 #include "common/parse_manifest.h"
+#include "fe_utils/astreamer.h"
 #include "fe_utils/simple_list.h"
 
 /*
@@ -128,4 +129,9 @@ extern void report_fatal_error(const char *pg_restrict fmt,...)
 extern bool should_ignore_relpath(verifier_context *context,
 								  const char *relpath);
 
+extern astreamer *astreamer_verify_content_new(astreamer *next,
+											   verifier_context *context,
+											   char *archive_name,
+											   Oid tblspc_oid);
+
 #endif							/* PG_VERIFYBACKUP_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 547d14b3e7c..47b5f0edcc7 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -3329,6 +3329,7 @@ astreamer_plain_writer
 astreamer_recovery_injector
 astreamer_tar_archiver
 astreamer_tar_parser
+astreamer_verify
 astreamer_zstd_frame
 bgworker_main_type
 bh_node_type
@@ -3950,6 +3951,7 @@ substitute_phv_relids_context
 subxids_array_status
 symbol
 tablespaceinfo
+tarFile
 td_entry
 teSection
 temp_tablespaces_extra
-- 
2.18.0

From 3ba9b317cc313be4c3927b59d1d967458f5bcb66 Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Thu, 1 Aug 2024 15:47:26 +0530
Subject: [PATCH v9 08/12] Refactor: split verify_control_file.

Separated the manifest entry verification code into a new function and
introduced the should_verify_control_data() macro, similar to
should_verify_checksum().

Like verify_file_checksum(), verify_control_file() is too design to
accept the pg_control file patch which will be opened and respective
information will be verified. But, in case of tar backup we would be
having pg_control file containt instead, that needs to be verified in
the same way.  For that reason the code that doing the verification is
separated into separate function to so that can be reused for the tar
backup verification as well.
---
 src/bin/pg_verifybackup/pg_verifybackup.c | 44 +++++++++++------------
 src/bin/pg_verifybackup/pg_verifybackup.h | 15 ++++++++
 2 files changed, 35 insertions(+), 24 deletions(-)

diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index e4f499fcd37..5adf24e8f90 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -18,7 +18,6 @@
 #include <sys/stat.h>
 #include <time.h>
 
-#include "common/logging.h"
 #include "common/parse_manifest.h"
 #include "fe_utils/simple_list.h"
 #include "getopt_long.h"
@@ -61,8 +60,6 @@ static void verify_backup_directory(verifier_context *context,
 									char *relpath, char *fullpath);
 static void verify_backup_file(verifier_context *context,
 							   char *relpath, char *fullpath);
-static void verify_control_file(const char *controlpath,
-								uint64 manifest_system_identifier);
 static void report_extra_backup_files(verifier_context *context);
 static void verify_backup_checksums(verifier_context *context);
 static void verify_file_checksum(verifier_context *context,
@@ -625,14 +622,20 @@ verify_backup_file(verifier_context *context, char *relpath, char *fullpath)
 	/* Check the backup manifest entry for this file. */
 	m = verify_manifest_entry(context, relpath, sb.st_size);
 
-	/*
-	 * Validate the manifest system identifier, not available in manifest
-	 * version 1.
-	 */
-	if (context->manifest->version != 1 &&
-		strcmp(relpath, "global/pg_control") == 0 &&
-		m != NULL && m->matched && !m->bad)
-		verify_control_file(fullpath, context->manifest->system_identifier);
+	/* Validate the manifest system identifier */
+	if (should_verify_control_data(context->manifest, m))
+	{
+		ControlFileData *control_file;
+		bool		crc_ok;
+
+		pg_log_debug("reading \"%s\"", fullpath);
+		control_file = get_controlfile_by_exact_path(fullpath, &crc_ok);
+
+		verify_control_data(control_file, fullpath, crc_ok,
+							context->manifest->system_identifier);
+		/* Release memory. */
+		pfree(control_file);
+	}
 
 	/* Update statistics for progress report, if necessary */
 	if (show_progress && !context->skip_checksums &&
@@ -681,18 +684,14 @@ verify_manifest_entry(verifier_context *context, char *relpath, int64 filesize)
 }
 
 /*
- * Sanity check control file and validate system identifier against manifest
- * system identifier.
+ * Sanity check control file data and validate system identifier against
+ * manifest system identifier.
  */
-static void
-verify_control_file(const char *controlpath, uint64 manifest_system_identifier)
+void
+verify_control_data(ControlFileData *control_file,
+					const char *controlpath, bool crc_ok,
+					uint64 manifest_system_identifier)
 {
-	ControlFileData *control_file;
-	bool		crc_ok;
-
-	pg_log_debug("reading \"%s\"", controlpath);
-	control_file = get_controlfile_by_exact_path(controlpath, &crc_ok);
-
 	/* Control file contents not meaningful if CRC is bad. */
 	if (!crc_ok)
 		report_fatal_error("%s: CRC is incorrect", controlpath);
@@ -708,9 +707,6 @@ verify_control_file(const char *controlpath, uint64 manifest_system_identifier)
 						   controlpath,
 						   (unsigned long long) manifest_system_identifier,
 						   (unsigned long long) control_file->system_identifier);
-
-	/* Release memory. */
-	pfree(control_file);
 }
 
 /*
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.h b/src/bin/pg_verifybackup/pg_verifybackup.h
index 12812cf5584..42d01c26466 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.h
+++ b/src/bin/pg_verifybackup/pg_verifybackup.h
@@ -16,6 +16,7 @@
 
 #include "common/controldata_utils.h"
 #include "common/hashfn_unstable.h"
+#include "common/logging.h"
 #include "common/parse_manifest.h"
 #include "fe_utils/simple_list.h"
 
@@ -44,6 +45,17 @@ typedef struct manifest_file
 	(((m) != NULL) && ((m)->matched) && !((m)->bad) && \
 	 (((m)->checksum_type) != CHECKSUM_TYPE_NONE))
 
+/*
+ * Validate the manifest system identifier against the control file; this
+ * feature is not available in manifest version 1.  This validation should be
+ * carried out only if the manifest entry validation is completed without any
+ * errors.
+ */
+#define should_verify_control_data(manifest, m) \
+	(((manifest)->version != 1) && \
+	 ((m) != NULL) && ((m)->matched) && !((m)->bad) && \
+	 (strcmp((m)->pathname, "global/pg_control") == 0))
+
 /*
  * Define a hash table which we can use to store information about the files
  * mentioned in the backup manifest.
@@ -103,6 +115,9 @@ extern manifest_file *verify_manifest_entry(verifier_context *context,
 extern void verify_checksum(verifier_context *context, manifest_file *m,
 							pg_checksum_context *checksum_ctx,
 							uint8 *checksumbuf);
+extern void verify_control_data(ControlFileData *control_file,
+								const char *controlpath, bool crc_ok,
+								uint64 manifest_system_identifier);
 
 extern void report_backup_error(verifier_context *context,
 								const char *pg_restrict fmt,...)
-- 
2.18.0

From 6004858acf1740710f529d526ef70c96dee0bf9a Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Wed, 7 Aug 2024 18:15:29 +0530
Subject: [PATCH v9 12/12] pg_verifybackup: Tests and document

----
NOTE: This patch is not meant to be committed separately. It should
be squashed with the previous patch that implements tar format support.
----
---
 doc/src/sgml/ref/pg_verifybackup.sgml         | 42 ++++++++++-
 src/bin/pg_verifybackup/t/001_basic.pl        |  6 +-
 src/bin/pg_verifybackup/t/004_options.pl      |  2 +-
 src/bin/pg_verifybackup/t/005_bad_manifest.pl |  2 +-
 src/bin/pg_verifybackup/t/008_untar.pl        | 73 ++++++-------------
 src/bin/pg_verifybackup/t/010_client_untar.pl | 48 +-----------
 6 files changed, 72 insertions(+), 101 deletions(-)

diff --git a/doc/src/sgml/ref/pg_verifybackup.sgml b/doc/src/sgml/ref/pg_verifybackup.sgml
index a3f167f9f6e..60f771c7663 100644
--- a/doc/src/sgml/ref/pg_verifybackup.sgml
+++ b/doc/src/sgml/ref/pg_verifybackup.sgml
@@ -34,8 +34,10 @@ PostgreSQL documentation
    integrity of a database cluster backup taken using
    <command>pg_basebackup</command> against a
    <literal>backup_manifest</literal> generated by the server at the time
-   of the backup.  The backup must be stored in the "plain"
-   format; a "tar" format backup can be checked after extracting it.
+   of the backup.  The backup must be stored in the "plain" or "tar"
+   format.  Verification is supported for <literal>gzip</literal>,
+   <literal>lz4</literal>, and  <literal>zstd</literal> compressed tar backup;
+   any other compressed format backups can be checked after decompressing them.
   </para>
 
   <para>
@@ -168,6 +170,42 @@ PostgreSQL documentation
       </listitem>
      </varlistentry>
 
+     <varlistentry>
+      <term><option>-F <replaceable class="parameter">format</replaceable></option></term>
+      <term><option>--format=<replaceable class="parameter">format</replaceable></option></term>
+      <listitem>
+       <para>
+        Specifies the format of the backup. <replaceable>format</replaceable>
+        can be one of the following:
+
+        <variablelist>
+         <varlistentry>
+          <term><literal>p</literal></term>
+          <term><literal>plain</literal></term>
+          <listitem>
+           <para>
+            Backup consists of plain files with the same layout as the
+            source server's data directory and tablespaces.
+           </para>
+          </listitem>
+         </varlistentry>
+
+         <varlistentry>
+          <term><literal>t</literal></term>
+          <term><literal>tar</literal></term>
+          <listitem>
+           <para>
+            Backup consists of tar files. The main data directory's contents
+            will be written to a file named <filename>base.tar</filename>,
+            and each other tablespace will be written to a separate tar file
+            named after that tablespace's OID.
+           </para>
+           </listitem>
+         </varlistentry>
+        </variablelist></para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry>
       <term><option>-n</option></term>
       <term><option>--no-parse-wal</option></term>
diff --git a/src/bin/pg_verifybackup/t/001_basic.pl b/src/bin/pg_verifybackup/t/001_basic.pl
index 2f3e52d296f..ca5b0402b7d 100644
--- a/src/bin/pg_verifybackup/t/001_basic.pl
+++ b/src/bin/pg_verifybackup/t/001_basic.pl
@@ -17,11 +17,11 @@ command_fails_like(
 	qr/no backup directory specified/,
 	'target directory must be specified');
 command_fails_like(
-	[ 'pg_verifybackup', $tempdir ],
+	[ 'pg_verifybackup', '-Fp', $tempdir ],
 	qr/could not open file.*\/backup_manifest\"/,
 	'pg_verifybackup requires a manifest');
 command_fails_like(
-	[ 'pg_verifybackup', $tempdir, $tempdir ],
+	[ 'pg_verifybackup', '-Fp', $tempdir, $tempdir ],
 	qr/too many command-line arguments/,
 	'multiple target directories not allowed');
 
@@ -31,7 +31,7 @@ close($fh);
 
 # but then try to use an alternate, nonexisting manifest
 command_fails_like(
-	[ 'pg_verifybackup', '-m', "$tempdir/not_the_manifest", $tempdir ],
+	[ 'pg_verifybackup', '-Fp', '-m', "$tempdir/not_the_manifest", $tempdir ],
 	qr/could not open file.*\/not_the_manifest\"/,
 	'pg_verifybackup respects -m flag');
 
diff --git a/src/bin/pg_verifybackup/t/004_options.pl b/src/bin/pg_verifybackup/t/004_options.pl
index 8ed2214408e..2f197648740 100644
--- a/src/bin/pg_verifybackup/t/004_options.pl
+++ b/src/bin/pg_verifybackup/t/004_options.pl
@@ -108,7 +108,7 @@ unlike(
 # Test valid manifest with nonexistent backup directory.
 command_fails_like(
 	[
-		'pg_verifybackup', '-m',
+		'pg_verifybackup', '-Fp', '-m',
 		"$backup_path/backup_manifest", "$backup_path/fake"
 	],
 	qr/could not open directory/,
diff --git a/src/bin/pg_verifybackup/t/005_bad_manifest.pl b/src/bin/pg_verifybackup/t/005_bad_manifest.pl
index c4ed64b62d5..28c51b6feb0 100644
--- a/src/bin/pg_verifybackup/t/005_bad_manifest.pl
+++ b/src/bin/pg_verifybackup/t/005_bad_manifest.pl
@@ -208,7 +208,7 @@ sub test_bad_manifest
 	print $fh $manifest_contents;
 	close($fh);
 
-	command_fails_like([ 'pg_verifybackup', $tempdir ], $regexp, $test_name);
+	command_fails_like([ 'pg_verifybackup', '-Fp', $tempdir ], $regexp, $test_name);
 	return;
 }
 
diff --git a/src/bin/pg_verifybackup/t/008_untar.pl b/src/bin/pg_verifybackup/t/008_untar.pl
index 7a09f3b75b2..9896560adc3 100644
--- a/src/bin/pg_verifybackup/t/008_untar.pl
+++ b/src/bin/pg_verifybackup/t/008_untar.pl
@@ -16,6 +16,20 @@ my $primary = PostgreSQL::Test::Cluster->new('primary');
 $primary->init(allows_streaming => 1);
 $primary->start;
 
+# Create a couple of directories to use as tablespaces.
+my $TS1_LOCATION = $primary->backup_dir .'/ts1';
+$TS1_LOCATION =~ s/\/\.\//\//g;    # collapse foo/./bar to foo/bar
+mkdir($TS1_LOCATION);
+
+# Create a tablespace with table in it.
+$primary->safe_psql('postgres', qq(
+		CREATE TABLESPACE regress_ts1 LOCATION '$TS1_LOCATION';
+		SELECT oid FROM pg_tablespace WHERE spcname = 'regress_ts1';
+		CREATE TABLE regress_tbl1(i int) TABLESPACE regress_ts1;
+		INSERT INTO regress_tbl1 VALUES(generate_series(1,5));));
+my $tsoid = $primary->safe_psql('postgres', qq(
+		SELECT oid FROM pg_tablespace WHERE spcname = 'regress_ts1'));
+
 my $backup_path = $primary->backup_dir . '/server-backup';
 my $extract_path = $primary->backup_dir . '/extracted-backup';
 
@@ -23,39 +37,31 @@ my @test_configuration = (
 	{
 		'compression_method' => 'none',
 		'backup_flags' => [],
-		'backup_archive' => 'base.tar',
+		'backup_archive' => ['base.tar', "$tsoid.tar"],
 		'enabled' => 1
 	},
 	{
 		'compression_method' => 'gzip',
 		'backup_flags' => [ '--compress', 'server-gzip' ],
-		'backup_archive' => 'base.tar.gz',
-		'decompress_program' => $ENV{'GZIP_PROGRAM'},
-		'decompress_flags' => ['-d'],
+		'backup_archive' => [ 'base.tar.gz', "$tsoid.tar.gz" ],
 		'enabled' => check_pg_config("#define HAVE_LIBZ 1")
 	},
 	{
 		'compression_method' => 'lz4',
 		'backup_flags' => [ '--compress', 'server-lz4' ],
-		'backup_archive' => 'base.tar.lz4',
-		'decompress_program' => $ENV{'LZ4'},
-		'decompress_flags' => [ '-d', '-m' ],
+		'backup_archive' => ['base.tar.lz4', "$tsoid.tar.lz4" ],
 		'enabled' => check_pg_config("#define USE_LZ4 1")
 	},
 	{
 		'compression_method' => 'zstd',
 		'backup_flags' => [ '--compress', 'server-zstd' ],
-		'backup_archive' => 'base.tar.zst',
-		'decompress_program' => $ENV{'ZSTD'},
-		'decompress_flags' => ['-d'],
+		'backup_archive' => [ 'base.tar.zst', "$tsoid.tar.zst" ],
 		'enabled' => check_pg_config("#define USE_ZSTD 1")
 	},
 	{
 		'compression_method' => 'zstd',
 		'backup_flags' => [ '--compress', 'server-zstd:level=1,long' ],
-		'backup_archive' => 'base.tar.zst',
-		'decompress_program' => $ENV{'ZSTD'},
-		'decompress_flags' => ['-d'],
+		'backup_archive' => [ 'base.tar.zst', "$tsoid.tar.zst" ],
 		'enabled' => check_pg_config("#define USE_ZSTD 1")
 	});
 
@@ -86,47 +92,16 @@ for my $tc (@test_configuration)
 		my $backup_files = join(',',
 			sort grep { $_ ne '.' && $_ ne '..' } slurp_dir($backup_path));
 		my $expected_backup_files =
-		  join(',', sort ('backup_manifest', $tc->{'backup_archive'}));
+		  join(',', sort ('backup_manifest', @{ $tc->{'backup_archive'} }));
 		is($backup_files, $expected_backup_files,
 			"found expected backup files, compression $method");
 
-		# Decompress.
-		if (exists $tc->{'decompress_program'})
-		{
-			my @decompress = ($tc->{'decompress_program'});
-			push @decompress, @{ $tc->{'decompress_flags'} }
-			  if $tc->{'decompress_flags'};
-			push @decompress, $backup_path . '/' . $tc->{'backup_archive'};
-			system_or_bail(@decompress);
-		}
-
-	  SKIP:
-		{
-			my $tar = $ENV{TAR};
-			# don't check for a working tar here, to accommodate various odd
-			# cases. If tar doesn't work the init_from_backup below will fail.
-			skip "no tar program available", 1
-			  if (!defined $tar || $tar eq '');
-
-			# Untar.
-			mkdir($extract_path);
-			system_or_bail($tar, 'xf', $backup_path . '/base.tar',
-				'-C', $extract_path);
-
-			# Verify.
-			$primary->command_ok(
-				[
-					'pg_verifybackup', '-n',
-					'-m', "$backup_path/backup_manifest",
-					'-e', $extract_path
-				],
-				"verify backup, compression $method");
-		}
+		# Verify tar backup.
+		$primary->command_ok(['pg_verifybackup', '-n', '-e', $backup_path],
+			"verify backup, compression $method");
 
 		# Cleanup.
-		unlink($backup_path . '/backup_manifest');
-		unlink($backup_path . '/base.tar');
-		unlink($backup_path . '/' . $tc->{'backup_archive'});
+		rmtree($backup_path);
 		rmtree($extract_path);
 	}
 }
diff --git a/src/bin/pg_verifybackup/t/010_client_untar.pl b/src/bin/pg_verifybackup/t/010_client_untar.pl
index 8c076d46dee..6b7d7483f6e 100644
--- a/src/bin/pg_verifybackup/t/010_client_untar.pl
+++ b/src/bin/pg_verifybackup/t/010_client_untar.pl
@@ -29,41 +29,30 @@ my @test_configuration = (
 		'compression_method' => 'gzip',
 		'backup_flags' => [ '--compress', 'client-gzip:5' ],
 		'backup_archive' => 'base.tar.gz',
-		'decompress_program' => $ENV{'GZIP_PROGRAM'},
-		'decompress_flags' => ['-d'],
 		'enabled' => check_pg_config("#define HAVE_LIBZ 1")
 	},
 	{
 		'compression_method' => 'lz4',
 		'backup_flags' => [ '--compress', 'client-lz4:5' ],
 		'backup_archive' => 'base.tar.lz4',
-		'decompress_program' => $ENV{'LZ4'},
-		'decompress_flags' => ['-d'],
-		'output_file' => 'base.tar',
 		'enabled' => check_pg_config("#define USE_LZ4 1")
 	},
 	{
 		'compression_method' => 'zstd',
 		'backup_flags' => [ '--compress', 'client-zstd:5' ],
 		'backup_archive' => 'base.tar.zst',
-		'decompress_program' => $ENV{'ZSTD'},
-		'decompress_flags' => ['-d'],
 		'enabled' => check_pg_config("#define USE_ZSTD 1")
 	},
 	{
 		'compression_method' => 'zstd',
 		'backup_flags' => [ '--compress', 'client-zstd:level=1,long' ],
 		'backup_archive' => 'base.tar.zst',
-		'decompress_program' => $ENV{'ZSTD'},
-		'decompress_flags' => ['-d'],
 		'enabled' => check_pg_config("#define USE_ZSTD 1")
 	},
 	{
 		'compression_method' => 'parallel zstd',
 		'backup_flags' => [ '--compress', 'client-zstd:workers=3' ],
 		'backup_archive' => 'base.tar.zst',
-		'decompress_program' => $ENV{'ZSTD'},
-		'decompress_flags' => ['-d'],
 		'enabled' => check_pg_config("#define USE_ZSTD 1"),
 		'possibly_unsupported' =>
 		  qr/could not set compression worker count to 3: Unsupported parameter/
@@ -118,40 +107,9 @@ for my $tc (@test_configuration)
 		is($backup_files, $expected_backup_files,
 			"found expected backup files, compression $method");
 
-		# Decompress.
-		if (exists $tc->{'decompress_program'})
-		{
-			my @decompress = ($tc->{'decompress_program'});
-			push @decompress, @{ $tc->{'decompress_flags'} }
-			  if $tc->{'decompress_flags'};
-			push @decompress, $backup_path . '/' . $tc->{'backup_archive'};
-			push @decompress, $backup_path . '/' . $tc->{'output_file'}
-			  if $tc->{'output_file'};
-			system_or_bail(@decompress);
-		}
-
-	  SKIP:
-		{
-			my $tar = $ENV{TAR};
-			# don't check for a working tar here, to accommodate various odd
-			# cases. If tar doesn't work the init_from_backup below will fail.
-			skip "no tar program available", 1
-			  if (!defined $tar || $tar eq '');
-
-			# Untar.
-			mkdir($extract_path);
-			system_or_bail($tar, 'xf', $backup_path . '/base.tar',
-				'-C', $extract_path);
-
-			# Verify.
-			$primary->command_ok(
-				[
-					'pg_verifybackup', '-n',
-					'-m', "$backup_path/backup_manifest",
-					'-e', $extract_path
-				],
-				"verify backup, compression $method");
-		}
+		# Verify tar backup.
+		$primary->command_ok( [ 'pg_verifybackup', '-n', '-e', $backup_path ],
+			"verify backup, compression $method");
 
 		# Cleanup.
 		rmtree($extract_path);
-- 
2.18.0

From 0420df794f89344b4b96df0a767e1ea186bcacce Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Tue, 2 Jul 2024 10:26:35 +0530
Subject: [PATCH v9 10/12] pg_verifybackup: Add backup format and compression
 option

----
NOTE: This patch is not meant to be committed separately. It should
be squashed with the next patch that implements tar format support.
----
---
 src/bin/pg_verifybackup/pg_verifybackup.c | 74 ++++++++++++++++++++++-
 src/bin/pg_verifybackup/pg_verifybackup.h |  1 +
 2 files changed, 73 insertions(+), 2 deletions(-)

diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index 5adf24e8f90..88196cca4e0 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -18,6 +18,7 @@
 #include <sys/stat.h>
 #include <time.h>
 
+#include "common/compression.h"
 #include "common/parse_manifest.h"
 #include "fe_utils/simple_list.h"
 #include "getopt_long.h"
@@ -56,6 +57,7 @@ static void report_manifest_error(JsonManifestParseContext *context,
 								  const char *fmt,...)
 			pg_attribute_printf(2, 3) pg_attribute_noreturn();
 
+static char find_backup_format(verifier_context *context);
 static void verify_backup_directory(verifier_context *context,
 									char *relpath, char *fullpath);
 static void verify_backup_file(verifier_context *context,
@@ -91,6 +93,7 @@ main(int argc, char **argv)
 		{"exit-on-error", no_argument, NULL, 'e'},
 		{"ignore", required_argument, NULL, 'i'},
 		{"manifest-path", required_argument, NULL, 'm'},
+		{"format", required_argument, NULL, 'F'},
 		{"no-parse-wal", no_argument, NULL, 'n'},
 		{"progress", no_argument, NULL, 'P'},
 		{"quiet", no_argument, NULL, 'q'},
@@ -112,6 +115,7 @@ main(int argc, char **argv)
 	progname = get_progname(argv[0]);
 
 	memset(&context, 0, sizeof(context));
+	context.format = '\0';
 
 	if (argc > 1)
 	{
@@ -148,7 +152,7 @@ main(int argc, char **argv)
 	simple_string_list_append(&context.ignore_list, "recovery.signal");
 	simple_string_list_append(&context.ignore_list, "standby.signal");
 
-	while ((c = getopt_long(argc, argv, "ei:m:nPqsw:", long_options, NULL)) != -1)
+	while ((c = getopt_long(argc, argv, "eF:i:m:nPqsw:", long_options, NULL)) != -1)
 	{
 		switch (c)
 		{
@@ -167,6 +171,15 @@ main(int argc, char **argv)
 				manifest_path = pstrdup(optarg);
 				canonicalize_path(manifest_path);
 				break;
+			case 'F':
+				if (strcmp(optarg, "p") == 0 || strcmp(optarg, "plain") == 0)
+					context.format = 'p';
+				else if (strcmp(optarg, "t") == 0 || strcmp(optarg, "tar") == 0)
+					context.format = 't';
+				else
+					pg_fatal("invalid backup format \"%s\", must be \"plain\" or \"tar\"",
+							 optarg);
+				break;
 			case 'n':
 				no_parse_wal = true;
 				break;
@@ -214,11 +227,26 @@ main(int argc, char **argv)
 		pg_fatal("cannot specify both %s and %s",
 				 "-P/--progress", "-q/--quiet");
 
+	/* Determine the backup format if it hasn't been specified. */
+	if (context.format == '\0')
+		context.format = find_backup_format(&context);
+
 	/* Unless --no-parse-wal was specified, we will need pg_waldump. */
 	if (!no_parse_wal)
 	{
 		int			ret;
 
+		/*
+		 * XXX: In the future, we should consider enhancing pg_waldump to read
+		 * WAL files from the tar archive.
+		 */
+		if (context.format != 'p')
+		{
+			pg_log_error("pg_waldump does not support parsing WAL files from a tar archive.");
+			pg_log_error_hint("Try \"%s --help\" to skip parse WAL files option.", progname);
+			exit(1);
+		}
+
 		pg_waldump_path = pg_malloc(MAXPGPATH);
 		ret = find_other_exec(argv[0], "pg_waldump",
 							  "pg_waldump (PostgreSQL) " PG_VERSION "\n",
@@ -273,8 +301,13 @@ main(int argc, char **argv)
 	/*
 	 * Now do the expensive work of verifying file checksums, unless we were
 	 * told to skip it.
+	 *
+	 * We were only checking the plain backup here. For the tar backup, file
+	 * checksums verification (if requested) will be done immediately when the
+	 * file is accessed, as we don't have random access to the files like we
+	 * do with plain backups.
 	 */
-	if (!context.skip_checksums)
+	if (!context.skip_checksums && context.format == 'p')
 		verify_backup_checksums(&context);
 
 	/*
@@ -975,6 +1008,42 @@ should_ignore_relpath(verifier_context *context, const char *relpath)
 	return false;
 }
 
+/*
+ * To detect the backup format, it checks for the PG_VERSION file in the backup
+ * directory. If found, it will be considered a plain-format backup; otherwise,
+ * it will be assumed to be a tar-format backup.
+ */
+static char
+find_backup_format(verifier_context *context)
+{
+	char		result;
+	char	   *path;
+	struct stat sb;
+
+	/* Should be here only if the backup format is unknown */
+	Assert(context->format == '\0');
+
+	/* Check PG_VERSION file. */
+	path = psprintf("%s/%s", context->backup_directory, "PG_VERSION");
+	if (stat(path, &sb) == 0)
+		result = 'p';
+	else
+	{
+		if (errno != ENOENT)
+		{
+			pg_log_error("cannot determine backup format");
+			pg_log_error_hint("Try \"%s --help\" for more information.", progname);
+			exit(1);
+		}
+
+		/* Otherwise, it is assumed to be a TAR backup. */
+		result = 't';
+	}
+	pfree(path);
+
+	return result;
+}
+
 /*
  * Print a progress report based on the global variables.
  *
@@ -1032,6 +1101,7 @@ usage(void)
 	printf(_("  -e, --exit-on-error         exit immediately on error\n"));
 	printf(_("  -i, --ignore=RELATIVE_PATH  ignore indicated path\n"));
 	printf(_("  -m, --manifest-path=PATH    use specified path for manifest\n"));
+	printf(_("  -F, --format=p|t            backup format (plain, tar)\n"));
 	printf(_("  -n, --no-parse-wal          do not try to parse WAL files\n"));
 	printf(_("  -P, --progress              show progress information\n"));
 	printf(_("  -q, --quiet                 do not print any output, except for errors\n"));
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.h b/src/bin/pg_verifybackup/pg_verifybackup.h
index 42d01c26466..856e8947c1d 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.h
+++ b/src/bin/pg_verifybackup/pg_verifybackup.h
@@ -105,6 +105,7 @@ typedef struct verifier_context
 	manifest_data *manifest;
 	char	   *backup_directory;
 	SimpleStringList ignore_list;
+	char		format;			/* backup format:  p(lain)/t(ar) */
 	bool		skip_checksums;
 	bool		exit_on_error;
 	bool		saw_any_error;
-- 
2.18.0

From 18887cb590f950cdbc7f4e5bc11812f4e85ccfdd Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Thu, 8 Aug 2024 16:01:33 +0530
Subject: [PATCH v9 09/12] Add simple_ptr_list_destroy() and
 simple_ptr_list_destroy_deep() API.

We didn't have any helper function to destroy SimplePtrList, likely
because it wasn't needed before, but it's required in a later patch in
this set.  I've added two functions for this purpose, inspired by
list_free() and list_free_deep().
---
 src/fe_utils/simple_list.c         | 39 ++++++++++++++++++++++++++++++
 src/include/fe_utils/simple_list.h |  2 ++
 2 files changed, 41 insertions(+)

diff --git a/src/fe_utils/simple_list.c b/src/fe_utils/simple_list.c
index 2d88eb54067..9d218911c31 100644
--- a/src/fe_utils/simple_list.c
+++ b/src/fe_utils/simple_list.c
@@ -173,3 +173,42 @@ simple_ptr_list_append(SimplePtrList *list, void *ptr)
 		list->head = cell;
 	list->tail = cell;
 }
+
+/*
+ * Destroy a pointer list and optionally the pointed-to element
+ */
+static void
+simple_ptr_list_destroy_private(SimplePtrList *list, bool deep)
+{
+	SimplePtrListCell *cell;
+
+	cell = list->head;
+	while (cell != NULL)
+	{
+		SimplePtrListCell *next;
+
+		next = cell->next;
+		if (deep)
+			pg_free(cell->ptr);
+		pg_free(cell);
+		cell = next;
+	}
+}
+
+/*
+ * Destroy a pointer list and the pointed-to element
+ */
+void
+simple_ptr_list_destroy_deep(SimplePtrList *list)
+{
+	simple_ptr_list_destroy_private(list, true);
+}
+
+/*
+ * Destroy only pointer list and not the pointed-to element
+ */
+void
+simple_ptr_list_destroy(SimplePtrList *list)
+{
+	simple_ptr_list_destroy_private(list, false);
+}
diff --git a/src/include/fe_utils/simple_list.h b/src/include/fe_utils/simple_list.h
index d42ecded8ed..5b7cbec8a62 100644
--- a/src/include/fe_utils/simple_list.h
+++ b/src/include/fe_utils/simple_list.h
@@ -66,5 +66,7 @@ extern void simple_string_list_destroy(SimpleStringList *list);
 extern const char *simple_string_list_not_touched(SimpleStringList *list);
 
 extern void simple_ptr_list_append(SimplePtrList *list, void *ptr);
+extern void simple_ptr_list_destroy_deep(SimplePtrList *list);
+extern void simple_ptr_list_destroy(SimplePtrList *list);
 
 #endif							/* SIMPLE_LIST_H */
-- 
2.18.0

From 6ef19e787db0174040cd0d020085d2a78c2c78f7 Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Thu, 1 Aug 2024 12:10:34 +0530
Subject: [PATCH v9 05/12] Refactor: move some part of pg_verifybackup.c to
 pg_verifybackup.h

---
 src/bin/pg_verifybackup/pg_verifybackup.c |  94 +------------------
 src/bin/pg_verifybackup/pg_verifybackup.h | 108 ++++++++++++++++++++++
 2 files changed, 112 insertions(+), 90 deletions(-)
 create mode 100644 src/bin/pg_verifybackup/pg_verifybackup.h

diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index c6d01d52335..384a4dd3500 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -18,12 +18,11 @@
 #include <sys/stat.h>
 #include <time.h>
 
-#include "common/controldata_utils.h"
-#include "common/hashfn_unstable.h"
 #include "common/logging.h"
 #include "common/parse_manifest.h"
 #include "fe_utils/simple_list.h"
 #include "getopt_long.h"
+#include "pg_verifybackup.h"
 #include "pgtime.h"
 
 /*
@@ -40,84 +39,6 @@
  */
 #define ESTIMATED_BYTES_PER_MANIFEST_LINE	100
 
-/*
- * How many bytes should we try to read from a file at once?
- */
-#define READ_CHUNK_SIZE				(128 * 1024)
-
-/*
- * Each file described by the manifest file is parsed to produce an object
- * like this.
- */
-typedef struct manifest_file
-{
-	uint32		status;			/* hash status */
-	const char *pathname;
-	size_t		size;
-	pg_checksum_type checksum_type;
-	int			checksum_length;
-	uint8	   *checksum_payload;
-	bool		matched;
-	bool		bad;
-} manifest_file;
-
-#define should_verify_checksum(m) \
-	(((m)->matched) && !((m)->bad) && (((m)->checksum_type) != CHECKSUM_TYPE_NONE))
-
-/*
- * Define a hash table which we can use to store information about the files
- * mentioned in the backup manifest.
- */
-#define SH_PREFIX		manifest_files
-#define SH_ELEMENT_TYPE	manifest_file
-#define SH_KEY_TYPE		const char *
-#define	SH_KEY			pathname
-#define SH_HASH_KEY(tb, key)	hash_string(key)
-#define SH_EQUAL(tb, a, b)		(strcmp(a, b) == 0)
-#define	SH_SCOPE		static inline
-#define SH_RAW_ALLOCATOR	pg_malloc0
-#define SH_DECLARE
-#define SH_DEFINE
-#include "lib/simplehash.h"
-
-/*
- * Each WAL range described by the manifest file is parsed to produce an
- * object like this.
- */
-typedef struct manifest_wal_range
-{
-	TimeLineID	tli;
-	XLogRecPtr	start_lsn;
-	XLogRecPtr	end_lsn;
-	struct manifest_wal_range *next;
-	struct manifest_wal_range *prev;
-} manifest_wal_range;
-
-/*
- * All the data parsed from a backup_manifest file.
- */
-typedef struct manifest_data
-{
-	int			version;
-	uint64		system_identifier;
-	manifest_files_hash *files;
-	manifest_wal_range *first_wal_range;
-	manifest_wal_range *last_wal_range;
-} manifest_data;
-
-/*
- * All of the context information we need while checking a backup manifest.
- */
-typedef struct verifier_context
-{
-	manifest_data *manifest;
-	char	   *backup_directory;
-	SimpleStringList ignore_list;
-	bool		skip_checksums;
-	bool		exit_on_error;
-	bool		saw_any_error;
-} verifier_context;
-
 static manifest_data *parse_manifest_file(char *manifest_path);
 static void verifybackup_version_cb(JsonManifestParseContext *context,
 									int manifest_version);
@@ -151,13 +72,6 @@ static void parse_required_wal(verifier_context *context,
 							   char *pg_waldump_path,
 							   char *wal_directory);
 
-static void report_backup_error(verifier_context *context,
-								const char *pg_restrict fmt,...)
-			pg_attribute_printf(2, 3);
-static void report_fatal_error(const char *pg_restrict fmt,...)
-			pg_attribute_printf(1, 2) pg_attribute_noreturn();
-static bool should_ignore_relpath(verifier_context *context, const char *relpath);
-
 static void progress_report(bool finished);
 static void usage(void);
 
@@ -980,7 +894,7 @@ parse_required_wal(verifier_context *context, char *pg_waldump_path,
  * Update the context to indicate that we saw an error, and exit if the
  * context says we should.
  */
-static void
+void
 report_backup_error(verifier_context *context, const char *pg_restrict fmt,...)
 {
 	va_list		ap;
@@ -997,7 +911,7 @@ report_backup_error(verifier_context *context, const char *pg_restrict fmt,...)
 /*
  * Report a fatal error and exit
  */
-static void
+void
 report_fatal_error(const char *pg_restrict fmt,...)
 {
 	va_list		ap;
@@ -1016,7 +930,7 @@ report_fatal_error(const char *pg_restrict fmt,...)
  * Note that by "prefix" we mean a parent directory; for this purpose,
  * "aa/bb" is not a prefix of "aa/bbb", but it is a prefix of "aa/bb/cc".
  */
-static bool
+bool
 should_ignore_relpath(verifier_context *context, const char *relpath)
 {
 	SimpleStringListCell *cell;
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.h b/src/bin/pg_verifybackup/pg_verifybackup.h
new file mode 100644
index 00000000000..bd9c95c477a
--- /dev/null
+++ b/src/bin/pg_verifybackup/pg_verifybackup.h
@@ -0,0 +1,108 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_verifybackup.h
+ *	  Verify a backup against a backup manifest.
+ *
+ * Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/bin/pg_verifybackup/pg_verifybackup.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef PG_VERIFYBACKUP_H
+#define PG_VERIFYBACKUP_H
+
+#include "common/controldata_utils.h"
+#include "common/hashfn_unstable.h"
+#include "common/parse_manifest.h"
+#include "fe_utils/simple_list.h"
+
+/*
+ * How many bytes should we try to read from a file at once?
+ */
+#define READ_CHUNK_SIZE				(128 * 1024)
+
+/*
+ * Each file described by the manifest file is parsed to produce an object
+ * like this.
+ */
+typedef struct manifest_file
+{
+	uint32		status;			/* hash status */
+	const char *pathname;
+	size_t		size;
+	pg_checksum_type checksum_type;
+	int			checksum_length;
+	uint8	   *checksum_payload;
+	bool		matched;
+	bool		bad;
+} manifest_file;
+
+#define should_verify_checksum(m) \
+	(((m)->matched) && !((m)->bad) && (((m)->checksum_type) != CHECKSUM_TYPE_NONE))
+
+/*
+ * Define a hash table which we can use to store information about the files
+ * mentioned in the backup manifest.
+ */
+#define SH_PREFIX		manifest_files
+#define SH_ELEMENT_TYPE	manifest_file
+#define SH_KEY_TYPE		const char *
+#define	SH_KEY			pathname
+#define SH_HASH_KEY(tb, key)	hash_string(key)
+#define SH_EQUAL(tb, a, b)		(strcmp(a, b) == 0)
+#define	SH_SCOPE		static inline
+#define SH_RAW_ALLOCATOR	pg_malloc0
+#define SH_DECLARE
+#define SH_DEFINE
+#include "lib/simplehash.h"
+
+/*
+ * Each WAL range described by the manifest file is parsed to produce an
+ * object like this.
+ */
+typedef struct manifest_wal_range
+{
+	TimeLineID	tli;
+	XLogRecPtr	start_lsn;
+	XLogRecPtr	end_lsn;
+	struct manifest_wal_range *next;
+	struct manifest_wal_range *prev;
+} manifest_wal_range;
+
+/*
+ * All the data parsed from a backup_manifest file.
+ */
+typedef struct manifest_data
+{
+	int			version;
+	uint64		system_identifier;
+	manifest_files_hash *files;
+	manifest_wal_range *first_wal_range;
+	manifest_wal_range *last_wal_range;
+} manifest_data;
+
+/*
+ * All of the context information we need while checking a backup manifest.
+ */
+typedef struct verifier_context
+{
+	manifest_data *manifest;
+	char	   *backup_directory;
+	SimpleStringList ignore_list;
+	bool		skip_checksums;
+	bool		exit_on_error;
+	bool		saw_any_error;
+} verifier_context;
+
+extern void report_backup_error(verifier_context *context,
+								const char *pg_restrict fmt,...)
+			pg_attribute_printf(2, 3);
+extern void report_fatal_error(const char *pg_restrict fmt,...)
+			pg_attribute_printf(1, 2) pg_attribute_noreturn();
+extern bool should_ignore_relpath(verifier_context *context,
+								  const char *relpath);
+
+#endif							/* PG_VERIFYBACKUP_H */
-- 
2.18.0

From 424537524164197abacfb9ce81001ee5da04d9c5 Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Thu, 1 Aug 2024 16:45:55 +0530
Subject: [PATCH v9 07/12] Refactor: split verify_file_checksum() function.

Move the core functionality of verify_file_checksum to a new function
to reuse it instead of duplicating the code.

The verify_file_checksum() function is designed to take a file path,
open and read the file contents, and then calculate the checksum.
However, for TAR backups, instead of a file path, we receive the file
content in chunks, and the checksum needs to be calculated
incrementally. While the initial operations for plain and TAR backup
checksum calculations differ, the final checks and error handling are
the same. By moving the shared logic to a separate function, we can
reuse the code for both types of backups.
---
 src/bin/pg_verifybackup/pg_verifybackup.c | 18 ++++++++++++++++--
 src/bin/pg_verifybackup/pg_verifybackup.h |  3 +++
 2 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index e4288f453ef..e4f499fcd37 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -787,7 +787,6 @@ verify_file_checksum(verifier_context *context, manifest_file *m,
 	int			rc;
 	size_t		bytes_read = 0;
 	uint8		checksumbuf[PG_CHECKSUM_MAX_LENGTH];
-	int			checksumlen;
 
 	/* Open the target file. */
 	if ((fd = open(fullpath, O_RDONLY | PG_BINARY, 0)) < 0)
@@ -853,8 +852,23 @@ verify_file_checksum(verifier_context *context, manifest_file *m,
 		return;
 	}
 
+	/* Do the final computation and verification. */
+	verify_checksum(context, m, &checksum_ctx, checksumbuf);
+}
+
+/*
+ * A helper function to finalize checksum computation and verify it against the
+ * backup manifest information.
+ */
+void
+verify_checksum(verifier_context *context, manifest_file *m,
+				pg_checksum_context *checksum_ctx, uint8 *checksumbuf)
+{
+	int			checksumlen;
+	const char *relpath = m->pathname;
+
 	/* Get the final checksum. */
-	checksumlen = pg_checksum_final(&checksum_ctx, checksumbuf);
+	checksumlen = pg_checksum_final(checksum_ctx, checksumbuf);
 	if (checksumlen < 0)
 	{
 		report_backup_error(context,
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.h b/src/bin/pg_verifybackup/pg_verifybackup.h
index 2e71f14669b..12812cf5584 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.h
+++ b/src/bin/pg_verifybackup/pg_verifybackup.h
@@ -100,6 +100,9 @@ typedef struct verifier_context
 
 extern manifest_file *verify_manifest_entry(verifier_context *context,
 											char *relpath, int64 filesize);
+extern void verify_checksum(verifier_context *context, manifest_file *m,
+							pg_checksum_context *checksum_ctx,
+							uint8 *checksumbuf);
 
 extern void report_backup_error(verifier_context *context,
 								const char *pg_restrict fmt,...)
-- 
2.18.0

From 47c999edada3ce31009351ed2393b04d7ea9f67e Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Thu, 8 Aug 2024 15:10:43 +0530
Subject: [PATCH v9 04/12] Refactor: move skip_checksums global variable to
 verifier_context struct

To enable access to this flag in another file.
---
 src/bin/pg_verifybackup/pg_verifybackup.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index d77e70fbe38..c6d01d52335 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -113,6 +113,7 @@ typedef struct verifier_context
 	manifest_data *manifest;
 	char	   *backup_directory;
 	SimpleStringList ignore_list;
+	bool		skip_checksums;
 	bool		exit_on_error;
 	bool		saw_any_error;
 } verifier_context;
@@ -164,7 +165,6 @@ static const char *progname;
 
 /* options */
 static bool show_progress = false;
-static bool skip_checksums = false;
 
 /* Progress indicators */
 static uint64 total_size = 0;
@@ -266,7 +266,7 @@ main(int argc, char **argv)
 				quiet = true;
 				break;
 			case 's':
-				skip_checksums = true;
+				context.skip_checksums = true;
 				break;
 			case 'w':
 				wal_directory = pstrdup(optarg);
@@ -363,7 +363,7 @@ main(int argc, char **argv)
 	 * Now do the expensive work of verifying file checksums, unless we were
 	 * told to skip it.
 	 */
-	if (!skip_checksums)
+	if (!context.skip_checksums)
 		verify_backup_checksums(&context);
 
 	/*
@@ -739,7 +739,8 @@ verify_backup_file(verifier_context *context, char *relpath, char *fullpath)
 		verify_control_file(fullpath, context->manifest->system_identifier);
 
 	/* Update statistics for progress report, if necessary */
-	if (show_progress && !skip_checksums && should_verify_checksum(m))
+	if (show_progress && !context->skip_checksums &&
+		should_verify_checksum(m))
 		total_size += m->size;
 
 	/*
-- 
2.18.0

From 5a323df24f2d0b76fb06ac58c1c70de46056e4ee Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Thu, 8 Aug 2024 14:45:04 +0530
Subject: [PATCH v9 06/12] Refactor: split verify_backup_file() function.

Move the manifest entry verification code into a new function called
verify_manifest_entry() so that it can be reused for tar backup
verification. If verify_manifest_entry() doesn't find an entry, it
reports an error as before and returns NULL to the caller. This is why
a NULL check is added to should_verify_checksum().
---
 src/bin/pg_verifybackup/pg_verifybackup.c | 49 +++++++++++++++--------
 src/bin/pg_verifybackup/pg_verifybackup.h |  6 ++-
 2 files changed, 37 insertions(+), 18 deletions(-)

diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index 384a4dd3500..e4288f453ef 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -622,6 +622,32 @@ verify_backup_file(verifier_context *context, char *relpath, char *fullpath)
 		return;
 	}
 
+	/* Check the backup manifest entry for this file. */
+	m = verify_manifest_entry(context, relpath, sb.st_size);
+
+	/*
+	 * Validate the manifest system identifier, not available in manifest
+	 * version 1.
+	 */
+	if (context->manifest->version != 1 &&
+		strcmp(relpath, "global/pg_control") == 0 &&
+		m != NULL && m->matched && !m->bad)
+		verify_control_file(fullpath, context->manifest->system_identifier);
+
+	/* Update statistics for progress report, if necessary */
+	if (show_progress && !context->skip_checksums &&
+		should_verify_checksum(m))
+		total_size += m->size;
+}
+
+/*
+ * Verify file and its size entry in the manifest.
+ */
+manifest_file *
+verify_manifest_entry(verifier_context *context, char *relpath, int64 filesize)
+{
+	manifest_file *m;
+
 	/* Check whether there's an entry in the manifest hash. */
 	m = manifest_files_lookup(context->manifest->files, relpath);
 	if (m == NULL)
@@ -629,40 +655,29 @@ verify_backup_file(verifier_context *context, char *relpath, char *fullpath)
 		report_backup_error(context,
 							"\"%s\" is present on disk but not in the manifest",
 							relpath);
-		return;
+		return NULL;
 	}
 
 	/* Flag this entry as having been encountered in the filesystem. */
 	m->matched = true;
 
 	/* Check that the size matches. */
-	if (m->size != sb.st_size)
+	if (m->size != filesize)
 	{
 		report_backup_error(context,
 							"\"%s\" has size %lld on disk but size %zu in the manifest",
-							relpath, (long long int) sb.st_size, m->size);
+							relpath, (long long int) filesize, m->size);
 		m->bad = true;
 	}
 
-	/*
-	 * Validate the manifest system identifier, not available in manifest
-	 * version 1.
-	 */
-	if (context->manifest->version != 1 &&
-		strcmp(relpath, "global/pg_control") == 0)
-		verify_control_file(fullpath, context->manifest->system_identifier);
-
-	/* Update statistics for progress report, if necessary */
-	if (show_progress && !context->skip_checksums &&
-		should_verify_checksum(m))
-		total_size += m->size;
-
 	/*
 	 * We don't verify checksums at this stage. We first finish verifying that
 	 * we have the expected set of files with the expected sizes, and only
 	 * afterwards verify the checksums. That's because computing checksums may
 	 * take a while, and we'd like to report more obvious problems quickly.
 	 */
+
+	return m;
 }
 
 /*
@@ -825,7 +840,7 @@ verify_file_checksum(verifier_context *context, manifest_file *m,
 
 	/*
 	 * Double-check that we read the expected number of bytes from the file.
-	 * Normally, a file size mismatch would be caught in verify_backup_file
+	 * Normally, a file size mismatch would be caught in verify_manifest_entry
 	 * and this check would never be reached, but this provides additional
 	 * safety and clarity in the event of concurrent modifications or
 	 * filesystem misbehavior.
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.h b/src/bin/pg_verifybackup/pg_verifybackup.h
index bd9c95c477a..2e71f14669b 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.h
+++ b/src/bin/pg_verifybackup/pg_verifybackup.h
@@ -41,7 +41,8 @@ typedef struct manifest_file
 } manifest_file;
 
 #define should_verify_checksum(m) \
-	(((m)->matched) && !((m)->bad) && (((m)->checksum_type) != CHECKSUM_TYPE_NONE))
+	(((m) != NULL) && ((m)->matched) && !((m)->bad) && \
+	 (((m)->checksum_type) != CHECKSUM_TYPE_NONE))
 
 /*
  * Define a hash table which we can use to store information about the files
@@ -97,6 +98,9 @@ typedef struct verifier_context
 	bool		saw_any_error;
 } verifier_context;
 
+extern manifest_file *verify_manifest_entry(verifier_context *context,
+											char *relpath, int64 filesize);
+
 extern void report_backup_error(verifier_context *context,
 								const char *pg_restrict fmt,...)
 			pg_attribute_printf(2, 3);
-- 
2.18.0

Reply via email to