On Tue, Aug 13, 2024 at 10:49 PM Robert Haas <robertmh...@gmail.com> wrote:
>
> On Mon, Aug 12, 2024 at 5:13 AM Amul Sul <sula...@gmail.com> wrote:
> > I tried this in the attached version and made a few additional changes
> > based on Sravan's off-list comments regarding function names and
> > descriptions.
> >
> > Now, verification happens in two passes. The first pass simply
> > verifies the file names, determines their compression types, and
> > returns a list of valid tar files whose contents need to be verified
> > in the second pass. The second pass is called at the end of
> > verify_backup_directory() after all files in that directory have been
> > scanned. I named the functions for pass 1 and pass 2 as
> > verify_tar_file_name() and verify_tar_file_contents(), respectively.
> > The rest of the code flow is similar as in the previous version.
> >
> > In the attached patch set, I abandoned the changes, touching the
> > progress reporting code of plain backups by dropping the previous 0009
> > patch. The new 0009 patch adds missing APIs to simple_list.c to
> > destroy SimplePtrList. The rest of the patch numbers remain unchanged.
>
> I think you've entangled the code paths here for plain-format backup
> and tar-format backup in a way that is not very nice. I suggest
> refactoring things so that verify_backup_directory() is only called
> for plain-format backups, and you have some completely separate
> function (perhaps verify_tar_backup) that is called for tar-format
> backups. I don't think verify_backup_file() should be shared between
> tar-format and plain-format backups either. Let that be just for
> plain-format backups, and have separate logic for tar-format backups.
> Right now you've got "if" statements in various places trying to get
> all the cases correct, but I think you've missed some (and there's
> also the issue of updating all the comments).
>
> For instance, verify_backup_file() recurses into subdirectories, but
> that behavior is inappropriate for a tar format backup, where
> subdirectories should instead be treated like stray files: complain
> that they exist. pg_verify_backup() does this:
>
>     /* If it's a directory, just recurse. */
>     if (S_ISDIR(sb.st_mode))
>     {
>         verify_backup_directory(context, relpath, fullpath);
>         return;
>     }
>
>     /* If it's not a directory, it should be a plain file. */
>     if (!S_ISREG(sb.st_mode))
>     {
>         report_backup_error(context,
>                             "\"%s\" is not a file or directory",
>                             relpath);
>         return;
>     }
>
> For a plain format backup, this whole thing should be just:
>
>     /* In a tar format backup, we expect only plain files. */
>     if (!S_ISREG(sb.st_mode))
>     {
>         report_backup_error(context,
>                             "\"%s\" is not a plain file",
>                             relpath);
>         return;
>     }
>
> Also, immediately above, you do
> simple_string_list_append(&context->ignore_list, relpath), but that is
> pointless in the tar-file case, and arguably wrong, if -i is going to
> ignore both pathnames in the base directory and also pathnames inside
> the tar files, because we could add something to the ignore list here
> -- accomplishing nothing useful -- and then that ignore-list entry
> could cause us to disregard a stray file with the same name present
> inside one of the tar files -- which is silly. Note that the actual
> point of this logic is to make sure that if we can't stat() a certain
> directory, we don't go off and issue a million complaints about all
> the files in that directory being missing. But this doesn't accomplish
> that goal for a tar-format backup. For a tar-format backup, you'd want
> to figure out which files in the manifest we don't expect to see based
> on this file being inaccessible, and then arrange to suppress future
> complaints about all of those files. But you can't implement that
> here, because you haven't parsed the file name yet. That happens
> later, in verify_tar_file_name().
>
> You could add a whole bunch more if statements here and try to work
> around these issues, but it seems pretty obviously a dead end to me.
> Almost the entire function is going to end up being inside of an
> if-statement. Essentially the only thing in verify_backup_file() that
> should actually be the same in the plain and tar-format cases is that
> you should call stat() either way and check whether it throws an
> error. But that is not enough justification for trying to share the
> whole function.
>

I agree with keeping verify_backup_file() separate, but I'm hesitant
about doing the same for verify_backup_directory(). Otherwise, we
might end up with nearly duplicate functions that are very similar.
Since the changes in verify_backup_directory() are minimal, I've left
it as is, let me know your thoughts on that. I’ve kept
verify_backup_file() separated for plain backup files and added a new
function, verify_tar_backup_file(), in patch 0011. To maintain
consistency, I also renamed verify_backup_file() to
verify_plain_backup_file() in patch 0006.

> I find the logic in verify_tar_file_name() to be somewhat tortured as
> well. The strstr() calls will match those strings anywhere in the
> filename, not just at the end. But also, even if you fixed that, why
> work backward from the end of the filename toward the beginning? It
> would seem like it would more sense to (1) first check if the string
> starts with "base" and set suffix equal to pathname+4, (2) if not,
> strtol(pathname, &suffix, 10) and complain if we didn't eat at least
> one character or got 0 or something too big to be an OID, (3) check
> whether suffix is .tar, .tar.gz, etc.
>

Ok, did it this way.

> In verify_member_checksum(), you set mystreamer->verify_checksum =
> false. That would be correct if there could only ever be one call to
> verify_member_checksum() per member file, but there is no such rule.
> There can be (and, I think, normally will be) more than one
> ASTREAMER_MEMBER_CONTENTS chunk. I'm a little confused as to how this
> code passes any kind of testing.
>

I did that to avoid adding the line for every error case where we
return. The flag is re-enabled when the file contents are yet to be
received in mystreamer->received_bytes < m->size.

> Also in verify_member_checksum(), the mystreamer->received_bytes <
> m->size seems strange. I don't think this is the right way to do
> something when you reach the end of an archive member. The right way
> to do that is to do it when the ASTREAMER_MEMBER_TRAILER chunk shows
> up.
>

Ok, I've split this into two parts: the first part handles incremental
computation at ASTREAMER_MEMBER_CONTENTS, and the second part performs
the final verification at the ASTREAMER_MEMBER_TRAILER stage.

> In verify_member_control_data(), you use astreamer_buffer_untIl(). But
> that's the same buffer that is being used by verify_member_checksum(),
> so I don't really understand how you expect this to work. If this code
> path were ever taken, verify_member_checksum() would see the same data
> more than once.
>

No, for checksum calculation, we directly compute the checksum on the
received content (which is the caller's buffer) without copying it.
However, for control file verification, we need the entire file, so we
first copy it into a local buffer within myastremer. This local buffer
is used solely for storing control file data.

I've made some adjustments to align this code style with checksum
verification: copying will occur during the ASTREAMER_MEMBER_CONTENTS
stage, and final verification will be performed in the
ASTREAMER_MEMBER_TRAILER stage.

> The call to pg_log_debug() in this function also seems quite random.
> In a plain-format backup, we'd actually be doing something different
> for pg_controldata vs. other files, namely reading it during the
> initial directory scan. But here we're reading the file in exactly the
> same sense as we're reading every other file, neither more nor less,
> so why mention this file and not all of the others? And why at this
> exact spot in the code?
>

Agreed, it was added without much thinking.

> I suspect that the report_fatal_error("%s: could not read control
> file: read %d of %zu", ...) call is unreachable. I agree that you need
> to handle the case where the control file inside the tar file is not
> the expected length, and in fact I think you should probably write a
> TAP test for that exact scenario to make sure it works. I bet this
> doesn't. Even if it did, the error message makes no sense in context.
> In the plain-format backup, this error would come from code reading
> the actual bytes off the disk -- i.e. the complaint about not being
> able to read the control file would come from the read() system call.
> Here it doesn't.
>

Agreed. Replace that with a check for an expected file size.

In addition to the mentioned changes, I have renamed functions in
astreamer_verify.c to ensure a consistent naming format.

Regards,
Amul.
From acabc34488287ab3346551db23b4c99fac0850a9 Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Wed, 7 Aug 2024 18:15:29 +0530
Subject: [PATCH v10 12/12] pg_verifybackup: Tests and document

----
NOTE: This patch is not meant to be committed separately. It should
be squashed with the previous patch that implements tar format support.
----
---
 doc/src/sgml/ref/pg_verifybackup.sgml         | 42 ++++++++++-
 src/bin/pg_verifybackup/t/001_basic.pl        |  6 +-
 src/bin/pg_verifybackup/t/004_options.pl      |  2 +-
 src/bin/pg_verifybackup/t/005_bad_manifest.pl |  2 +-
 src/bin/pg_verifybackup/t/008_untar.pl        | 73 ++++++-------------
 src/bin/pg_verifybackup/t/010_client_untar.pl | 48 +-----------
 6 files changed, 72 insertions(+), 101 deletions(-)

diff --git a/doc/src/sgml/ref/pg_verifybackup.sgml b/doc/src/sgml/ref/pg_verifybackup.sgml
index a3f167f9f6e..60f771c7663 100644
--- a/doc/src/sgml/ref/pg_verifybackup.sgml
+++ b/doc/src/sgml/ref/pg_verifybackup.sgml
@@ -34,8 +34,10 @@ PostgreSQL documentation
    integrity of a database cluster backup taken using
    <command>pg_basebackup</command> against a
    <literal>backup_manifest</literal> generated by the server at the time
-   of the backup.  The backup must be stored in the "plain"
-   format; a "tar" format backup can be checked after extracting it.
+   of the backup.  The backup must be stored in the "plain" or "tar"
+   format.  Verification is supported for <literal>gzip</literal>,
+   <literal>lz4</literal>, and  <literal>zstd</literal> compressed tar backup;
+   any other compressed format backups can be checked after decompressing them.
   </para>
 
   <para>
@@ -168,6 +170,42 @@ PostgreSQL documentation
       </listitem>
      </varlistentry>
 
+     <varlistentry>
+      <term><option>-F <replaceable class="parameter">format</replaceable></option></term>
+      <term><option>--format=<replaceable class="parameter">format</replaceable></option></term>
+      <listitem>
+       <para>
+        Specifies the format of the backup. <replaceable>format</replaceable>
+        can be one of the following:
+
+        <variablelist>
+         <varlistentry>
+          <term><literal>p</literal></term>
+          <term><literal>plain</literal></term>
+          <listitem>
+           <para>
+            Backup consists of plain files with the same layout as the
+            source server's data directory and tablespaces.
+           </para>
+          </listitem>
+         </varlistentry>
+
+         <varlistentry>
+          <term><literal>t</literal></term>
+          <term><literal>tar</literal></term>
+          <listitem>
+           <para>
+            Backup consists of tar files. The main data directory's contents
+            will be written to a file named <filename>base.tar</filename>,
+            and each other tablespace will be written to a separate tar file
+            named after that tablespace's OID.
+           </para>
+           </listitem>
+         </varlistentry>
+        </variablelist></para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry>
       <term><option>-n</option></term>
       <term><option>--no-parse-wal</option></term>
diff --git a/src/bin/pg_verifybackup/t/001_basic.pl b/src/bin/pg_verifybackup/t/001_basic.pl
index 2f3e52d296f..ca5b0402b7d 100644
--- a/src/bin/pg_verifybackup/t/001_basic.pl
+++ b/src/bin/pg_verifybackup/t/001_basic.pl
@@ -17,11 +17,11 @@ command_fails_like(
 	qr/no backup directory specified/,
 	'target directory must be specified');
 command_fails_like(
-	[ 'pg_verifybackup', $tempdir ],
+	[ 'pg_verifybackup', '-Fp', $tempdir ],
 	qr/could not open file.*\/backup_manifest\"/,
 	'pg_verifybackup requires a manifest');
 command_fails_like(
-	[ 'pg_verifybackup', $tempdir, $tempdir ],
+	[ 'pg_verifybackup', '-Fp', $tempdir, $tempdir ],
 	qr/too many command-line arguments/,
 	'multiple target directories not allowed');
 
@@ -31,7 +31,7 @@ close($fh);
 
 # but then try to use an alternate, nonexisting manifest
 command_fails_like(
-	[ 'pg_verifybackup', '-m', "$tempdir/not_the_manifest", $tempdir ],
+	[ 'pg_verifybackup', '-Fp', '-m', "$tempdir/not_the_manifest", $tempdir ],
 	qr/could not open file.*\/not_the_manifest\"/,
 	'pg_verifybackup respects -m flag');
 
diff --git a/src/bin/pg_verifybackup/t/004_options.pl b/src/bin/pg_verifybackup/t/004_options.pl
index 8ed2214408e..2f197648740 100644
--- a/src/bin/pg_verifybackup/t/004_options.pl
+++ b/src/bin/pg_verifybackup/t/004_options.pl
@@ -108,7 +108,7 @@ unlike(
 # Test valid manifest with nonexistent backup directory.
 command_fails_like(
 	[
-		'pg_verifybackup', '-m',
+		'pg_verifybackup', '-Fp', '-m',
 		"$backup_path/backup_manifest", "$backup_path/fake"
 	],
 	qr/could not open directory/,
diff --git a/src/bin/pg_verifybackup/t/005_bad_manifest.pl b/src/bin/pg_verifybackup/t/005_bad_manifest.pl
index c4ed64b62d5..28c51b6feb0 100644
--- a/src/bin/pg_verifybackup/t/005_bad_manifest.pl
+++ b/src/bin/pg_verifybackup/t/005_bad_manifest.pl
@@ -208,7 +208,7 @@ sub test_bad_manifest
 	print $fh $manifest_contents;
 	close($fh);
 
-	command_fails_like([ 'pg_verifybackup', $tempdir ], $regexp, $test_name);
+	command_fails_like([ 'pg_verifybackup', '-Fp', $tempdir ], $regexp, $test_name);
 	return;
 }
 
diff --git a/src/bin/pg_verifybackup/t/008_untar.pl b/src/bin/pg_verifybackup/t/008_untar.pl
index 7a09f3b75b2..9896560adc3 100644
--- a/src/bin/pg_verifybackup/t/008_untar.pl
+++ b/src/bin/pg_verifybackup/t/008_untar.pl
@@ -16,6 +16,20 @@ my $primary = PostgreSQL::Test::Cluster->new('primary');
 $primary->init(allows_streaming => 1);
 $primary->start;
 
+# Create a couple of directories to use as tablespaces.
+my $TS1_LOCATION = $primary->backup_dir .'/ts1';
+$TS1_LOCATION =~ s/\/\.\//\//g;    # collapse foo/./bar to foo/bar
+mkdir($TS1_LOCATION);
+
+# Create a tablespace with table in it.
+$primary->safe_psql('postgres', qq(
+		CREATE TABLESPACE regress_ts1 LOCATION '$TS1_LOCATION';
+		SELECT oid FROM pg_tablespace WHERE spcname = 'regress_ts1';
+		CREATE TABLE regress_tbl1(i int) TABLESPACE regress_ts1;
+		INSERT INTO regress_tbl1 VALUES(generate_series(1,5));));
+my $tsoid = $primary->safe_psql('postgres', qq(
+		SELECT oid FROM pg_tablespace WHERE spcname = 'regress_ts1'));
+
 my $backup_path = $primary->backup_dir . '/server-backup';
 my $extract_path = $primary->backup_dir . '/extracted-backup';
 
@@ -23,39 +37,31 @@ my @test_configuration = (
 	{
 		'compression_method' => 'none',
 		'backup_flags' => [],
-		'backup_archive' => 'base.tar',
+		'backup_archive' => ['base.tar', "$tsoid.tar"],
 		'enabled' => 1
 	},
 	{
 		'compression_method' => 'gzip',
 		'backup_flags' => [ '--compress', 'server-gzip' ],
-		'backup_archive' => 'base.tar.gz',
-		'decompress_program' => $ENV{'GZIP_PROGRAM'},
-		'decompress_flags' => ['-d'],
+		'backup_archive' => [ 'base.tar.gz', "$tsoid.tar.gz" ],
 		'enabled' => check_pg_config("#define HAVE_LIBZ 1")
 	},
 	{
 		'compression_method' => 'lz4',
 		'backup_flags' => [ '--compress', 'server-lz4' ],
-		'backup_archive' => 'base.tar.lz4',
-		'decompress_program' => $ENV{'LZ4'},
-		'decompress_flags' => [ '-d', '-m' ],
+		'backup_archive' => ['base.tar.lz4', "$tsoid.tar.lz4" ],
 		'enabled' => check_pg_config("#define USE_LZ4 1")
 	},
 	{
 		'compression_method' => 'zstd',
 		'backup_flags' => [ '--compress', 'server-zstd' ],
-		'backup_archive' => 'base.tar.zst',
-		'decompress_program' => $ENV{'ZSTD'},
-		'decompress_flags' => ['-d'],
+		'backup_archive' => [ 'base.tar.zst', "$tsoid.tar.zst" ],
 		'enabled' => check_pg_config("#define USE_ZSTD 1")
 	},
 	{
 		'compression_method' => 'zstd',
 		'backup_flags' => [ '--compress', 'server-zstd:level=1,long' ],
-		'backup_archive' => 'base.tar.zst',
-		'decompress_program' => $ENV{'ZSTD'},
-		'decompress_flags' => ['-d'],
+		'backup_archive' => [ 'base.tar.zst', "$tsoid.tar.zst" ],
 		'enabled' => check_pg_config("#define USE_ZSTD 1")
 	});
 
@@ -86,47 +92,16 @@ for my $tc (@test_configuration)
 		my $backup_files = join(',',
 			sort grep { $_ ne '.' && $_ ne '..' } slurp_dir($backup_path));
 		my $expected_backup_files =
-		  join(',', sort ('backup_manifest', $tc->{'backup_archive'}));
+		  join(',', sort ('backup_manifest', @{ $tc->{'backup_archive'} }));
 		is($backup_files, $expected_backup_files,
 			"found expected backup files, compression $method");
 
-		# Decompress.
-		if (exists $tc->{'decompress_program'})
-		{
-			my @decompress = ($tc->{'decompress_program'});
-			push @decompress, @{ $tc->{'decompress_flags'} }
-			  if $tc->{'decompress_flags'};
-			push @decompress, $backup_path . '/' . $tc->{'backup_archive'};
-			system_or_bail(@decompress);
-		}
-
-	  SKIP:
-		{
-			my $tar = $ENV{TAR};
-			# don't check for a working tar here, to accommodate various odd
-			# cases. If tar doesn't work the init_from_backup below will fail.
-			skip "no tar program available", 1
-			  if (!defined $tar || $tar eq '');
-
-			# Untar.
-			mkdir($extract_path);
-			system_or_bail($tar, 'xf', $backup_path . '/base.tar',
-				'-C', $extract_path);
-
-			# Verify.
-			$primary->command_ok(
-				[
-					'pg_verifybackup', '-n',
-					'-m', "$backup_path/backup_manifest",
-					'-e', $extract_path
-				],
-				"verify backup, compression $method");
-		}
+		# Verify tar backup.
+		$primary->command_ok(['pg_verifybackup', '-n', '-e', $backup_path],
+			"verify backup, compression $method");
 
 		# Cleanup.
-		unlink($backup_path . '/backup_manifest');
-		unlink($backup_path . '/base.tar');
-		unlink($backup_path . '/' . $tc->{'backup_archive'});
+		rmtree($backup_path);
 		rmtree($extract_path);
 	}
 }
diff --git a/src/bin/pg_verifybackup/t/010_client_untar.pl b/src/bin/pg_verifybackup/t/010_client_untar.pl
index 8c076d46dee..6b7d7483f6e 100644
--- a/src/bin/pg_verifybackup/t/010_client_untar.pl
+++ b/src/bin/pg_verifybackup/t/010_client_untar.pl
@@ -29,41 +29,30 @@ my @test_configuration = (
 		'compression_method' => 'gzip',
 		'backup_flags' => [ '--compress', 'client-gzip:5' ],
 		'backup_archive' => 'base.tar.gz',
-		'decompress_program' => $ENV{'GZIP_PROGRAM'},
-		'decompress_flags' => ['-d'],
 		'enabled' => check_pg_config("#define HAVE_LIBZ 1")
 	},
 	{
 		'compression_method' => 'lz4',
 		'backup_flags' => [ '--compress', 'client-lz4:5' ],
 		'backup_archive' => 'base.tar.lz4',
-		'decompress_program' => $ENV{'LZ4'},
-		'decompress_flags' => ['-d'],
-		'output_file' => 'base.tar',
 		'enabled' => check_pg_config("#define USE_LZ4 1")
 	},
 	{
 		'compression_method' => 'zstd',
 		'backup_flags' => [ '--compress', 'client-zstd:5' ],
 		'backup_archive' => 'base.tar.zst',
-		'decompress_program' => $ENV{'ZSTD'},
-		'decompress_flags' => ['-d'],
 		'enabled' => check_pg_config("#define USE_ZSTD 1")
 	},
 	{
 		'compression_method' => 'zstd',
 		'backup_flags' => [ '--compress', 'client-zstd:level=1,long' ],
 		'backup_archive' => 'base.tar.zst',
-		'decompress_program' => $ENV{'ZSTD'},
-		'decompress_flags' => ['-d'],
 		'enabled' => check_pg_config("#define USE_ZSTD 1")
 	},
 	{
 		'compression_method' => 'parallel zstd',
 		'backup_flags' => [ '--compress', 'client-zstd:workers=3' ],
 		'backup_archive' => 'base.tar.zst',
-		'decompress_program' => $ENV{'ZSTD'},
-		'decompress_flags' => ['-d'],
 		'enabled' => check_pg_config("#define USE_ZSTD 1"),
 		'possibly_unsupported' =>
 		  qr/could not set compression worker count to 3: Unsupported parameter/
@@ -118,40 +107,9 @@ for my $tc (@test_configuration)
 		is($backup_files, $expected_backup_files,
 			"found expected backup files, compression $method");
 
-		# Decompress.
-		if (exists $tc->{'decompress_program'})
-		{
-			my @decompress = ($tc->{'decompress_program'});
-			push @decompress, @{ $tc->{'decompress_flags'} }
-			  if $tc->{'decompress_flags'};
-			push @decompress, $backup_path . '/' . $tc->{'backup_archive'};
-			push @decompress, $backup_path . '/' . $tc->{'output_file'}
-			  if $tc->{'output_file'};
-			system_or_bail(@decompress);
-		}
-
-	  SKIP:
-		{
-			my $tar = $ENV{TAR};
-			# don't check for a working tar here, to accommodate various odd
-			# cases. If tar doesn't work the init_from_backup below will fail.
-			skip "no tar program available", 1
-			  if (!defined $tar || $tar eq '');
-
-			# Untar.
-			mkdir($extract_path);
-			system_or_bail($tar, 'xf', $backup_path . '/base.tar',
-				'-C', $extract_path);
-
-			# Verify.
-			$primary->command_ok(
-				[
-					'pg_verifybackup', '-n',
-					'-m', "$backup_path/backup_manifest",
-					'-e', $extract_path
-				],
-				"verify backup, compression $method");
-		}
+		# Verify tar backup.
+		$primary->command_ok( [ 'pg_verifybackup', '-n', '-e', $backup_path ],
+			"verify backup, compression $method");
 
 		# Cleanup.
 		rmtree($extract_path);
-- 
2.18.0

From 52e11acc54815e9e5db92171b3fba9522f6da683 Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Wed, 31 Jul 2024 16:22:07 +0530
Subject: [PATCH v10 11/12] pg_verifybackup: Read tar files and verify its
 contents

This patch implements TAR format backup verification.

For progress reporting support, we perform this verification in two
passes: the first pass calculates total_size, and the second pass
updates done_size as verification progresses.

For the verification, in the first pass, we call verify_tar_backup_file(),
which performs basic verification by expecting only base.tar or
<tablespaceoid>.tar files and raises an error for any other files.  It
also determines the compression type of the archive file. All this
information is stored in a newly added tarFile struct, which is
appended to a list that will be used in the second pass (by
verify_tar_content()) for the final verification. In the second pass,
the tar archives are read, decompressed, and the required verification
is carried out.

For reading and decompression, fe_utils/astreamer.h is used. For
verification, a new archive streamer has been added in
astreamer_verify.c to handle TAR member files and their contents; see
astreamer_verify_content() for details. The stack of astreamers will
be set up for each TAR file in verify_tar_content(), depending on its
compression type which is detected in the first pass.

When information about a TAR member file (i.e., ASTREAMER_MEMBER_HEADER)
is received, we first verify its entry against the backup manifest. We
then decide if further checks are needed, such as checksum
verification and control data verification (if it is a pg_control
file), once the member file contents are received. Although this
decision could be made when the contents are received, it is more
efficient to make it earlier since the member file contents are
received in multiple iterations. In short, we process
ASTREAMER_MEMBER_CONTENTS multiple times but only once for other
ASTREAMER_MEMBER_* cases. We maintain this information in the
astreamer_verify structure for each member file, which is reset when
the file ends.

Unlike in a plain backup, checksum verification here occurs in two
steps. First, as the contents are received, the checksum is computed
incrementally (see member_compute_checksum). Then, at the end of
processing the member file, the final verification is performed (see
member_verify_checksum).

Similarly, during the content receiving stage, if the file is
pg_control, the data will be copied into a local buffer (see
member_copy_control_data).  The verification will then be carried out
at the end of the member file processing (see member_verify_control_data)
---
 src/bin/pg_verifybackup/Makefile           |   4 +-
 src/bin/pg_verifybackup/astreamer_verify.c | 355 +++++++++++++++++++++
 src/bin/pg_verifybackup/meson.build        |   6 +-
 src/bin/pg_verifybackup/pg_verifybackup.c  | 283 +++++++++++++++-
 src/bin/pg_verifybackup/pg_verifybackup.h  |   6 +
 src/tools/pgindent/typedefs.list           |   2 +
 6 files changed, 649 insertions(+), 7 deletions(-)
 create mode 100644 src/bin/pg_verifybackup/astreamer_verify.c

diff --git a/src/bin/pg_verifybackup/Makefile b/src/bin/pg_verifybackup/Makefile
index 7c045f142e8..df7aaabd530 100644
--- a/src/bin/pg_verifybackup/Makefile
+++ b/src/bin/pg_verifybackup/Makefile
@@ -17,11 +17,13 @@ top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
 
 # We need libpq only because fe_utils does.
+override CPPFLAGS := -I$(libpq_srcdir) $(CPPFLAGS)
 LDFLAGS_INTERNAL += -L$(top_builddir)/src/fe_utils -lpgfeutils $(libpq_pgport)
 
 OBJS = \
 	$(WIN32RES) \
-	pg_verifybackup.o
+	pg_verifybackup.o \
+	astreamer_verify.o
 
 all: pg_verifybackup
 
diff --git a/src/bin/pg_verifybackup/astreamer_verify.c b/src/bin/pg_verifybackup/astreamer_verify.c
new file mode 100644
index 00000000000..28a3f976877
--- /dev/null
+++ b/src/bin/pg_verifybackup/astreamer_verify.c
@@ -0,0 +1,355 @@
+/*-------------------------------------------------------------------------
+ *
+ * astreamer_verify.c
+ *
+ * Extend fe_utils/astreamer.h archive streaming facility to verify TAR
+ * backup.
+ *
+ * Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
+ *
+ * src/bin/pg_verifybackup/astreamer_verify.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres_fe.h"
+
+#include "common/logging.h"
+#include "pg_verifybackup.h"
+
+typedef struct astreamer_verify
+{
+	astreamer	base;
+	verifier_context *context;
+	char	   *archive_name;
+	Oid			tblspc_oid;
+	pg_checksum_context *checksum_ctx;
+
+	/* Hold information for a member file verification */
+	manifest_file *mfile;
+	int64		received_bytes;
+	bool		verify_checksum;
+	bool		verify_control_data;
+} astreamer_verify;
+
+static void astreamer_verify_content(astreamer *streamer,
+									 astreamer_member *member,
+									 const char *data, int len,
+									 astreamer_archive_context context);
+static void astreamer_verify_finalize(astreamer *streamer);
+static void astreamer_verify_free(astreamer *streamer);
+
+static void member_verify_header(astreamer *streamer, astreamer_member *member);
+static void member_compute_checksum(astreamer *streamer,
+									astreamer_member *member,
+									const char *data, int len);
+static void member_verify_checksum(astreamer *streamer);
+static void member_copy_control_data(astreamer *streamer,
+									 astreamer_member *member,
+									 const char *data, int len);
+static void member_verify_control_data(astreamer *streamer);
+static void member_reset_info(astreamer *streamer);
+
+static const astreamer_ops astreamer_verify_ops = {
+	.content = astreamer_verify_content,
+	.finalize = astreamer_verify_finalize,
+	.free = astreamer_verify_free
+};
+
+/*
+ * Create a astreamer that can verifies content of a TAR file.
+ */
+astreamer *
+astreamer_verify_content_new(astreamer *next, verifier_context *context,
+							 char *archive_name, Oid tblspc_oid)
+{
+	astreamer_verify *streamer;
+
+	streamer = palloc0(sizeof(astreamer_verify));
+	*((const astreamer_ops **) &streamer->base.bbs_ops) =
+		&astreamer_verify_ops;
+
+	streamer->base.bbs_next = next;
+	streamer->context = context;
+	streamer->archive_name = archive_name;
+	streamer->tblspc_oid = tblspc_oid;
+	initStringInfo(&streamer->base.bbs_buffer);
+
+	if (!context->skip_checksums)
+		streamer->checksum_ctx = pg_malloc(sizeof(pg_checksum_context));
+
+	return &streamer->base;
+}
+
+/*
+ * The main entry point of the archive streamer for verifying tar members.
+ */
+static void
+astreamer_verify_content(astreamer *streamer, astreamer_member *member,
+						 const char *data, int len,
+						 astreamer_archive_context context)
+{
+	astreamer_verify *mystreamer = (astreamer_verify *) streamer;
+
+	Assert(context != ASTREAMER_UNKNOWN);
+
+	switch (context)
+	{
+		case ASTREAMER_MEMBER_HEADER:
+
+			/*
+			 * Perform the initial check and setup verification steps.
+			 */
+			member_verify_header(streamer, member);
+			break;
+
+		case ASTREAMER_MEMBER_CONTENTS:
+
+			/*
+			 * Process the member content according to the flags set by the
+			 * member header processing routine for checksum computation and
+			 * copy control data to local buffer.
+			 */
+			if (mystreamer->verify_checksum)
+				member_compute_checksum(streamer, member, data, len);
+
+			if (mystreamer->verify_control_data)
+				member_copy_control_data(streamer, member, data, len);
+			break;
+
+		case ASTREAMER_MEMBER_TRAILER:
+
+			/* Do the final checksum verification. */
+			if (mystreamer->verify_checksum)
+				member_verify_checksum(streamer);
+
+			/* Do the control data verification */
+			if (mystreamer->verify_control_data)
+				member_verify_control_data(streamer);
+
+			/*
+			 * Reset the temporary information stored for the verification.
+			 */
+			member_reset_info(streamer);
+			break;
+
+		case ASTREAMER_ARCHIVE_TRAILER:
+			break;
+
+		default:
+			/* Shouldn't happen. */
+			pg_fatal("unexpected state while parsing tar archive");
+	}
+}
+
+/*
+ * End-of-stream processing for a astreamer_verify stream.
+ */
+static void
+astreamer_verify_finalize(astreamer *streamer)
+{
+	Assert(streamer->bbs_next == NULL);
+}
+
+/*
+ * Free memory associated with a astreamer_verify stream.
+ */
+static void
+astreamer_verify_free(astreamer *streamer)
+{
+	astreamer_verify *mystreamer = (astreamer_verify *) streamer;
+
+	if (mystreamer->checksum_ctx)
+		pfree(mystreamer->checksum_ctx);
+
+	pfree(streamer->bbs_buffer.data);
+	pfree(streamer);
+}
+
+/*
+ * Verifies the tar member entry if it corresponds to a file in the backup
+ * manifest. If the archive being processed is a tablespace, prepares the
+ * required file path for subsequent operations. Finally, determines if
+ * checksum verification and control data verification need to be performed
+ * during file content processing
+ */
+static void
+member_verify_header(astreamer *streamer, astreamer_member *member)
+{
+	astreamer_verify *mystreamer = (astreamer_verify *) streamer;
+	manifest_file *m;
+
+	/* We are only interested in files that are not in the ignore list. */
+	if (member->is_directory || member->is_link ||
+		should_ignore_relpath(mystreamer->context, member->pathname))
+		return;
+
+	/*
+	 * The backup_manifest stores a relative path to the base directory for
+	 * files belong tablespace, whereas <tablespaceoid>.tar doesn't. Prepare
+	 * the required path, otherwise, the manfiest entry verification will
+	 * fail.
+	 */
+	if (OidIsValid(mystreamer->tblspc_oid))
+	{
+		char		temp[MAXPGPATH];
+
+		/* Copy original name at temporary space */
+		memcpy(temp, member->pathname, MAXPGPATH);
+
+		snprintf(member->pathname, MAXPGPATH, "%s/%d/%s",
+				 "pg_tblspc", mystreamer->tblspc_oid, temp);
+	}
+
+	/* Check the manifest entry */
+	m = verify_manifest_entry(mystreamer->context, member->pathname,
+							  member->size);
+	mystreamer->mfile = (void *) m;
+
+	/*
+	 * Prepare for checksum and control data verification.
+	 *
+	 * We could have these checks while receiving contents. However, since
+	 * contents are received in multiple iterations, this would result in
+	 * these lengthy checks being performed multiple times. Instead, having a
+	 * single flag would be more efficient.
+	 */
+	mystreamer->verify_checksum =
+		(!mystreamer->context->skip_checksums && should_verify_checksum(m));
+	mystreamer->verify_control_data =
+		should_verify_control_data(mystreamer->context->manifest, m);
+
+	/* Initialize the context required for checksum verification. */
+	if (mystreamer->verify_checksum &&
+		pg_checksum_init(mystreamer->checksum_ctx, m->checksum_type) < 0)
+	{
+		report_backup_error(mystreamer->context,
+							"%s: could not initialize checksum of file \"%s\"",
+							mystreamer->archive_name, m->pathname);
+
+		/*
+		 * Checksum verification cannot be performed without proper context
+		 * initialization.
+		 */
+		mystreamer->verify_checksum = false;
+	}
+}
+
+/*
+ * Computes the checksum incrementally for the received file content.
+ *
+ * The caller should pass a correctly initialized checksum_ctx, which will be
+ * used for incremental checksum computation.
+ */
+static void
+member_compute_checksum(astreamer *streamer, astreamer_member *member,
+						const char *data, int len)
+{
+	astreamer_verify *mystreamer = (astreamer_verify *) streamer;
+	pg_checksum_context *checksum_ctx = mystreamer->checksum_ctx;
+	manifest_file *m = mystreamer->mfile;
+
+	Assert(mystreamer->verify_checksum);
+
+	/* Should have came for the right file */
+	Assert(strcmp(member->pathname, m->pathname) == 0);
+
+	/*
+	 * The checksum context should match the type noted in the backup
+	 * manifest.
+	 */
+	Assert(checksum_ctx->type == m->checksum_type);
+
+	/* Update the total count of computed checksum bytes. */
+	mystreamer->received_bytes += len;
+
+	if (pg_checksum_update(checksum_ctx, (uint8 *) data, len) < 0)
+	{
+		report_backup_error(mystreamer->context,
+							"could not update checksum of file \"%s\"",
+							m->pathname);
+		mystreamer->verify_checksum = false;
+	}
+}
+
+/*
+ * Perform the final computation and checksum verification after the entire
+ * file content has been processed.
+ */
+static void
+member_verify_checksum(astreamer *streamer)
+{
+	astreamer_verify *mystreamer = (astreamer_verify *) streamer;
+
+	Assert(mystreamer->verify_checksum);
+
+	verify_checksum(mystreamer->context, mystreamer->mfile,
+					mystreamer->checksum_ctx, mystreamer->received_bytes);
+}
+
+/*
+ * Stores the pg_control file contents into a local buffer; we need the entire
+ * control file data for verification.
+ */
+static void
+member_copy_control_data(astreamer *streamer, astreamer_member *member,
+						 const char *data, int len)
+{
+	/* Should be here only for control file */
+	Assert(strcmp(member->pathname, "global/pg_control") == 0);
+	Assert(((astreamer_verify *) streamer)->verify_control_data);
+
+	/* Copy enough control file data needed for verification. */
+	astreamer_buffer_until(streamer, &data, &len, sizeof(ControlFileData));
+}
+
+/*
+ * Performs the CRC calculation of pg_control data and then calls the routines
+ * that execute the final verification of the control file information.
+ */
+static void
+member_verify_control_data(astreamer *streamer)
+{
+	astreamer_verify *mystreamer = (astreamer_verify *) streamer;
+	manifest_data *manifest = mystreamer->context->manifest;
+	ControlFileData control_file;
+	pg_crc32c	crc;
+	bool		crc_ok;
+
+	/* Should be here only for control file */
+	Assert(strcmp(mystreamer->mfile->pathname, "global/pg_control") == 0);
+	Assert(mystreamer->verify_control_data);
+
+	/* Should have enough control file data needed for verification. */
+	if (streamer->bbs_buffer.len != sizeof(ControlFileData))
+		report_fatal_error("%s: unexpected control file size: %d, should be %zu",
+						   mystreamer->archive_name, streamer->bbs_buffer.len,
+						   sizeof(ControlFileData));
+
+	memcpy(&control_file, streamer->bbs_buffer.data, sizeof(ControlFileData));
+
+	/* Check the CRC. */
+	INIT_CRC32C(crc);
+	COMP_CRC32C(crc, (char *) (&control_file), offsetof(ControlFileData, crc));
+	FIN_CRC32C(crc);
+
+	crc_ok = EQ_CRC32C(crc, control_file.crc);
+
+	/* Do the final control data verification. */
+	verify_control_data(&control_file, mystreamer->mfile->pathname, crc_ok,
+						manifest->system_identifier);
+}
+
+/*
+ * Reset flags and free memory allocations for member file verification.
+ */
+static void
+member_reset_info(astreamer *streamer)
+{
+	astreamer_verify *mystreamer = (astreamer_verify *) streamer;
+
+	mystreamer->mfile = NULL;
+	mystreamer->received_bytes = 0;
+	mystreamer->verify_checksum = false;
+	mystreamer->verify_control_data = false;
+}
diff --git a/src/bin/pg_verifybackup/meson.build b/src/bin/pg_verifybackup/meson.build
index 7c7d31a0350..1e3fcf7ee5a 100644
--- a/src/bin/pg_verifybackup/meson.build
+++ b/src/bin/pg_verifybackup/meson.build
@@ -1,7 +1,8 @@
 # Copyright (c) 2022-2024, PostgreSQL Global Development Group
 
 pg_verifybackup_sources = files(
-  'pg_verifybackup.c'
+  'pg_verifybackup.c',
+  'astreamer_verify.c'
 )
 
 if host_system == 'windows'
@@ -10,9 +11,10 @@ if host_system == 'windows'
     '--FILEDESC', 'pg_verifybackup - verify a backup against using a backup manifest'])
 endif
 
+pg_verifybackup_deps = [frontend_code, libpq, lz4, zlib, zstd]
 pg_verifybackup = executable('pg_verifybackup',
   pg_verifybackup_sources,
-  dependencies: [frontend_code, libpq],
+  dependencies: pg_verifybackup_deps,
   kwargs: default_bin_args,
 )
 bin_targets += pg_verifybackup
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index 3dcb174e0a9..aed1adef4e7 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -20,9 +20,12 @@
 
 #include "common/compression.h"
 #include "common/parse_manifest.h"
+#include "common/relpath.h"
 #include "fe_utils/simple_list.h"
 #include "getopt_long.h"
+#include "limits.h"
 #include "pg_verifybackup.h"
+#include "pgtar.h"
 #include "pgtime.h"
 
 /*
@@ -39,6 +42,16 @@
  */
 #define ESTIMATED_BYTES_PER_MANIFEST_LINE	100
 
+/*
+ * Tar archive information needed for content verification.
+ */
+typedef struct tarFile
+{
+	char	   *relpath;
+	Oid			tblspc_oid;
+	pg_compress_algorithm compress_algorithm;
+} tarFile;
+
 static manifest_data *parse_manifest_file(char *manifest_path);
 static void verifybackup_version_cb(JsonManifestParseContext *context,
 									int manifest_version);
@@ -60,8 +73,14 @@ static void report_manifest_error(JsonManifestParseContext *context,
 static char find_backup_format(verifier_context *context);
 static void verify_backup_directory(verifier_context *context,
 									char *relpath, char *fullpath);
-static void verify_plain_backup_file(verifier_context *context,
-									 char *relpath, char *fullpath);
+static void verify_plain_backup_file(verifier_context *context, char *relpath,
+									 char *fullpath);
+static void verify_tar_backup_file(verifier_context *context, char *relpath,
+								   char *fullpath, SimplePtrList *tarFiles);
+static void verify_tar_file_contents(verifier_context *context,
+									 SimplePtrList *tarFiles);
+static void verify_tar_contents(verifier_context *context, char *relpath,
+								char *fullpath, astreamer *streamer);
 static void report_extra_backup_files(verifier_context *context);
 static void verify_backup_checksums(verifier_context *context);
 static void verify_file_checksum(verifier_context *context,
@@ -70,6 +89,10 @@ static void verify_file_checksum(verifier_context *context,
 static void parse_required_wal(verifier_context *context,
 							   char *pg_waldump_path,
 							   char *wal_directory);
+static astreamer *create_archive_verifier(verifier_context *context,
+										  char *archive_name,
+										  Oid tblspc_oid,
+										  pg_compress_algorithm compress_algo);
 
 static void progress_report(bool finished);
 static void usage(void);
@@ -148,6 +171,10 @@ main(int argc, char **argv)
 	 */
 	simple_string_list_append(&context.ignore_list, "backup_manifest");
 	simple_string_list_append(&context.ignore_list, "pg_wal");
+	simple_string_list_append(&context.ignore_list, "pg_wal.tar");
+	simple_string_list_append(&context.ignore_list, "pg_wal.tar.gz");
+	simple_string_list_append(&context.ignore_list, "pg_wal.tar.lz4");
+	simple_string_list_append(&context.ignore_list, "pg_wal.tar.zst");
 	simple_string_list_append(&context.ignore_list, "postgresql.auto.conf");
 	simple_string_list_append(&context.ignore_list, "recovery.signal");
 	simple_string_list_append(&context.ignore_list, "standby.signal");
@@ -556,6 +583,7 @@ verify_backup_directory(verifier_context *context, char *relpath,
 {
 	DIR		   *dir;
 	struct dirent *dirent;
+	SimplePtrList tarFiles = {NULL, NULL};
 
 	dir = opendir(fullpath);
 	if (dir == NULL)
@@ -595,12 +623,23 @@ verify_backup_directory(verifier_context *context, char *relpath,
 			newrelpath = psprintf("%s/%s", relpath, filename);
 
 		if (!should_ignore_relpath(context, newrelpath))
-			verify_plain_backup_file(context, newrelpath, newfullpath);
+		{
+			if (context->format == 'p')
+				verify_plain_backup_file(context, newrelpath, newfullpath);
+			else
+				verify_tar_backup_file(context, newrelpath, newfullpath,
+									   &tarFiles);
+		}
 
 		pfree(newfullpath);
 		pfree(newrelpath);
 	}
 
+	/* Perform the final verification of the tar contents, if any. */
+	Assert(tarFiles.head == NULL || context->format == 't');
+	if (tarFiles.head != NULL)
+		verify_tar_file_contents(context, &tarFiles);
+
 	if (closedir(dir))
 	{
 		report_backup_error(context,
@@ -610,7 +649,8 @@ verify_backup_directory(verifier_context *context, char *relpath,
 }
 
 /*
- * Verify one file (which might actually be a directory or a symlink).
+ * Verify one file (which might actually be a directory, a symlink or a
+ * archive).
  *
  * The arguments to this function have the same meaning as the arguments to
  * verify_backup_directory.
@@ -677,6 +717,205 @@ verify_plain_backup_file(verifier_context *context, char *relpath,
 		total_size += m->size;
 }
 
+/*
+ * Verify one tar archive file.
+ *
+ * This function does not perform a complete verification; it only carries out
+ * basic validation of the tar format backup file, detects the compression
+ * type, and appends that information to the tarFiles list. An error will be
+ * reported if the tar archive is inaccessible, or if the file type, name, or
+ * compression type is not as expected.
+ *
+ * The arguments to this function are mostly the same as the
+ * verify_plain_backup_file. The additional argument is the file size for
+ * progress report.
+ */
+static void
+verify_tar_backup_file(verifier_context *context, char *relpath,
+					   char *fullpath, SimplePtrList *tarFiles)
+{
+	struct stat sb;
+	Oid			tblspc_oid = InvalidOid;
+	pg_compress_algorithm compress_algorithm;
+	tarFile    *tar_file;
+	char	   *suffix = NULL;
+
+	/* Should be tar format backup */
+	Assert(context->format == 't');
+
+	/* Get file information */
+	if (stat(fullpath, &sb) != 0)
+	{
+		report_backup_error(context,
+							"could not stat file or directory \"%s\": %m",
+							relpath);
+		return;
+	}
+
+	/* In a tar format backup, we expect only plain files. */
+	if (!S_ISREG(sb.st_mode))
+	{
+		report_backup_error(context,
+							"\"%s\" is not a plain file",
+							relpath);
+		return;
+	}
+
+	/*
+	 * We expect tar archive files of backing up the main directory and
+	 * tablespace.
+	 *
+	 * pg_basebackup writes the main data directory to an archive file named
+	 * base.tar and the tablespace directory to <tablespaceoid>.tar, followed
+	 * by a compression type extension such as .gz, .lz4, or .zst.
+	 */
+	if (strncmp("base", relpath, 4) == 0)
+		suffix = relpath + 4;
+	else
+	{
+		/* Expected a <tablespaceoid>.tar file here. */
+		int64 num = strtoi64(relpath, &suffix, 10);
+
+		/*
+		 * Report an error if we didn't consume at least one character, if the
+		 * result is 0, or if the value is too large to be a valid OID.
+		 */
+		if (suffix == NULL || (num <= 0) || (num > OID_MAX))
+			report_backup_error(context,
+								"\"%s\" unexpected file in the tar format backup",
+								relpath);
+		tblspc_oid = (Oid) num;
+	}
+
+	/* Now, check the compression type of the tar file */
+	if (strcmp(suffix, ".tar") == 0)
+		compress_algorithm = PG_COMPRESSION_NONE;
+	else if (strcmp(suffix, ".tgz") == 0)
+		compress_algorithm = PG_COMPRESSION_GZIP;
+	else if (strcmp(suffix, ".tar.gz") == 0)
+		compress_algorithm = PG_COMPRESSION_GZIP;
+	else if (strcmp(suffix, ".tar.lz4") == 0)
+		compress_algorithm = PG_COMPRESSION_LZ4;
+	else if (strcmp(suffix, ".tar.zst") == 0)
+		compress_algorithm = PG_COMPRESSION_ZSTD;
+	else
+	{
+		report_backup_error(context,
+							"\"%s\" unexpected file in the tar format backup",
+							relpath);
+		return;
+	}
+
+	/*
+	 * Append the information to the list for complete verification at a later
+	 * stage.
+	 */
+	tar_file = pg_malloc(sizeof(tarFile));
+	tar_file->relpath = pstrdup(relpath);
+	tar_file->tblspc_oid = tblspc_oid;
+	tar_file->compress_algorithm = compress_algorithm;
+
+	simple_ptr_list_append(tarFiles, tar_file);
+
+	/* Update statistics for progress report, if necessary */
+	if (show_progress)
+		total_size += sb.st_size;
+}
+
+/*
+ * This is the final part of tar file verification, which prepares the archive
+ * streamer stack according to the tar file compression format for each tar
+ * archive and invokes them for reading, decompressing, and ultimately
+ * verifying the contents.
+ *
+ * The arguments to this function should be a list of valid tar archives to
+ * verify, and the allocation will be freed once the verification is complete.
+ */
+static void
+verify_tar_file_contents(verifier_context *context, SimplePtrList *tarFiles)
+{
+	SimplePtrListCell *cell;
+
+	progress_report(false);
+
+	for (cell = tarFiles->head; cell != NULL; cell = cell->next)
+	{
+		tarFile    *tar_file = (tarFile *) cell->ptr;
+		astreamer  *streamer;
+		char	   *fullpath;
+
+		/* Prepare archive streamer stack */
+		streamer = create_archive_verifier(context,
+										   tar_file->relpath,
+										   tar_file->tblspc_oid,
+										   tar_file->compress_algorithm);
+
+		/* Compute the full pathname to the target file. */
+		fullpath = psprintf("%s/%s", context->backup_directory,
+							tar_file->relpath);
+
+		/* Invoke the streamer for reading, decompressing, and verifying. */
+		verify_tar_contents(context, tar_file->relpath, fullpath, streamer);
+
+		/* Cleanup. */
+		pfree(tar_file->relpath);
+		pfree(tar_file);
+		pfree(fullpath);
+
+		astreamer_finalize(streamer);
+		astreamer_free(streamer);
+	}
+	simple_ptr_list_destroy(tarFiles);
+
+	progress_report(true);
+}
+
+/*
+ * Performs the actual work for tar content verification. It reads a given tar
+ * file in predefined chunks and passes it to the streamer, which initiates
+ * routines for decompression (if necessary) and then verifies each member
+ * within the tar archive.
+ */
+static void
+verify_tar_contents(verifier_context *context, char *relpath, char *fullpath,
+					astreamer *streamer)
+{
+	int			fd;
+	int			rc;
+	char	   *buffer;
+
+	pg_log_debug("reading \"%s\"", fullpath);
+
+	/* Open the target file. */
+	if ((fd = open(fullpath, O_RDONLY | PG_BINARY, 0)) < 0)
+	{
+		report_backup_error(context, "could not open file \"%s\": %m",
+							relpath);
+		return;
+	}
+
+	buffer = pg_malloc(READ_CHUNK_SIZE * sizeof(uint8));
+
+	/* Perform the reads */
+	while ((rc = read(fd, buffer, READ_CHUNK_SIZE)) > 0)
+	{
+		astreamer_content(streamer, NULL, buffer, rc, ASTREAMER_UNKNOWN);
+
+		/* Report progress */
+		done_size += rc;
+		progress_report(false);
+	}
+
+	if (rc < 0)
+		report_backup_error(context, "could not read file \"%s\": %m",
+							relpath);
+
+	/* Close the file. */
+	if (close(fd) != 0)
+		report_backup_error(context, "could not close file \"%s\": %m",
+							relpath);
+}
+
 /*
  * Verify file and its size entry in the manifest.
  */
@@ -1045,6 +1284,42 @@ find_backup_format(verifier_context *context)
 	return result;
 }
 
+/*
+ * Identifies the necessary steps for verifying the contents of the
+ * provided tar file.
+ */
+static astreamer *
+create_archive_verifier(verifier_context *context, char *archive_name,
+						Oid tblspc_oid, pg_compress_algorithm compress_algo)
+{
+	astreamer  *streamer = NULL;
+
+	/* Should be here only for tar backup */
+	Assert(context->format == 't');
+
+	/*
+	 * To verify the contents of the tar file, the initial step is to parse
+	 * its content.
+	 */
+	streamer = astreamer_verify_content_new(streamer, context, archive_name,
+											tblspc_oid);
+	streamer = astreamer_tar_parser_new(streamer);
+
+	/*
+	 * If the tar file is compressed, we must perform the appropriate
+	 * decompression operation before proceeding with the verification of its
+	 * contents.
+	 */
+	if (compress_algo == PG_COMPRESSION_GZIP)
+		streamer = astreamer_gzip_decompressor_new(streamer);
+	else if (compress_algo == PG_COMPRESSION_LZ4)
+		streamer = astreamer_lz4_decompressor_new(streamer);
+	else if (compress_algo == PG_COMPRESSION_ZSTD)
+		streamer = astreamer_zstd_decompressor_new(streamer);
+
+	return streamer;
+}
+
 /*
  * Print a progress report based on the global variables.
  *
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.h b/src/bin/pg_verifybackup/pg_verifybackup.h
index 1bba4e7ea92..963ec71e270 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.h
+++ b/src/bin/pg_verifybackup/pg_verifybackup.h
@@ -18,6 +18,7 @@
 #include "common/hashfn_unstable.h"
 #include "common/logging.h"
 #include "common/parse_manifest.h"
+#include "fe_utils/astreamer.h"
 #include "fe_utils/simple_list.h"
 
 /*
@@ -128,4 +129,9 @@ extern void report_fatal_error(const char *pg_restrict fmt,...)
 extern bool should_ignore_relpath(verifier_context *context,
 								  const char *relpath);
 
+extern astreamer *astreamer_verify_content_new(astreamer *next,
+											   verifier_context *context,
+											   char *archive_name,
+											   Oid tblspc_oid);
+
 #endif							/* PG_VERIFYBACKUP_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 547d14b3e7c..47b5f0edcc7 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -3329,6 +3329,7 @@ astreamer_plain_writer
 astreamer_recovery_injector
 astreamer_tar_archiver
 astreamer_tar_parser
+astreamer_verify
 astreamer_zstd_frame
 bgworker_main_type
 bh_node_type
@@ -3950,6 +3951,7 @@ substitute_phv_relids_context
 subxids_array_status
 symbol
 tablespaceinfo
+tarFile
 td_entry
 teSection
 temp_tablespaces_extra
-- 
2.18.0

From 1b07c42e3ea2b4a1b3b463611d93f8fa94426260 Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Tue, 2 Jul 2024 10:26:35 +0530
Subject: [PATCH v10 10/12] pg_verifybackup: Add backup format and compression
 option

----
NOTE: This patch is not meant to be committed separately. It should
be squashed with the next patch that implements tar format support.
----
---
 src/bin/pg_verifybackup/pg_verifybackup.c | 74 ++++++++++++++++++++++-
 src/bin/pg_verifybackup/pg_verifybackup.h |  1 +
 2 files changed, 73 insertions(+), 2 deletions(-)

diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index ddc6ed7471b..3dcb174e0a9 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -18,6 +18,7 @@
 #include <sys/stat.h>
 #include <time.h>
 
+#include "common/compression.h"
 #include "common/parse_manifest.h"
 #include "fe_utils/simple_list.h"
 #include "getopt_long.h"
@@ -56,6 +57,7 @@ static void report_manifest_error(JsonManifestParseContext *context,
 								  const char *fmt,...)
 			pg_attribute_printf(2, 3) pg_attribute_noreturn();
 
+static char find_backup_format(verifier_context *context);
 static void verify_backup_directory(verifier_context *context,
 									char *relpath, char *fullpath);
 static void verify_plain_backup_file(verifier_context *context,
@@ -91,6 +93,7 @@ main(int argc, char **argv)
 		{"exit-on-error", no_argument, NULL, 'e'},
 		{"ignore", required_argument, NULL, 'i'},
 		{"manifest-path", required_argument, NULL, 'm'},
+		{"format", required_argument, NULL, 'F'},
 		{"no-parse-wal", no_argument, NULL, 'n'},
 		{"progress", no_argument, NULL, 'P'},
 		{"quiet", no_argument, NULL, 'q'},
@@ -112,6 +115,7 @@ main(int argc, char **argv)
 	progname = get_progname(argv[0]);
 
 	memset(&context, 0, sizeof(context));
+	context.format = '\0';
 
 	if (argc > 1)
 	{
@@ -148,7 +152,7 @@ main(int argc, char **argv)
 	simple_string_list_append(&context.ignore_list, "recovery.signal");
 	simple_string_list_append(&context.ignore_list, "standby.signal");
 
-	while ((c = getopt_long(argc, argv, "ei:m:nPqsw:", long_options, NULL)) != -1)
+	while ((c = getopt_long(argc, argv, "eF:i:m:nPqsw:", long_options, NULL)) != -1)
 	{
 		switch (c)
 		{
@@ -167,6 +171,15 @@ main(int argc, char **argv)
 				manifest_path = pstrdup(optarg);
 				canonicalize_path(manifest_path);
 				break;
+			case 'F':
+				if (strcmp(optarg, "p") == 0 || strcmp(optarg, "plain") == 0)
+					context.format = 'p';
+				else if (strcmp(optarg, "t") == 0 || strcmp(optarg, "tar") == 0)
+					context.format = 't';
+				else
+					pg_fatal("invalid backup format \"%s\", must be \"plain\" or \"tar\"",
+							 optarg);
+				break;
 			case 'n':
 				no_parse_wal = true;
 				break;
@@ -214,11 +227,26 @@ main(int argc, char **argv)
 		pg_fatal("cannot specify both %s and %s",
 				 "-P/--progress", "-q/--quiet");
 
+	/* Determine the backup format if it hasn't been specified. */
+	if (context.format == '\0')
+		context.format = find_backup_format(&context);
+
 	/* Unless --no-parse-wal was specified, we will need pg_waldump. */
 	if (!no_parse_wal)
 	{
 		int			ret;
 
+		/*
+		 * XXX: In the future, we should consider enhancing pg_waldump to read
+		 * WAL files from the tar archive.
+		 */
+		if (context.format != 'p')
+		{
+			pg_log_error("pg_waldump does not support parsing WAL files from a tar archive.");
+			pg_log_error_hint("Try \"%s --help\" to skip parse WAL files option.", progname);
+			exit(1);
+		}
+
 		pg_waldump_path = pg_malloc(MAXPGPATH);
 		ret = find_other_exec(argv[0], "pg_waldump",
 							  "pg_waldump (PostgreSQL) " PG_VERSION "\n",
@@ -273,8 +301,13 @@ main(int argc, char **argv)
 	/*
 	 * Now do the expensive work of verifying file checksums, unless we were
 	 * told to skip it.
+	 *
+	 * We were only checking the plain backup here. For the tar backup, file
+	 * checksums verification (if requested) will be done immediately when the
+	 * file is accessed, as we don't have random access to the files like we
+	 * do with plain backups.
 	 */
-	if (!context.skip_checksums)
+	if (!context.skip_checksums && context.format == 'p')
 		verify_backup_checksums(&context);
 
 	/*
@@ -976,6 +1009,42 @@ should_ignore_relpath(verifier_context *context, const char *relpath)
 	return false;
 }
 
+/*
+ * To detect the backup format, it checks for the PG_VERSION file in the backup
+ * directory. If found, it will be considered a plain-format backup; otherwise,
+ * it will be assumed to be a tar-format backup.
+ */
+static char
+find_backup_format(verifier_context *context)
+{
+	char		result;
+	char	   *path;
+	struct stat sb;
+
+	/* Should be here only if the backup format is unknown */
+	Assert(context->format == '\0');
+
+	/* Check PG_VERSION file. */
+	path = psprintf("%s/%s", context->backup_directory, "PG_VERSION");
+	if (stat(path, &sb) == 0)
+		result = 'p';
+	else
+	{
+		if (errno != ENOENT)
+		{
+			pg_log_error("cannot determine backup format");
+			pg_log_error_hint("Try \"%s --help\" for more information.", progname);
+			exit(1);
+		}
+
+		/* Otherwise, it is assumed to be a TAR backup. */
+		result = 't';
+	}
+	pfree(path);
+
+	return result;
+}
+
 /*
  * Print a progress report based on the global variables.
  *
@@ -1033,6 +1102,7 @@ usage(void)
 	printf(_("  -e, --exit-on-error         exit immediately on error\n"));
 	printf(_("  -i, --ignore=RELATIVE_PATH  ignore indicated path\n"));
 	printf(_("  -m, --manifest-path=PATH    use specified path for manifest\n"));
+	printf(_("  -F, --format=p|t            backup format (plain, tar)\n"));
 	printf(_("  -n, --no-parse-wal          do not try to parse WAL files\n"));
 	printf(_("  -P, --progress              show progress information\n"));
 	printf(_("  -q, --quiet                 do not print any output, except for errors\n"));
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.h b/src/bin/pg_verifybackup/pg_verifybackup.h
index 56fbb731337..1bba4e7ea92 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.h
+++ b/src/bin/pg_verifybackup/pg_verifybackup.h
@@ -105,6 +105,7 @@ typedef struct verifier_context
 	manifest_data *manifest;
 	char	   *backup_directory;
 	SimpleStringList ignore_list;
+	char		format;			/* backup format:  p(lain)/t(ar) */
 	bool		skip_checksums;
 	bool		exit_on_error;
 	bool		saw_any_error;
-- 
2.18.0

From 6a5086e67647bbc90e71e469d5195b76b6ba9f4b Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Thu, 8 Aug 2024 16:01:33 +0530
Subject: [PATCH v10 09/12] Add simple_ptr_list_destroy() and
 simple_ptr_list_destroy_deep() API.

We didn't have any helper function to destroy SimplePtrList, likely
because it wasn't needed before, but it's required in a later patch in
this set.  I've added two functions for this purpose, inspired by
list_free() and list_free_deep().
---
 src/fe_utils/simple_list.c         | 39 ++++++++++++++++++++++++++++++
 src/include/fe_utils/simple_list.h |  2 ++
 2 files changed, 41 insertions(+)

diff --git a/src/fe_utils/simple_list.c b/src/fe_utils/simple_list.c
index 2d88eb54067..9d218911c31 100644
--- a/src/fe_utils/simple_list.c
+++ b/src/fe_utils/simple_list.c
@@ -173,3 +173,42 @@ simple_ptr_list_append(SimplePtrList *list, void *ptr)
 		list->head = cell;
 	list->tail = cell;
 }
+
+/*
+ * Destroy a pointer list and optionally the pointed-to element
+ */
+static void
+simple_ptr_list_destroy_private(SimplePtrList *list, bool deep)
+{
+	SimplePtrListCell *cell;
+
+	cell = list->head;
+	while (cell != NULL)
+	{
+		SimplePtrListCell *next;
+
+		next = cell->next;
+		if (deep)
+			pg_free(cell->ptr);
+		pg_free(cell);
+		cell = next;
+	}
+}
+
+/*
+ * Destroy a pointer list and the pointed-to element
+ */
+void
+simple_ptr_list_destroy_deep(SimplePtrList *list)
+{
+	simple_ptr_list_destroy_private(list, true);
+}
+
+/*
+ * Destroy only pointer list and not the pointed-to element
+ */
+void
+simple_ptr_list_destroy(SimplePtrList *list)
+{
+	simple_ptr_list_destroy_private(list, false);
+}
diff --git a/src/include/fe_utils/simple_list.h b/src/include/fe_utils/simple_list.h
index d42ecded8ed..5b7cbec8a62 100644
--- a/src/include/fe_utils/simple_list.h
+++ b/src/include/fe_utils/simple_list.h
@@ -66,5 +66,7 @@ extern void simple_string_list_destroy(SimpleStringList *list);
 extern const char *simple_string_list_not_touched(SimpleStringList *list);
 
 extern void simple_ptr_list_append(SimplePtrList *list, void *ptr);
+extern void simple_ptr_list_destroy_deep(SimplePtrList *list);
+extern void simple_ptr_list_destroy(SimplePtrList *list);
 
 #endif							/* SIMPLE_LIST_H */
-- 
2.18.0

From 0542b8d4d620a181f06a16339202928b7ec3ec8c Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Thu, 1 Aug 2024 15:47:26 +0530
Subject: [PATCH v10 08/12] Refactor: split verify_control_file.

Separated the manifest entry verification code into a new function and
introduced the should_verify_control_data() macro, similar to
should_verify_checksum().

Like verify_file_checksum(), verify_control_file() is too design to
accept the pg_control file patch which will be opened and respective
information will be verified. But, in case of tar backup we would be
having pg_control file contents instead, that needs to be verified in
the same way.  For that reason the code that doing the verification is
separated into separate function to so that can be reused for the tar
backup verification as well.
---
 src/bin/pg_verifybackup/pg_verifybackup.c | 44 +++++++++++------------
 src/bin/pg_verifybackup/pg_verifybackup.h | 15 ++++++++
 2 files changed, 35 insertions(+), 24 deletions(-)

diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index c8543cb4f7f..ddc6ed7471b 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -18,7 +18,6 @@
 #include <sys/stat.h>
 #include <time.h>
 
-#include "common/logging.h"
 #include "common/parse_manifest.h"
 #include "fe_utils/simple_list.h"
 #include "getopt_long.h"
@@ -61,8 +60,6 @@ static void verify_backup_directory(verifier_context *context,
 									char *relpath, char *fullpath);
 static void verify_plain_backup_file(verifier_context *context,
 									 char *relpath, char *fullpath);
-static void verify_control_file(const char *controlpath,
-								uint64 manifest_system_identifier);
 static void report_extra_backup_files(verifier_context *context);
 static void verify_backup_checksums(verifier_context *context);
 static void verify_file_checksum(verifier_context *context,
@@ -626,14 +623,20 @@ verify_plain_backup_file(verifier_context *context, char *relpath,
 	/* Check the backup manifest entry for this file. */
 	m = verify_manifest_entry(context, relpath, sb.st_size);
 
-	/*
-	 * Validate the manifest system identifier, not available in manifest
-	 * version 1.
-	 */
-	if (context->manifest->version != 1 &&
-		strcmp(relpath, "global/pg_control") == 0 &&
-		m != NULL && m->matched && !m->bad)
-		verify_control_file(fullpath, context->manifest->system_identifier);
+	/* Validate the manifest system identifier */
+	if (should_verify_control_data(context->manifest, m))
+	{
+		ControlFileData *control_file;
+		bool		crc_ok;
+
+		pg_log_debug("reading \"%s\"", fullpath);
+		control_file = get_controlfile_by_exact_path(fullpath, &crc_ok);
+
+		verify_control_data(control_file, fullpath, crc_ok,
+							context->manifest->system_identifier);
+		/* Release memory. */
+		pfree(control_file);
+	}
 
 	/* Update statistics for progress report, if necessary */
 	if (show_progress && !context->skip_checksums &&
@@ -682,18 +685,14 @@ verify_manifest_entry(verifier_context *context, char *relpath, int64 filesize)
 }
 
 /*
- * Sanity check control file and validate system identifier against manifest
- * system identifier.
+ * Sanity check control file data and validate system identifier against
+ * manifest system identifier.
  */
-static void
-verify_control_file(const char *controlpath, uint64 manifest_system_identifier)
+void
+verify_control_data(ControlFileData *control_file,
+					const char *controlpath, bool crc_ok,
+					uint64 manifest_system_identifier)
 {
-	ControlFileData *control_file;
-	bool		crc_ok;
-
-	pg_log_debug("reading \"%s\"", controlpath);
-	control_file = get_controlfile_by_exact_path(controlpath, &crc_ok);
-
 	/* Control file contents not meaningful if CRC is bad. */
 	if (!crc_ok)
 		report_fatal_error("%s: CRC is incorrect", controlpath);
@@ -709,9 +708,6 @@ verify_control_file(const char *controlpath, uint64 manifest_system_identifier)
 						   controlpath,
 						   (unsigned long long) manifest_system_identifier,
 						   (unsigned long long) control_file->system_identifier);
-
-	/* Release memory. */
-	pfree(control_file);
 }
 
 /*
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.h b/src/bin/pg_verifybackup/pg_verifybackup.h
index 93859d9d541..56fbb731337 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.h
+++ b/src/bin/pg_verifybackup/pg_verifybackup.h
@@ -16,6 +16,7 @@
 
 #include "common/controldata_utils.h"
 #include "common/hashfn_unstable.h"
+#include "common/logging.h"
 #include "common/parse_manifest.h"
 #include "fe_utils/simple_list.h"
 
@@ -44,6 +45,17 @@ typedef struct manifest_file
 	(((m) != NULL) && ((m)->matched) && !((m)->bad) && \
 	 (((m)->checksum_type) != CHECKSUM_TYPE_NONE))
 
+/*
+ * Validate the manifest system identifier against the control file; this
+ * feature is not available in manifest version 1.  This validation should be
+ * carried out only if the manifest entry validation is completed without any
+ * errors.
+ */
+#define should_verify_control_data(manifest, m) \
+	(((manifest)->version != 1) && \
+	 ((m) != NULL) && ((m)->matched) && !((m)->bad) && \
+	 (strcmp((m)->pathname, "global/pg_control") == 0))
+
 /*
  * Define a hash table which we can use to store information about the files
  * mentioned in the backup manifest.
@@ -103,6 +115,9 @@ extern manifest_file *verify_manifest_entry(verifier_context *context,
 extern void verify_checksum(verifier_context *context, manifest_file *m,
 							pg_checksum_context *checksum_ctx,
 							int64 bytes_read);
+extern void verify_control_data(ControlFileData *control_file,
+								const char *controlpath, bool crc_ok,
+								uint64 manifest_system_identifier);
 
 extern void report_backup_error(verifier_context *context,
 								const char *pg_restrict fmt,...)
-- 
2.18.0

From 6c9cb1c2725b9a60c5e549808555b15b06ea44e9 Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Wed, 14 Aug 2024 15:14:15 +0530
Subject: [PATCH v10 07/12] Refactor: split verify_file_checksum() function.

Move the core functionality of verify_file_checksum to a new function
to reuse it instead of duplicating the code.

The verify_file_checksum() function is designed to take a file path,
open and read the file contents, and then calculate the checksum.
However, for TAR backups, instead of a file path, we receive the file
content in chunks, and the checksum needs to be calculated
incrementally. While the initial operations for plain and TAR backup
checksum calculations differ, the final checks and error handling are
the same. By moving the shared logic to a separate function, we can
reuse the code for both types of backups.
---
 src/bin/pg_verifybackup/pg_verifybackup.c | 20 +++++++++++++++++---
 src/bin/pg_verifybackup/pg_verifybackup.h |  3 +++
 2 files changed, 20 insertions(+), 3 deletions(-)

diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index f85dfefab81..c8543cb4f7f 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -787,8 +787,6 @@ verify_file_checksum(verifier_context *context, manifest_file *m,
 	int			fd;
 	int			rc;
 	size_t		bytes_read = 0;
-	uint8		checksumbuf[PG_CHECKSUM_MAX_LENGTH];
-	int			checksumlen;
 
 	/* Open the target file. */
 	if ((fd = open(fullpath, O_RDONLY | PG_BINARY, 0)) < 0)
@@ -839,6 +837,22 @@ verify_file_checksum(verifier_context *context, manifest_file *m,
 	if (rc < 0)
 		return;
 
+	/* Do the final computation and verification. */
+	verify_checksum(context, m, &checksum_ctx, bytes_read);
+}
+
+/*
+ * A helper function to finalize checksum computation and verify it against the
+ * backup manifest information.
+ */
+void
+verify_checksum(verifier_context *context, manifest_file *m,
+				pg_checksum_context *checksum_ctx, int64 bytes_read)
+{
+	const char *relpath = m->pathname;
+	int			checksumlen;
+	uint8		checksumbuf[PG_CHECKSUM_MAX_LENGTH];
+
 	/*
 	 * Double-check that we read the expected number of bytes from the file.
 	 * Normally, a file size mismatch would be caught in verify_manifest_entry
@@ -855,7 +869,7 @@ verify_file_checksum(verifier_context *context, manifest_file *m,
 	}
 
 	/* Get the final checksum. */
-	checksumlen = pg_checksum_final(&checksum_ctx, checksumbuf);
+	checksumlen = pg_checksum_final(checksum_ctx, checksumbuf);
 	if (checksumlen < 0)
 	{
 		report_backup_error(context,
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.h b/src/bin/pg_verifybackup/pg_verifybackup.h
index 2e71f14669b..93859d9d541 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.h
+++ b/src/bin/pg_verifybackup/pg_verifybackup.h
@@ -100,6 +100,9 @@ typedef struct verifier_context
 
 extern manifest_file *verify_manifest_entry(verifier_context *context,
 											char *relpath, int64 filesize);
+extern void verify_checksum(verifier_context *context, manifest_file *m,
+							pg_checksum_context *checksum_ctx,
+							int64 bytes_read);
 
 extern void report_backup_error(verifier_context *context,
 								const char *pg_restrict fmt,...)
-- 
2.18.0

From deba61f678fb652056d51889103e547d016dc6da Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Wed, 14 Aug 2024 10:42:37 +0530
Subject: [PATCH v10 06/12] Refactor: split verify_backup_file() function and
 rename it.

The function verify_backup_file() has now been renamed to
verify_plain_backup_file() to make it clearer that it is specifically
used for verifying files in a plain backup. Similarly, in a future
patch, we would have a verify_tar_backup_file() function for
verifying TAR backup files.

In addition to that, moved the manifest entry verification code into a
new function called verify_manifest_entry() so that it can be reused
for tar backup verification. If verify_manifest_entry() doesn't find
an entry, it reports an error as before and returns NULL to the
caller. This is why a NULL check is added to should_verify_checksum().
---
 src/bin/pg_verifybackup/pg_verifybackup.c | 58 +++++++++++++++--------
 src/bin/pg_verifybackup/pg_verifybackup.h |  6 ++-
 2 files changed, 42 insertions(+), 22 deletions(-)

diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index 384a4dd3500..f85dfefab81 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -59,8 +59,8 @@ static void report_manifest_error(JsonManifestParseContext *context,
 
 static void verify_backup_directory(verifier_context *context,
 									char *relpath, char *fullpath);
-static void verify_backup_file(verifier_context *context,
-							   char *relpath, char *fullpath);
+static void verify_plain_backup_file(verifier_context *context,
+									 char *relpath, char *fullpath);
 static void verify_control_file(const char *controlpath,
 								uint64 manifest_system_identifier);
 static void report_extra_backup_files(verifier_context *context);
@@ -565,7 +565,7 @@ verify_backup_directory(verifier_context *context, char *relpath,
 			newrelpath = psprintf("%s/%s", relpath, filename);
 
 		if (!should_ignore_relpath(context, newrelpath))
-			verify_backup_file(context, newrelpath, newfullpath);
+			verify_plain_backup_file(context, newrelpath, newfullpath);
 
 		pfree(newfullpath);
 		pfree(newrelpath);
@@ -586,7 +586,8 @@ verify_backup_directory(verifier_context *context, char *relpath,
  * verify_backup_directory.
  */
 static void
-verify_backup_file(verifier_context *context, char *relpath, char *fullpath)
+verify_plain_backup_file(verifier_context *context, char *relpath,
+						 char *fullpath)
 {
 	struct stat sb;
 	manifest_file *m;
@@ -622,6 +623,32 @@ verify_backup_file(verifier_context *context, char *relpath, char *fullpath)
 		return;
 	}
 
+	/* Check the backup manifest entry for this file. */
+	m = verify_manifest_entry(context, relpath, sb.st_size);
+
+	/*
+	 * Validate the manifest system identifier, not available in manifest
+	 * version 1.
+	 */
+	if (context->manifest->version != 1 &&
+		strcmp(relpath, "global/pg_control") == 0 &&
+		m != NULL && m->matched && !m->bad)
+		verify_control_file(fullpath, context->manifest->system_identifier);
+
+	/* Update statistics for progress report, if necessary */
+	if (show_progress && !context->skip_checksums &&
+		should_verify_checksum(m))
+		total_size += m->size;
+}
+
+/*
+ * Verify file and its size entry in the manifest.
+ */
+manifest_file *
+verify_manifest_entry(verifier_context *context, char *relpath, int64 filesize)
+{
+	manifest_file *m;
+
 	/* Check whether there's an entry in the manifest hash. */
 	m = manifest_files_lookup(context->manifest->files, relpath);
 	if (m == NULL)
@@ -629,40 +656,29 @@ verify_backup_file(verifier_context *context, char *relpath, char *fullpath)
 		report_backup_error(context,
 							"\"%s\" is present on disk but not in the manifest",
 							relpath);
-		return;
+		return NULL;
 	}
 
 	/* Flag this entry as having been encountered in the filesystem. */
 	m->matched = true;
 
 	/* Check that the size matches. */
-	if (m->size != sb.st_size)
+	if (m->size != filesize)
 	{
 		report_backup_error(context,
 							"\"%s\" has size %lld on disk but size %zu in the manifest",
-							relpath, (long long int) sb.st_size, m->size);
+							relpath, (long long int) filesize, m->size);
 		m->bad = true;
 	}
 
-	/*
-	 * Validate the manifest system identifier, not available in manifest
-	 * version 1.
-	 */
-	if (context->manifest->version != 1 &&
-		strcmp(relpath, "global/pg_control") == 0)
-		verify_control_file(fullpath, context->manifest->system_identifier);
-
-	/* Update statistics for progress report, if necessary */
-	if (show_progress && !context->skip_checksums &&
-		should_verify_checksum(m))
-		total_size += m->size;
-
 	/*
 	 * We don't verify checksums at this stage. We first finish verifying that
 	 * we have the expected set of files with the expected sizes, and only
 	 * afterwards verify the checksums. That's because computing checksums may
 	 * take a while, and we'd like to report more obvious problems quickly.
 	 */
+
+	return m;
 }
 
 /*
@@ -825,7 +841,7 @@ verify_file_checksum(verifier_context *context, manifest_file *m,
 
 	/*
 	 * Double-check that we read the expected number of bytes from the file.
-	 * Normally, a file size mismatch would be caught in verify_backup_file
+	 * Normally, a file size mismatch would be caught in verify_manifest_entry
 	 * and this check would never be reached, but this provides additional
 	 * safety and clarity in the event of concurrent modifications or
 	 * filesystem misbehavior.
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.h b/src/bin/pg_verifybackup/pg_verifybackup.h
index bd9c95c477a..2e71f14669b 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.h
+++ b/src/bin/pg_verifybackup/pg_verifybackup.h
@@ -41,7 +41,8 @@ typedef struct manifest_file
 } manifest_file;
 
 #define should_verify_checksum(m) \
-	(((m)->matched) && !((m)->bad) && (((m)->checksum_type) != CHECKSUM_TYPE_NONE))
+	(((m) != NULL) && ((m)->matched) && !((m)->bad) && \
+	 (((m)->checksum_type) != CHECKSUM_TYPE_NONE))
 
 /*
  * Define a hash table which we can use to store information about the files
@@ -97,6 +98,9 @@ typedef struct verifier_context
 	bool		saw_any_error;
 } verifier_context;
 
+extern manifest_file *verify_manifest_entry(verifier_context *context,
+											char *relpath, int64 filesize);
+
 extern void report_backup_error(verifier_context *context,
 								const char *pg_restrict fmt,...)
 			pg_attribute_printf(2, 3);
-- 
2.18.0

From dea8e0cd58a41c0912a5dff3ed32ada80ce0034b Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Thu, 1 Aug 2024 12:10:34 +0530
Subject: [PATCH v10 05/12] Refactor: move some part of pg_verifybackup.c to
 pg_verifybackup.h

---
 src/bin/pg_verifybackup/pg_verifybackup.c |  94 +------------------
 src/bin/pg_verifybackup/pg_verifybackup.h | 108 ++++++++++++++++++++++
 2 files changed, 112 insertions(+), 90 deletions(-)
 create mode 100644 src/bin/pg_verifybackup/pg_verifybackup.h

diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index c6d01d52335..384a4dd3500 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -18,12 +18,11 @@
 #include <sys/stat.h>
 #include <time.h>
 
-#include "common/controldata_utils.h"
-#include "common/hashfn_unstable.h"
 #include "common/logging.h"
 #include "common/parse_manifest.h"
 #include "fe_utils/simple_list.h"
 #include "getopt_long.h"
+#include "pg_verifybackup.h"
 #include "pgtime.h"
 
 /*
@@ -40,84 +39,6 @@
  */
 #define ESTIMATED_BYTES_PER_MANIFEST_LINE	100
 
-/*
- * How many bytes should we try to read from a file at once?
- */
-#define READ_CHUNK_SIZE				(128 * 1024)
-
-/*
- * Each file described by the manifest file is parsed to produce an object
- * like this.
- */
-typedef struct manifest_file
-{
-	uint32		status;			/* hash status */
-	const char *pathname;
-	size_t		size;
-	pg_checksum_type checksum_type;
-	int			checksum_length;
-	uint8	   *checksum_payload;
-	bool		matched;
-	bool		bad;
-} manifest_file;
-
-#define should_verify_checksum(m) \
-	(((m)->matched) && !((m)->bad) && (((m)->checksum_type) != CHECKSUM_TYPE_NONE))
-
-/*
- * Define a hash table which we can use to store information about the files
- * mentioned in the backup manifest.
- */
-#define SH_PREFIX		manifest_files
-#define SH_ELEMENT_TYPE	manifest_file
-#define SH_KEY_TYPE		const char *
-#define	SH_KEY			pathname
-#define SH_HASH_KEY(tb, key)	hash_string(key)
-#define SH_EQUAL(tb, a, b)		(strcmp(a, b) == 0)
-#define	SH_SCOPE		static inline
-#define SH_RAW_ALLOCATOR	pg_malloc0
-#define SH_DECLARE
-#define SH_DEFINE
-#include "lib/simplehash.h"
-
-/*
- * Each WAL range described by the manifest file is parsed to produce an
- * object like this.
- */
-typedef struct manifest_wal_range
-{
-	TimeLineID	tli;
-	XLogRecPtr	start_lsn;
-	XLogRecPtr	end_lsn;
-	struct manifest_wal_range *next;
-	struct manifest_wal_range *prev;
-} manifest_wal_range;
-
-/*
- * All the data parsed from a backup_manifest file.
- */
-typedef struct manifest_data
-{
-	int			version;
-	uint64		system_identifier;
-	manifest_files_hash *files;
-	manifest_wal_range *first_wal_range;
-	manifest_wal_range *last_wal_range;
-} manifest_data;
-
-/*
- * All of the context information we need while checking a backup manifest.
- */
-typedef struct verifier_context
-{
-	manifest_data *manifest;
-	char	   *backup_directory;
-	SimpleStringList ignore_list;
-	bool		skip_checksums;
-	bool		exit_on_error;
-	bool		saw_any_error;
-} verifier_context;
-
 static manifest_data *parse_manifest_file(char *manifest_path);
 static void verifybackup_version_cb(JsonManifestParseContext *context,
 									int manifest_version);
@@ -151,13 +72,6 @@ static void parse_required_wal(verifier_context *context,
 							   char *pg_waldump_path,
 							   char *wal_directory);
 
-static void report_backup_error(verifier_context *context,
-								const char *pg_restrict fmt,...)
-			pg_attribute_printf(2, 3);
-static void report_fatal_error(const char *pg_restrict fmt,...)
-			pg_attribute_printf(1, 2) pg_attribute_noreturn();
-static bool should_ignore_relpath(verifier_context *context, const char *relpath);
-
 static void progress_report(bool finished);
 static void usage(void);
 
@@ -980,7 +894,7 @@ parse_required_wal(verifier_context *context, char *pg_waldump_path,
  * Update the context to indicate that we saw an error, and exit if the
  * context says we should.
  */
-static void
+void
 report_backup_error(verifier_context *context, const char *pg_restrict fmt,...)
 {
 	va_list		ap;
@@ -997,7 +911,7 @@ report_backup_error(verifier_context *context, const char *pg_restrict fmt,...)
 /*
  * Report a fatal error and exit
  */
-static void
+void
 report_fatal_error(const char *pg_restrict fmt,...)
 {
 	va_list		ap;
@@ -1016,7 +930,7 @@ report_fatal_error(const char *pg_restrict fmt,...)
  * Note that by "prefix" we mean a parent directory; for this purpose,
  * "aa/bb" is not a prefix of "aa/bbb", but it is a prefix of "aa/bb/cc".
  */
-static bool
+bool
 should_ignore_relpath(verifier_context *context, const char *relpath)
 {
 	SimpleStringListCell *cell;
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.h b/src/bin/pg_verifybackup/pg_verifybackup.h
new file mode 100644
index 00000000000..bd9c95c477a
--- /dev/null
+++ b/src/bin/pg_verifybackup/pg_verifybackup.h
@@ -0,0 +1,108 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_verifybackup.h
+ *	  Verify a backup against a backup manifest.
+ *
+ * Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/bin/pg_verifybackup/pg_verifybackup.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef PG_VERIFYBACKUP_H
+#define PG_VERIFYBACKUP_H
+
+#include "common/controldata_utils.h"
+#include "common/hashfn_unstable.h"
+#include "common/parse_manifest.h"
+#include "fe_utils/simple_list.h"
+
+/*
+ * How many bytes should we try to read from a file at once?
+ */
+#define READ_CHUNK_SIZE				(128 * 1024)
+
+/*
+ * Each file described by the manifest file is parsed to produce an object
+ * like this.
+ */
+typedef struct manifest_file
+{
+	uint32		status;			/* hash status */
+	const char *pathname;
+	size_t		size;
+	pg_checksum_type checksum_type;
+	int			checksum_length;
+	uint8	   *checksum_payload;
+	bool		matched;
+	bool		bad;
+} manifest_file;
+
+#define should_verify_checksum(m) \
+	(((m)->matched) && !((m)->bad) && (((m)->checksum_type) != CHECKSUM_TYPE_NONE))
+
+/*
+ * Define a hash table which we can use to store information about the files
+ * mentioned in the backup manifest.
+ */
+#define SH_PREFIX		manifest_files
+#define SH_ELEMENT_TYPE	manifest_file
+#define SH_KEY_TYPE		const char *
+#define	SH_KEY			pathname
+#define SH_HASH_KEY(tb, key)	hash_string(key)
+#define SH_EQUAL(tb, a, b)		(strcmp(a, b) == 0)
+#define	SH_SCOPE		static inline
+#define SH_RAW_ALLOCATOR	pg_malloc0
+#define SH_DECLARE
+#define SH_DEFINE
+#include "lib/simplehash.h"
+
+/*
+ * Each WAL range described by the manifest file is parsed to produce an
+ * object like this.
+ */
+typedef struct manifest_wal_range
+{
+	TimeLineID	tli;
+	XLogRecPtr	start_lsn;
+	XLogRecPtr	end_lsn;
+	struct manifest_wal_range *next;
+	struct manifest_wal_range *prev;
+} manifest_wal_range;
+
+/*
+ * All the data parsed from a backup_manifest file.
+ */
+typedef struct manifest_data
+{
+	int			version;
+	uint64		system_identifier;
+	manifest_files_hash *files;
+	manifest_wal_range *first_wal_range;
+	manifest_wal_range *last_wal_range;
+} manifest_data;
+
+/*
+ * All of the context information we need while checking a backup manifest.
+ */
+typedef struct verifier_context
+{
+	manifest_data *manifest;
+	char	   *backup_directory;
+	SimpleStringList ignore_list;
+	bool		skip_checksums;
+	bool		exit_on_error;
+	bool		saw_any_error;
+} verifier_context;
+
+extern void report_backup_error(verifier_context *context,
+								const char *pg_restrict fmt,...)
+			pg_attribute_printf(2, 3);
+extern void report_fatal_error(const char *pg_restrict fmt,...)
+			pg_attribute_printf(1, 2) pg_attribute_noreturn();
+extern bool should_ignore_relpath(verifier_context *context,
+								  const char *relpath);
+
+#endif							/* PG_VERIFYBACKUP_H */
-- 
2.18.0

From bcd2f84eb984badfdaa28ad84806c8e401259f9c Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Thu, 8 Aug 2024 15:10:43 +0530
Subject: [PATCH v10 04/12] Refactor: move skip_checksums global variable to
 verifier_context struct

To enable access to this flag in another file.
---
 src/bin/pg_verifybackup/pg_verifybackup.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index d77e70fbe38..c6d01d52335 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -113,6 +113,7 @@ typedef struct verifier_context
 	manifest_data *manifest;
 	char	   *backup_directory;
 	SimpleStringList ignore_list;
+	bool		skip_checksums;
 	bool		exit_on_error;
 	bool		saw_any_error;
 } verifier_context;
@@ -164,7 +165,6 @@ static const char *progname;
 
 /* options */
 static bool show_progress = false;
-static bool skip_checksums = false;
 
 /* Progress indicators */
 static uint64 total_size = 0;
@@ -266,7 +266,7 @@ main(int argc, char **argv)
 				quiet = true;
 				break;
 			case 's':
-				skip_checksums = true;
+				context.skip_checksums = true;
 				break;
 			case 'w':
 				wal_directory = pstrdup(optarg);
@@ -363,7 +363,7 @@ main(int argc, char **argv)
 	 * Now do the expensive work of verifying file checksums, unless we were
 	 * told to skip it.
 	 */
-	if (!skip_checksums)
+	if (!context.skip_checksums)
 		verify_backup_checksums(&context);
 
 	/*
@@ -739,7 +739,8 @@ verify_backup_file(verifier_context *context, char *relpath, char *fullpath)
 		verify_control_file(fullpath, context->manifest->system_identifier);
 
 	/* Update statistics for progress report, if necessary */
-	if (show_progress && !skip_checksums && should_verify_checksum(m))
+	if (show_progress && !context->skip_checksums &&
+		should_verify_checksum(m))
 		total_size += m->size;
 
 	/*
-- 
2.18.0

Reply via email to