On Sat, Aug 17, 2024 at 1:34 AM Robert Haas <robertmh...@gmail.com> wrote: > > On Fri, Aug 16, 2024 at 3:53 PM Robert Haas <robertmh...@gmail.com> wrote: > > + int64 num = strtoi64(relpath, &suffix, 10); > > Hit send too early. Here, seems like this should be strtoul(), not strtoi64(). >
Fixed in the attached version including others suggestions in that mail. > The documentation of --format seems to be cut-and-pasted from > pg_basebackup and the language isn't really appropriate here. e.g. > "The main data directory's contents will be written to a file > named..." but pg_verifybackup writes nothing. > I wrote that intentionally -- I didn’t mean to imply that pg_verifybackup handles this; rather, I meant that the backup tool (in this case, pg_basebackup) produces those files. I can see the confusion and have rephrased the text accordingly. > + simple_string_list_append(&context.ignore_list, "pg_wal.tar"); > + simple_string_list_append(&context.ignore_list, "pg_wal.tar.gz"); > + simple_string_list_append(&context.ignore_list, "pg_wal.tar.lz4"); > + simple_string_list_append(&context.ignore_list, "pg_wal.tar.zst"); > > Why not make the same logic that recognizes base or an OID also > recognize pg_wal as a prefix, and identify that as the WAL archive? > For now we'll have to skip it, but if you do it that way then if we > add future support for more suffixes, it'll just work, whereas this > way won't. And you'd need that code anyway if we ever can run > pg_waldump on a tarfile, because you would need to identify the > compression method. Note that the danger of the list of suffixes > getting out of sync here is not hypothetical: you added .tgz elsewhere > but not here. > Did this way. > There's probably more to look at here but I'm running out of energy for today. > Thank you for the review and committing 0004 and 0006 patches. Regards, Amul
From dfaeebdc09fd689b7e45a705e32111cb226a0657 Mon Sep 17 00:00:00 2001 From: Amul Sul <amul.sul@enterprisedb.com> Date: Tue, 20 Aug 2024 13:14:22 +0530 Subject: [PATCH v11 12/12] pg_verifybackup: Tests and document ---- NOTE: This patch is not meant to be committed separately. It should be squashed with the previous patch that implements tar format support. ---- --- doc/src/sgml/ref/pg_verifybackup.sgml | 43 ++++++++++- src/bin/pg_verifybackup/t/001_basic.pl | 6 +- src/bin/pg_verifybackup/t/004_options.pl | 2 +- src/bin/pg_verifybackup/t/005_bad_manifest.pl | 2 +- src/bin/pg_verifybackup/t/008_untar.pl | 73 ++++++------------- src/bin/pg_verifybackup/t/010_client_untar.pl | 48 +----------- 6 files changed, 73 insertions(+), 101 deletions(-) diff --git a/doc/src/sgml/ref/pg_verifybackup.sgml b/doc/src/sgml/ref/pg_verifybackup.sgml index a3f167f9f6e..ea6bc3ccb12 100644 --- a/doc/src/sgml/ref/pg_verifybackup.sgml +++ b/doc/src/sgml/ref/pg_verifybackup.sgml @@ -34,8 +34,10 @@ PostgreSQL documentation integrity of a database cluster backup taken using <command>pg_basebackup</command> against a <literal>backup_manifest</literal> generated by the server at the time - of the backup. The backup must be stored in the "plain" - format; a "tar" format backup can be checked after extracting it. + of the backup. The backup must be stored in the "plain" or "tar" + format. Verification is supported for <literal>gzip</literal>, + <literal>lz4</literal>, and <literal>zstd</literal> compressed tar backup; + any other compressed format backups can be checked after decompressing them. </para> <para> @@ -168,6 +170,43 @@ PostgreSQL documentation </listitem> </varlistentry> + <varlistentry> + <term><option>-F <replaceable class="parameter">format</replaceable></option></term> + <term><option>--format=<replaceable class="parameter">format</replaceable></option></term> + <listitem> + <para> + Specifies the format of the backup. <replaceable>format</replaceable> + can be one of the following: + + <variablelist> + <varlistentry> + <term><literal>p</literal></term> + <term><literal>plain</literal></term> + <listitem> + <para> + Backup consists of plain files with the same layout as the + source server's data directory and tablespaces. + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term><literal>t</literal></term> + <term><literal>tar</literal></term> + <listitem> + <para> + Backup consists of tar files. A valid backup includes the main data + directory in a file named <filename>base.tar</filename>, the WAL + files in <filename>pg_wal.tar</filename>, and separate tar files for + each tablespace, named after the tablespace's OID, followed by the + compression extension. + </para> + </listitem> + </varlistentry> + </variablelist></para> + </listitem> + </varlistentry> + <varlistentry> <term><option>-n</option></term> <term><option>--no-parse-wal</option></term> diff --git a/src/bin/pg_verifybackup/t/001_basic.pl b/src/bin/pg_verifybackup/t/001_basic.pl index 2f3e52d296f..ca5b0402b7d 100644 --- a/src/bin/pg_verifybackup/t/001_basic.pl +++ b/src/bin/pg_verifybackup/t/001_basic.pl @@ -17,11 +17,11 @@ command_fails_like( qr/no backup directory specified/, 'target directory must be specified'); command_fails_like( - [ 'pg_verifybackup', $tempdir ], + [ 'pg_verifybackup', '-Fp', $tempdir ], qr/could not open file.*\/backup_manifest\"/, 'pg_verifybackup requires a manifest'); command_fails_like( - [ 'pg_verifybackup', $tempdir, $tempdir ], + [ 'pg_verifybackup', '-Fp', $tempdir, $tempdir ], qr/too many command-line arguments/, 'multiple target directories not allowed'); @@ -31,7 +31,7 @@ close($fh); # but then try to use an alternate, nonexisting manifest command_fails_like( - [ 'pg_verifybackup', '-m', "$tempdir/not_the_manifest", $tempdir ], + [ 'pg_verifybackup', '-Fp', '-m', "$tempdir/not_the_manifest", $tempdir ], qr/could not open file.*\/not_the_manifest\"/, 'pg_verifybackup respects -m flag'); diff --git a/src/bin/pg_verifybackup/t/004_options.pl b/src/bin/pg_verifybackup/t/004_options.pl index 8ed2214408e..2f197648740 100644 --- a/src/bin/pg_verifybackup/t/004_options.pl +++ b/src/bin/pg_verifybackup/t/004_options.pl @@ -108,7 +108,7 @@ unlike( # Test valid manifest with nonexistent backup directory. command_fails_like( [ - 'pg_verifybackup', '-m', + 'pg_verifybackup', '-Fp', '-m', "$backup_path/backup_manifest", "$backup_path/fake" ], qr/could not open directory/, diff --git a/src/bin/pg_verifybackup/t/005_bad_manifest.pl b/src/bin/pg_verifybackup/t/005_bad_manifest.pl index c4ed64b62d5..28c51b6feb0 100644 --- a/src/bin/pg_verifybackup/t/005_bad_manifest.pl +++ b/src/bin/pg_verifybackup/t/005_bad_manifest.pl @@ -208,7 +208,7 @@ sub test_bad_manifest print $fh $manifest_contents; close($fh); - command_fails_like([ 'pg_verifybackup', $tempdir ], $regexp, $test_name); + command_fails_like([ 'pg_verifybackup', '-Fp', $tempdir ], $regexp, $test_name); return; } diff --git a/src/bin/pg_verifybackup/t/008_untar.pl b/src/bin/pg_verifybackup/t/008_untar.pl index 7a09f3b75b2..9896560adc3 100644 --- a/src/bin/pg_verifybackup/t/008_untar.pl +++ b/src/bin/pg_verifybackup/t/008_untar.pl @@ -16,6 +16,20 @@ my $primary = PostgreSQL::Test::Cluster->new('primary'); $primary->init(allows_streaming => 1); $primary->start; +# Create a couple of directories to use as tablespaces. +my $TS1_LOCATION = $primary->backup_dir .'/ts1'; +$TS1_LOCATION =~ s/\/\.\//\//g; # collapse foo/./bar to foo/bar +mkdir($TS1_LOCATION); + +# Create a tablespace with table in it. +$primary->safe_psql('postgres', qq( + CREATE TABLESPACE regress_ts1 LOCATION '$TS1_LOCATION'; + SELECT oid FROM pg_tablespace WHERE spcname = 'regress_ts1'; + CREATE TABLE regress_tbl1(i int) TABLESPACE regress_ts1; + INSERT INTO regress_tbl1 VALUES(generate_series(1,5));)); +my $tsoid = $primary->safe_psql('postgres', qq( + SELECT oid FROM pg_tablespace WHERE spcname = 'regress_ts1')); + my $backup_path = $primary->backup_dir . '/server-backup'; my $extract_path = $primary->backup_dir . '/extracted-backup'; @@ -23,39 +37,31 @@ my @test_configuration = ( { 'compression_method' => 'none', 'backup_flags' => [], - 'backup_archive' => 'base.tar', + 'backup_archive' => ['base.tar', "$tsoid.tar"], 'enabled' => 1 }, { 'compression_method' => 'gzip', 'backup_flags' => [ '--compress', 'server-gzip' ], - 'backup_archive' => 'base.tar.gz', - 'decompress_program' => $ENV{'GZIP_PROGRAM'}, - 'decompress_flags' => ['-d'], + 'backup_archive' => [ 'base.tar.gz', "$tsoid.tar.gz" ], 'enabled' => check_pg_config("#define HAVE_LIBZ 1") }, { 'compression_method' => 'lz4', 'backup_flags' => [ '--compress', 'server-lz4' ], - 'backup_archive' => 'base.tar.lz4', - 'decompress_program' => $ENV{'LZ4'}, - 'decompress_flags' => [ '-d', '-m' ], + 'backup_archive' => ['base.tar.lz4', "$tsoid.tar.lz4" ], 'enabled' => check_pg_config("#define USE_LZ4 1") }, { 'compression_method' => 'zstd', 'backup_flags' => [ '--compress', 'server-zstd' ], - 'backup_archive' => 'base.tar.zst', - 'decompress_program' => $ENV{'ZSTD'}, - 'decompress_flags' => ['-d'], + 'backup_archive' => [ 'base.tar.zst', "$tsoid.tar.zst" ], 'enabled' => check_pg_config("#define USE_ZSTD 1") }, { 'compression_method' => 'zstd', 'backup_flags' => [ '--compress', 'server-zstd:level=1,long' ], - 'backup_archive' => 'base.tar.zst', - 'decompress_program' => $ENV{'ZSTD'}, - 'decompress_flags' => ['-d'], + 'backup_archive' => [ 'base.tar.zst', "$tsoid.tar.zst" ], 'enabled' => check_pg_config("#define USE_ZSTD 1") }); @@ -86,47 +92,16 @@ for my $tc (@test_configuration) my $backup_files = join(',', sort grep { $_ ne '.' && $_ ne '..' } slurp_dir($backup_path)); my $expected_backup_files = - join(',', sort ('backup_manifest', $tc->{'backup_archive'})); + join(',', sort ('backup_manifest', @{ $tc->{'backup_archive'} })); is($backup_files, $expected_backup_files, "found expected backup files, compression $method"); - # Decompress. - if (exists $tc->{'decompress_program'}) - { - my @decompress = ($tc->{'decompress_program'}); - push @decompress, @{ $tc->{'decompress_flags'} } - if $tc->{'decompress_flags'}; - push @decompress, $backup_path . '/' . $tc->{'backup_archive'}; - system_or_bail(@decompress); - } - - SKIP: - { - my $tar = $ENV{TAR}; - # don't check for a working tar here, to accommodate various odd - # cases. If tar doesn't work the init_from_backup below will fail. - skip "no tar program available", 1 - if (!defined $tar || $tar eq ''); - - # Untar. - mkdir($extract_path); - system_or_bail($tar, 'xf', $backup_path . '/base.tar', - '-C', $extract_path); - - # Verify. - $primary->command_ok( - [ - 'pg_verifybackup', '-n', - '-m', "$backup_path/backup_manifest", - '-e', $extract_path - ], - "verify backup, compression $method"); - } + # Verify tar backup. + $primary->command_ok(['pg_verifybackup', '-n', '-e', $backup_path], + "verify backup, compression $method"); # Cleanup. - unlink($backup_path . '/backup_manifest'); - unlink($backup_path . '/base.tar'); - unlink($backup_path . '/' . $tc->{'backup_archive'}); + rmtree($backup_path); rmtree($extract_path); } } diff --git a/src/bin/pg_verifybackup/t/010_client_untar.pl b/src/bin/pg_verifybackup/t/010_client_untar.pl index 8c076d46dee..6b7d7483f6e 100644 --- a/src/bin/pg_verifybackup/t/010_client_untar.pl +++ b/src/bin/pg_verifybackup/t/010_client_untar.pl @@ -29,41 +29,30 @@ my @test_configuration = ( 'compression_method' => 'gzip', 'backup_flags' => [ '--compress', 'client-gzip:5' ], 'backup_archive' => 'base.tar.gz', - 'decompress_program' => $ENV{'GZIP_PROGRAM'}, - 'decompress_flags' => ['-d'], 'enabled' => check_pg_config("#define HAVE_LIBZ 1") }, { 'compression_method' => 'lz4', 'backup_flags' => [ '--compress', 'client-lz4:5' ], 'backup_archive' => 'base.tar.lz4', - 'decompress_program' => $ENV{'LZ4'}, - 'decompress_flags' => ['-d'], - 'output_file' => 'base.tar', 'enabled' => check_pg_config("#define USE_LZ4 1") }, { 'compression_method' => 'zstd', 'backup_flags' => [ '--compress', 'client-zstd:5' ], 'backup_archive' => 'base.tar.zst', - 'decompress_program' => $ENV{'ZSTD'}, - 'decompress_flags' => ['-d'], 'enabled' => check_pg_config("#define USE_ZSTD 1") }, { 'compression_method' => 'zstd', 'backup_flags' => [ '--compress', 'client-zstd:level=1,long' ], 'backup_archive' => 'base.tar.zst', - 'decompress_program' => $ENV{'ZSTD'}, - 'decompress_flags' => ['-d'], 'enabled' => check_pg_config("#define USE_ZSTD 1") }, { 'compression_method' => 'parallel zstd', 'backup_flags' => [ '--compress', 'client-zstd:workers=3' ], 'backup_archive' => 'base.tar.zst', - 'decompress_program' => $ENV{'ZSTD'}, - 'decompress_flags' => ['-d'], 'enabled' => check_pg_config("#define USE_ZSTD 1"), 'possibly_unsupported' => qr/could not set compression worker count to 3: Unsupported parameter/ @@ -118,40 +107,9 @@ for my $tc (@test_configuration) is($backup_files, $expected_backup_files, "found expected backup files, compression $method"); - # Decompress. - if (exists $tc->{'decompress_program'}) - { - my @decompress = ($tc->{'decompress_program'}); - push @decompress, @{ $tc->{'decompress_flags'} } - if $tc->{'decompress_flags'}; - push @decompress, $backup_path . '/' . $tc->{'backup_archive'}; - push @decompress, $backup_path . '/' . $tc->{'output_file'} - if $tc->{'output_file'}; - system_or_bail(@decompress); - } - - SKIP: - { - my $tar = $ENV{TAR}; - # don't check for a working tar here, to accommodate various odd - # cases. If tar doesn't work the init_from_backup below will fail. - skip "no tar program available", 1 - if (!defined $tar || $tar eq ''); - - # Untar. - mkdir($extract_path); - system_or_bail($tar, 'xf', $backup_path . '/base.tar', - '-C', $extract_path); - - # Verify. - $primary->command_ok( - [ - 'pg_verifybackup', '-n', - '-m', "$backup_path/backup_manifest", - '-e', $extract_path - ], - "verify backup, compression $method"); - } + # Verify tar backup. + $primary->command_ok( [ 'pg_verifybackup', '-n', '-e', $backup_path ], + "verify backup, compression $method"); # Cleanup. rmtree($extract_path); -- 2.18.0
From afc610cc11c3ec4afdbc354a0ea08340b473b254 Mon Sep 17 00:00:00 2001 From: Amul Sul <amul.sul@enterprisedb.com> Date: Wed, 31 Jul 2024 16:22:07 +0530 Subject: [PATCH v11 11/12] pg_verifybackup: Read tar files and verify its contents This patch implements TAR format backup verification. For progress reporting support, we perform this verification in two passes: the first pass calculates total_size, and the second pass updates done_size as verification progresses. For the verification, in the first pass, we call verify_tar_backup_file(), which performs basic verification by expecting only base.tar, pg_wal.tar, or <tablespaceoid>.tar files and raises an error for any other files. It also determines the compression type of the archive file. All this information is stored in a newly added tarFile struct, which is appended to a list that will be used in the second pass (by verify_tar_content()) for the final verification. In the second pass, the tar archives are read, decompressed, and the required verification is carried out. For reading and decompression, fe_utils/astreamer.h is used. For verification, a new archive streamer has been added in astreamer_verify.c to handle TAR member files and their contents; see astreamer_verify_content() for details. The stack of astreamers will be set up for each TAR file in verify_tar_content(), depending on its compression type which is detected in the first pass. When information about a TAR member file (i.e., ASTREAMER_MEMBER_HEADER) is received, we first verify its entry against the backup manifest. We then decide if further checks are needed, such as checksum verification and control data verification (if it is a pg_control file), once the member file contents are received. Although this decision could be made when the contents are received, it is more efficient to make it earlier since the member file contents are received in multiple iterations. In short, we process ASTREAMER_MEMBER_CONTENTS multiple times but only once for other ASTREAMER_MEMBER_* cases. We maintain this information in the astreamer_verify structure for each member file, which is reset when the file ends. Unlike in a plain backup, checksum verification here occurs in two steps. First, as the contents are received, the checksum is computed incrementally (see member_compute_checksum). Then, at the end of processing the member file, the final verification is performed (see member_verify_checksum). Similarly, during the content receiving stage, if the file is pg_control, the data will be copied into a local buffer (see member_copy_control_data). The verification will then be carried out at the end of the member file processing (see member_verify_control_data) --- src/bin/pg_verifybackup/Makefile | 4 +- src/bin/pg_verifybackup/astreamer_verify.c | 363 +++++++++++++++++++++ src/bin/pg_verifybackup/meson.build | 6 +- src/bin/pg_verifybackup/pg_verifybackup.c | 284 +++++++++++++++- src/bin/pg_verifybackup/pg_verifybackup.h | 6 + src/tools/pgindent/typedefs.list | 2 + 6 files changed, 659 insertions(+), 6 deletions(-) create mode 100644 src/bin/pg_verifybackup/astreamer_verify.c diff --git a/src/bin/pg_verifybackup/Makefile b/src/bin/pg_verifybackup/Makefile index 7c045f142e8..df7aaabd530 100644 --- a/src/bin/pg_verifybackup/Makefile +++ b/src/bin/pg_verifybackup/Makefile @@ -17,11 +17,13 @@ top_builddir = ../../.. include $(top_builddir)/src/Makefile.global # We need libpq only because fe_utils does. +override CPPFLAGS := -I$(libpq_srcdir) $(CPPFLAGS) LDFLAGS_INTERNAL += -L$(top_builddir)/src/fe_utils -lpgfeutils $(libpq_pgport) OBJS = \ $(WIN32RES) \ - pg_verifybackup.o + pg_verifybackup.o \ + astreamer_verify.o all: pg_verifybackup diff --git a/src/bin/pg_verifybackup/astreamer_verify.c b/src/bin/pg_verifybackup/astreamer_verify.c new file mode 100644 index 00000000000..3d3659195a9 --- /dev/null +++ b/src/bin/pg_verifybackup/astreamer_verify.c @@ -0,0 +1,363 @@ +/*------------------------------------------------------------------------- + * + * astreamer_verify.c + * + * Extend fe_utils/astreamer.h archive streaming facility to verify TAR + * backup. + * + * Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group + * + * src/bin/pg_verifybackup/astreamer_verify.c + * + *------------------------------------------------------------------------- + */ + +#include "postgres_fe.h" + +#include "common/logging.h" +#include "pg_verifybackup.h" + +typedef struct astreamer_verify +{ + astreamer base; + verifier_context *context; + char *archive_name; + Oid tblspc_oid; + pg_checksum_context *checksum_ctx; + + /* Hold information for a member file verification */ + manifest_file *mfile; + int64 received_bytes; + bool verify_checksum; + bool verify_control_data; +} astreamer_verify; + +static void astreamer_verify_content(astreamer *streamer, + astreamer_member *member, + const char *data, int len, + astreamer_archive_context context); +static void astreamer_verify_finalize(astreamer *streamer); +static void astreamer_verify_free(astreamer *streamer); + +static void member_verify_header(astreamer *streamer, astreamer_member *member); +static void member_compute_checksum(astreamer *streamer, + astreamer_member *member, + const char *data, int len); +static void member_verify_checksum(astreamer *streamer); +static void member_copy_control_data(astreamer *streamer, + astreamer_member *member, + const char *data, int len); +static void member_verify_control_data(astreamer *streamer); +static void member_reset_info(astreamer *streamer); + +static const astreamer_ops astreamer_verify_ops = { + .content = astreamer_verify_content, + .finalize = astreamer_verify_finalize, + .free = astreamer_verify_free +}; + +/* + * Create a astreamer that can verifies content of a TAR file. + */ +astreamer * +astreamer_verify_content_new(astreamer *next, verifier_context *context, + char *archive_name, Oid tblspc_oid) +{ + astreamer_verify *streamer; + + streamer = palloc0(sizeof(astreamer_verify)); + *((const astreamer_ops **) &streamer->base.bbs_ops) = + &astreamer_verify_ops; + + streamer->base.bbs_next = next; + streamer->context = context; + streamer->archive_name = archive_name; + streamer->tblspc_oid = tblspc_oid; + initStringInfo(&streamer->base.bbs_buffer); + + if (!context->skip_checksums) + streamer->checksum_ctx = pg_malloc(sizeof(pg_checksum_context)); + + return &streamer->base; +} + +/* + * The main entry point of the archive streamer for verifying tar members. + */ +static void +astreamer_verify_content(astreamer *streamer, astreamer_member *member, + const char *data, int len, + astreamer_archive_context context) +{ + astreamer_verify *mystreamer = (astreamer_verify *) streamer; + + Assert(context != ASTREAMER_UNKNOWN); + + switch (context) + { + case ASTREAMER_MEMBER_HEADER: + + /* + * Perform the initial check and setup verification steps. + */ + member_verify_header(streamer, member); + break; + + case ASTREAMER_MEMBER_CONTENTS: + + /* + * Since we are receiving the file in chunks, we need to process + * the member content according to the flags set by the member + * header processing routine, which includes checksum computation + * and copying control data to the local buffer. + */ + if (mystreamer->verify_checksum) + member_compute_checksum(streamer, member, data, len); + + if (mystreamer->verify_control_data) + member_copy_control_data(streamer, member, data, len); + break; + + case ASTREAMER_MEMBER_TRAILER: + + /* + * We have reached the end of the member file. By this point, we + * should have successfully computed the checksum of the received + * content and copied the entire pg_control file data into our + * local buffer. We can now proceed with the final verification. + */ + if (mystreamer->verify_checksum) + member_verify_checksum(streamer); + + if (mystreamer->verify_control_data) + member_verify_control_data(streamer); + + /* + * Reset the temporary information stored for the verification. + */ + member_reset_info(streamer); + break; + + case ASTREAMER_ARCHIVE_TRAILER: + break; + + default: + /* Shouldn't happen. */ + pg_fatal("unexpected state while parsing tar archive"); + } +} + +/* + * End-of-stream processing for a astreamer_verify stream. + */ +static void +astreamer_verify_finalize(astreamer *streamer) +{ + Assert(streamer->bbs_next == NULL); +} + +/* + * Free memory associated with a astreamer_verify stream. + */ +static void +astreamer_verify_free(astreamer *streamer) +{ + astreamer_verify *mystreamer = (astreamer_verify *) streamer; + + if (mystreamer->checksum_ctx) + pfree(mystreamer->checksum_ctx); + + pfree(streamer->bbs_buffer.data); + pfree(streamer); +} + +/* + * Verifies the tar member entry if it corresponds to a file in the backup + * manifest. If the archive being processed is a tablespace, prepares the + * required file path for subsequent operations. Finally, determines if + * checksum verification and control data verification need to be performed + * during file content processing + */ +static void +member_verify_header(astreamer *streamer, astreamer_member *member) +{ + astreamer_verify *mystreamer = (astreamer_verify *) streamer; + manifest_file *m; + + /* We are only interested in files that are not in the ignore list. */ + if (member->is_directory || member->is_link || + should_ignore_relpath(mystreamer->context, member->pathname)) + return; + + /* + * The backup_manifest stores a relative path to the base directory for + * files belong tablespace, whereas <tablespaceoid>.tar doesn't. Prepare + * the required path, otherwise, the manfiest entry verification will + * fail. + */ + if (OidIsValid(mystreamer->tblspc_oid)) + { + char temp[MAXPGPATH]; + + /* Copy original name at temporary space */ + memcpy(temp, member->pathname, MAXPGPATH); + + snprintf(member->pathname, MAXPGPATH, "%s/%d/%s", + "pg_tblspc", mystreamer->tblspc_oid, temp); + } + + /* Check the manifest entry */ + m = verify_manifest_entry(mystreamer->context, member->pathname, + member->size); + mystreamer->mfile = (void *) m; + + /* + * Prepare for checksum and control data verification. + * + * We could have these checks while receiving contents. However, since + * contents are received in multiple iterations, this would result in + * these lengthy checks being performed multiple times. Instead, having a + * single flag would be more efficient. + */ + mystreamer->verify_checksum = + (!mystreamer->context->skip_checksums && should_verify_checksum(m)); + mystreamer->verify_control_data = + should_verify_control_data(mystreamer->context->manifest, m); + + /* Initialize the context required for checksum verification. */ + if (mystreamer->verify_checksum && + pg_checksum_init(mystreamer->checksum_ctx, m->checksum_type) < 0) + { + report_backup_error(mystreamer->context, + "%s: could not initialize checksum of file \"%s\"", + mystreamer->archive_name, m->pathname); + + /* + * Checksum verification cannot be performed without proper context + * initialization. + */ + mystreamer->verify_checksum = false; + } +} + +/* + * Computes the checksum incrementally for the received file content. + * + * The caller should pass a correctly initialized checksum_ctx, which will be + * used for incremental checksum computation. + */ +static void +member_compute_checksum(astreamer *streamer, astreamer_member *member, + const char *data, int len) +{ + astreamer_verify *mystreamer = (astreamer_verify *) streamer; + pg_checksum_context *checksum_ctx = mystreamer->checksum_ctx; + manifest_file *m = mystreamer->mfile; + + Assert(mystreamer->verify_checksum); + + /* Should have came for the right file */ + Assert(strcmp(member->pathname, m->pathname) == 0); + + /* + * The checksum context should match the type noted in the backup + * manifest. + */ + Assert(checksum_ctx->type == m->checksum_type); + + /* + * Update the total count of computed checksum bytes for cross-checking + * with the file size in the final verification stage. + */ + mystreamer->received_bytes += len; + + if (pg_checksum_update(checksum_ctx, (uint8 *) data, len) < 0) + { + report_backup_error(mystreamer->context, + "could not update checksum of file \"%s\"", + m->pathname); + mystreamer->verify_checksum = false; + } +} + +/* + * Perform the final computation and checksum verification after the entire + * file content has been processed. + */ +static void +member_verify_checksum(astreamer *streamer) +{ + astreamer_verify *mystreamer = (astreamer_verify *) streamer; + + Assert(mystreamer->verify_checksum); + + verify_checksum(mystreamer->context, mystreamer->mfile, + mystreamer->checksum_ctx, mystreamer->received_bytes); +} + +/* + * Stores the pg_control file contents into a local buffer; we need the entire + * control file data for verification. + */ +static void +member_copy_control_data(astreamer *streamer, astreamer_member *member, + const char *data, int len) +{ + /* Should be here only for control file */ + Assert(strcmp(member->pathname, "global/pg_control") == 0); + Assert(((astreamer_verify *) streamer)->verify_control_data); + + /* Copy enough control file data needed for verification. */ + astreamer_buffer_until(streamer, &data, &len, sizeof(ControlFileData)); +} + +/* + * Performs the CRC calculation of pg_control data and then calls the routines + * that execute the final verification of the control file information. + */ +static void +member_verify_control_data(astreamer *streamer) +{ + astreamer_verify *mystreamer = (astreamer_verify *) streamer; + manifest_data *manifest = mystreamer->context->manifest; + ControlFileData control_file; + pg_crc32c crc; + bool crc_ok; + + /* Should be here only for control file */ + Assert(strcmp(mystreamer->mfile->pathname, "global/pg_control") == 0); + Assert(mystreamer->verify_control_data); + + /* Should have enough control file data needed for verification. */ + if (streamer->bbs_buffer.len != sizeof(ControlFileData)) + report_fatal_error("%s: unexpected control file size: %d, should be %zu", + mystreamer->archive_name, streamer->bbs_buffer.len, + sizeof(ControlFileData)); + + memcpy(&control_file, streamer->bbs_buffer.data, sizeof(ControlFileData)); + + /* Check the CRC. */ + INIT_CRC32C(crc); + COMP_CRC32C(crc, (char *) (&control_file), offsetof(ControlFileData, crc)); + FIN_CRC32C(crc); + + crc_ok = EQ_CRC32C(crc, control_file.crc); + + /* Do the final control data verification. */ + verify_control_data(&control_file, mystreamer->mfile->pathname, crc_ok, + manifest->system_identifier); +} + +/* + * Reset flags and free memory allocations for member file verification. + */ +static void +member_reset_info(astreamer *streamer) +{ + astreamer_verify *mystreamer = (astreamer_verify *) streamer; + + mystreamer->mfile = NULL; + mystreamer->received_bytes = 0; + mystreamer->verify_checksum = false; + mystreamer->verify_control_data = false; +} diff --git a/src/bin/pg_verifybackup/meson.build b/src/bin/pg_verifybackup/meson.build index 7c7d31a0350..1e3fcf7ee5a 100644 --- a/src/bin/pg_verifybackup/meson.build +++ b/src/bin/pg_verifybackup/meson.build @@ -1,7 +1,8 @@ # Copyright (c) 2022-2024, PostgreSQL Global Development Group pg_verifybackup_sources = files( - 'pg_verifybackup.c' + 'pg_verifybackup.c', + 'astreamer_verify.c' ) if host_system == 'windows' @@ -10,9 +11,10 @@ if host_system == 'windows' '--FILEDESC', 'pg_verifybackup - verify a backup against using a backup manifest']) endif +pg_verifybackup_deps = [frontend_code, libpq, lz4, zlib, zstd] pg_verifybackup = executable('pg_verifybackup', pg_verifybackup_sources, - dependencies: [frontend_code, libpq], + dependencies: pg_verifybackup_deps, kwargs: default_bin_args, ) bin_targets += pg_verifybackup diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c index 4c0367038b4..09840a2cef4 100644 --- a/src/bin/pg_verifybackup/pg_verifybackup.c +++ b/src/bin/pg_verifybackup/pg_verifybackup.c @@ -22,6 +22,7 @@ #include "common/parse_manifest.h" #include "fe_utils/simple_list.h" #include "getopt_long.h" +#include "limits.h" #include "pg_verifybackup.h" #include "pgtime.h" @@ -44,6 +45,16 @@ */ #define READ_CHUNK_SIZE (128 * 1024) +/* + * Tar archive information needed for content verification. + */ +typedef struct tar_archive +{ + char *relpath; + Oid tblspc_oid; + pg_compress_algorithm compress_algorithm; +} tar_archive; + static manifest_data *parse_manifest_file(char *manifest_path); static void verifybackup_version_cb(JsonManifestParseContext *context, int manifest_version); @@ -65,8 +76,14 @@ static void report_manifest_error(JsonManifestParseContext *context, static char find_backup_format(verifier_context *context); static void verify_backup_directory(verifier_context *context, char *relpath, char *fullpath); -static void verify_plain_backup_file(verifier_context *context, - char *relpath, char *fullpath); +static void verify_plain_backup_file(verifier_context *context, char *relpath, + char *fullpath); +static void verify_tar_backup_file(verifier_context *context, char *relpath, + char *fullpath, SimplePtrList *tarfiles); +static void verify_tar_file_contents(verifier_context *context, + SimplePtrList *tarfiles); +static void verify_tar_contents(verifier_context *context, char *relpath, + char *fullpath, astreamer *streamer); static void report_extra_backup_files(verifier_context *context); static void verify_backup_checksums(verifier_context *context); static void verify_file_checksum(verifier_context *context, @@ -75,6 +92,10 @@ static void verify_file_checksum(verifier_context *context, static void parse_required_wal(verifier_context *context, char *pg_waldump_path, char *wal_directory); +static astreamer *create_archive_verifier(verifier_context *context, + char *archive_name, + Oid tblspc_oid, + pg_compress_algorithm compress_algo); static void progress_report(bool finished); static void usage(void); @@ -561,6 +582,7 @@ verify_backup_directory(verifier_context *context, char *relpath, { DIR *dir; struct dirent *dirent; + SimplePtrList tarfiles = {NULL, NULL}; dir = opendir(fullpath); if (dir == NULL) @@ -600,12 +622,23 @@ verify_backup_directory(verifier_context *context, char *relpath, newrelpath = psprintf("%s/%s", relpath, filename); if (!should_ignore_relpath(context, newrelpath)) - verify_plain_backup_file(context, newrelpath, newfullpath); + { + if (context->format == 'p') + verify_plain_backup_file(context, newrelpath, newfullpath); + else + verify_tar_backup_file(context, newrelpath, newfullpath, + &tarfiles); + } pfree(newfullpath); pfree(newrelpath); } + /* Perform the final verification of the tar contents, if any. */ + Assert(tarfiles.head == NULL || context->format == 't'); + if (tarfiles.head != NULL) + verify_tar_file_contents(context, &tarfiles); + if (closedir(dir)) { report_backup_error(context, @@ -682,6 +715,215 @@ verify_plain_backup_file(verifier_context *context, char *relpath, total_size += m->size; } +/* + * Verify one tar archive file. + * + * This function does not perform a complete verification; it only carries out + * basic validation of the tar format backup file, detects the compression + * type, and appends that information to the tarfiles list. An error will be + * reported if the tar archive is inaccessible, or if the file type, name, or + * compression type is not as expected. + * + * The arguments to this function are mostly the same as the + * verify_plain_backup_file. The additional argument is the file size for + * progress report. + */ +static void +verify_tar_backup_file(verifier_context *context, char *relpath, + char *fullpath, SimplePtrList *tarfiles) +{ + struct stat sb; + Oid tblspc_oid = InvalidOid; + pg_compress_algorithm compress_algorithm; + tar_archive *tar_file; + char *suffix = NULL; + + /* Should be tar format backup */ + Assert(context->format == 't'); + + /* Get file information */ + if (stat(fullpath, &sb) != 0) + { + report_backup_error(context, + "could not stat file or directory \"%s\": %m", + relpath); + return; + } + + /* In a tar format backup, we expect only plain files. */ + if (!S_ISREG(sb.st_mode)) + { + report_backup_error(context, + "\"%s\" is not a plain file", + relpath); + return; + } + + /* + * We expect tar archive files for backing up the main directory, + * tablespace, and pg_wal directory. + * + * pg_basebackup writes the main data directory to an archive file named + * base.tar, the pg_wal directory to pg_wal.tar, and the tablespace + * directory to <tablespaceoid>.tar, each followed by a compression type + * extension such as .gz, .lz4, or .zst. + */ + if (strncmp("base", relpath, 4) == 0) + suffix = relpath + 4; + else if (strncmp("pg_wal", relpath, 6) == 0) + suffix = relpath + 6; + else + { + /* Expected a <tablespaceoid>.tar file here. */ + uint64 num = strtoul(relpath, &suffix, 10); + + /* + * Report an error if we didn't consume at least one character, if the + * result is 0, or if the value is too large to be a valid OID. + */ + if (suffix == NULL || num <= 0 || num > OID_MAX) + report_backup_error(context, + "\"%s\" unexpected file in the tar format backup", + relpath); + tblspc_oid = (Oid) num; + } + + /* Now, check the compression type of the tar file */ + if (strcmp(suffix, ".tar") == 0) + compress_algorithm = PG_COMPRESSION_NONE; + else if (strcmp(suffix, ".tgz") == 0) + compress_algorithm = PG_COMPRESSION_GZIP; + else if (strcmp(suffix, ".tar.gz") == 0) + compress_algorithm = PG_COMPRESSION_GZIP; + else if (strcmp(suffix, ".tar.lz4") == 0) + compress_algorithm = PG_COMPRESSION_LZ4; + else if (strcmp(suffix, ".tar.zst") == 0) + compress_algorithm = PG_COMPRESSION_ZSTD; + else + { + report_backup_error(context, + "\"%s\" unexpected file in the tar format backup", + relpath); + return; + } + + /* + * Ignore WALs, as reading and verification will be handled through + * pg_waldump. + */ + if (strncmp("pg_wal", relpath, 6) == 0) + return; + + /* + * Append the information to the list for complete verification at a later + * stage. + */ + tar_file = pg_malloc(sizeof(tar_archive)); + tar_file->relpath = pstrdup(relpath); + tar_file->tblspc_oid = tblspc_oid; + tar_file->compress_algorithm = compress_algorithm; + + simple_ptr_list_append(tarfiles, tar_file); + + /* Update statistics for progress report, if necessary */ + if (show_progress) + total_size += sb.st_size; +} + +/* + * This is the final part of tar file verification, which prepares the archive + * streamer stack according to the tar file compression format for each tar + * archive and invokes them for reading, decompressing, and ultimately + * verifying the contents. + * + * The arguments to this function should be a list of valid tar archives to + * verify, and the allocation will be freed once the verification is complete. + */ +static void +verify_tar_file_contents(verifier_context *context, SimplePtrList *tarfiles) +{ + SimplePtrListCell *cell; + + progress_report(false); + + for (cell = tarfiles->head; cell != NULL; cell = cell->next) + { + tar_archive *tar_file = (tar_archive *) cell->ptr; + astreamer *streamer; + char *fullpath; + + /* Prepare archive streamer stack */ + streamer = create_archive_verifier(context, + tar_file->relpath, + tar_file->tblspc_oid, + tar_file->compress_algorithm); + + /* Compute the full pathname to the target file. */ + fullpath = psprintf("%s/%s", context->backup_directory, + tar_file->relpath); + + /* Invoke the streamer for reading, decompressing, and verifying. */ + verify_tar_contents(context, tar_file->relpath, fullpath, streamer); + + /* Cleanup. */ + pfree(tar_file->relpath); + pfree(tar_file); + pfree(fullpath); + + astreamer_finalize(streamer); + astreamer_free(streamer); + } + simple_ptr_list_destroy(tarfiles); + + progress_report(true); +} + +/* + * Performs the actual work for tar content verification. It reads a given tar + * file in predefined chunks and passes it to the streamer, which initiates + * routines for decompression (if necessary) and then verifies each member + * within the tar archive. + */ +static void +verify_tar_contents(verifier_context *context, char *relpath, char *fullpath, + astreamer *streamer) +{ + int fd; + int rc; + char *buffer; + + pg_log_debug("reading \"%s\"", fullpath); + + /* Open the target file. */ + if ((fd = open(fullpath, O_RDONLY | PG_BINARY, 0)) < 0) + { + report_backup_error(context, "could not open file \"%s\": %m", + relpath); + return; + } + + buffer = pg_malloc(READ_CHUNK_SIZE * sizeof(uint8)); + + /* Perform the reads */ + while ((rc = read(fd, buffer, READ_CHUNK_SIZE)) > 0) + { + astreamer_content(streamer, NULL, buffer, rc, ASTREAMER_UNKNOWN); + + /* Report progress */ + done_size += rc; + progress_report(false); + } + + if (rc < 0) + report_backup_error(context, "could not read file \"%s\": %m", + relpath); + + /* Close the file. */ + if (close(fd) != 0) + report_backup_error(context, "could not close file \"%s\": %m", + relpath); +} + /* * Verify file and its size entry in the manifest. */ @@ -1050,6 +1292,42 @@ find_backup_format(verifier_context *context) return result; } +/* + * Identifies the necessary steps for verifying the contents of the + * provided tar file. + */ +static astreamer * +create_archive_verifier(verifier_context *context, char *archive_name, + Oid tblspc_oid, pg_compress_algorithm compress_algo) +{ + astreamer *streamer = NULL; + + /* Should be here only for tar backup */ + Assert(context->format == 't'); + + /* + * To verify the contents of the tar file, the initial step is to parse + * its content. + */ + streamer = astreamer_verify_content_new(streamer, context, archive_name, + tblspc_oid); + streamer = astreamer_tar_parser_new(streamer); + + /* + * If the tar file is compressed, we must perform the appropriate + * decompression operation before proceeding with the verification of its + * contents. + */ + if (compress_algo == PG_COMPRESSION_GZIP) + streamer = astreamer_gzip_decompressor_new(streamer); + else if (compress_algo == PG_COMPRESSION_LZ4) + streamer = astreamer_lz4_decompressor_new(streamer); + else if (compress_algo == PG_COMPRESSION_ZSTD) + streamer = astreamer_zstd_decompressor_new(streamer); + + return streamer; +} + /* * Print a progress report based on the global variables. * diff --git a/src/bin/pg_verifybackup/pg_verifybackup.h b/src/bin/pg_verifybackup/pg_verifybackup.h index d3b9f733087..006d457511d 100644 --- a/src/bin/pg_verifybackup/pg_verifybackup.h +++ b/src/bin/pg_verifybackup/pg_verifybackup.h @@ -18,6 +18,7 @@ #include "common/hashfn_unstable.h" #include "common/logging.h" #include "common/parse_manifest.h" +#include "fe_utils/astreamer.h" #include "fe_utils/simple_list.h" /* @@ -123,4 +124,9 @@ extern void report_fatal_error(const char *pg_restrict fmt,...) extern bool should_ignore_relpath(verifier_context *context, const char *relpath); +extern astreamer *astreamer_verify_content_new(astreamer *next, + verifier_context *context, + char *archive_name, + Oid tblspc_oid); + #endif /* PG_VERIFYBACKUP_H */ diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list index 6d424c89186..143dea2feaf 100644 --- a/src/tools/pgindent/typedefs.list +++ b/src/tools/pgindent/typedefs.list @@ -3330,6 +3330,7 @@ astreamer_plain_writer astreamer_recovery_injector astreamer_tar_archiver astreamer_tar_parser +astreamer_verify astreamer_zstd_frame bgworker_main_type bh_node_type @@ -3951,6 +3952,7 @@ substitute_phv_relids_context subxids_array_status symbol tablespaceinfo +tar_archive td_entry teSection temp_tablespaces_extra -- 2.18.0
From 34d3478011bb94672378b3860724a9a89e2023c0 Mon Sep 17 00:00:00 2001 From: Amul Sul <amul.sul@enterprisedb.com> Date: Tue, 2 Jul 2024 10:26:35 +0530 Subject: [PATCH v11 10/12] pg_verifybackup: Add backup format and compression option ---- NOTE: This patch is not meant to be committed separately. It should be squashed with the next patch that implements tar format support. ---- --- src/bin/pg_verifybackup/pg_verifybackup.c | 73 ++++++++++++++++++++++- src/bin/pg_verifybackup/pg_verifybackup.h | 1 + 2 files changed, 72 insertions(+), 2 deletions(-) diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c index a19f344ea67..4c0367038b4 100644 --- a/src/bin/pg_verifybackup/pg_verifybackup.c +++ b/src/bin/pg_verifybackup/pg_verifybackup.c @@ -62,6 +62,7 @@ static void report_manifest_error(JsonManifestParseContext *context, const char *fmt,...) pg_attribute_printf(2, 3) pg_attribute_noreturn(); +static char find_backup_format(verifier_context *context); static void verify_backup_directory(verifier_context *context, char *relpath, char *fullpath); static void verify_plain_backup_file(verifier_context *context, @@ -97,6 +98,7 @@ main(int argc, char **argv) {"exit-on-error", no_argument, NULL, 'e'}, {"ignore", required_argument, NULL, 'i'}, {"manifest-path", required_argument, NULL, 'm'}, + {"format", required_argument, NULL, 'F'}, {"no-parse-wal", no_argument, NULL, 'n'}, {"progress", no_argument, NULL, 'P'}, {"quiet", no_argument, NULL, 'q'}, @@ -118,6 +120,7 @@ main(int argc, char **argv) progname = get_progname(argv[0]); memset(&context, 0, sizeof(context)); + context.format = '\0'; if (argc > 1) { @@ -154,7 +157,7 @@ main(int argc, char **argv) simple_string_list_append(&context.ignore_list, "recovery.signal"); simple_string_list_append(&context.ignore_list, "standby.signal"); - while ((c = getopt_long(argc, argv, "ei:m:nPqsw:", long_options, NULL)) != -1) + while ((c = getopt_long(argc, argv, "eF:i:m:nPqsw:", long_options, NULL)) != -1) { switch (c) { @@ -173,6 +176,15 @@ main(int argc, char **argv) manifest_path = pstrdup(optarg); canonicalize_path(manifest_path); break; + case 'F': + if (strcmp(optarg, "p") == 0 || strcmp(optarg, "plain") == 0) + context.format = 'p'; + else if (strcmp(optarg, "t") == 0 || strcmp(optarg, "tar") == 0) + context.format = 't'; + else + pg_fatal("invalid backup format \"%s\", must be \"plain\" or \"tar\"", + optarg); + break; case 'n': no_parse_wal = true; break; @@ -220,11 +232,26 @@ main(int argc, char **argv) pg_fatal("cannot specify both %s and %s", "-P/--progress", "-q/--quiet"); + /* Determine the backup format if it hasn't been specified. */ + if (context.format == '\0') + context.format = find_backup_format(&context); + /* Unless --no-parse-wal was specified, we will need pg_waldump. */ if (!no_parse_wal) { int ret; + /* + * XXX: In the future, we should consider enhancing pg_waldump to read + * WAL files from the tar archive. + */ + if (context.format != 'p') + { + pg_log_error("pg_waldump cannot read from a tar archive"); + pg_log_error_hint("You must use -n or --no-parse-wal when verifying a tar-format backup."); + exit(1); + } + pg_waldump_path = pg_malloc(MAXPGPATH); ret = find_other_exec(argv[0], "pg_waldump", "pg_waldump (PostgreSQL) " PG_VERSION "\n", @@ -279,8 +306,13 @@ main(int argc, char **argv) /* * Now do the expensive work of verifying file checksums, unless we were * told to skip it. + * + * We were only checking the plain backup here. For the tar backup, file + * checksums verification (if requested) will be done immediately when the + * file is accessed, as we don't have random access to the files like we + * do with plain backups. */ - if (!context.skip_checksums) + if (!context.skip_checksums && context.format == 'p') verify_backup_checksums(&context); /* @@ -982,6 +1014,42 @@ should_ignore_relpath(verifier_context *context, const char *relpath) return false; } +/* + * To detect the backup format, it checks for the PG_VERSION file in the backup + * directory. If found, it will be considered a plain-format backup; otherwise, + * it will be assumed to be a tar-format backup. + */ +static char +find_backup_format(verifier_context *context) +{ + char result; + char *path; + struct stat sb; + + /* Should be here only if the backup format is unknown */ + Assert(context->format == '\0'); + + /* Check PG_VERSION file. */ + path = psprintf("%s/%s", context->backup_directory, "PG_VERSION"); + if (stat(path, &sb) == 0) + result = 'p'; + else + { + if (errno != ENOENT) + { + pg_log_error("cannot determine backup format"); + pg_log_error_hint("Try \"%s --help\" for more information.", progname); + exit(1); + } + + /* Otherwise, it is assumed to be a tar format backup. */ + result = 't'; + } + pfree(path); + + return result; +} + /* * Print a progress report based on the global variables. * @@ -1039,6 +1107,7 @@ usage(void) printf(_(" -e, --exit-on-error exit immediately on error\n")); printf(_(" -i, --ignore=RELATIVE_PATH ignore indicated path\n")); printf(_(" -m, --manifest-path=PATH use specified path for manifest\n")); + printf(_(" -F, --format=p|t backup format (plain, tar)\n")); printf(_(" -n, --no-parse-wal do not try to parse WAL files\n")); printf(_(" -P, --progress show progress information\n")); printf(_(" -q, --quiet do not print any output, except for errors\n")); diff --git a/src/bin/pg_verifybackup/pg_verifybackup.h b/src/bin/pg_verifybackup/pg_verifybackup.h index 2901b00870a..d3b9f733087 100644 --- a/src/bin/pg_verifybackup/pg_verifybackup.h +++ b/src/bin/pg_verifybackup/pg_verifybackup.h @@ -100,6 +100,7 @@ typedef struct verifier_context manifest_data *manifest; char *backup_directory; SimpleStringList ignore_list; + char format; /* backup format: p(lain)/t(ar) */ bool skip_checksums; bool exit_on_error; bool saw_any_error; -- 2.18.0
From 7322f5ee8d461eb0c3d54086946850adee14a001 Mon Sep 17 00:00:00 2001 From: Amul Sul <amul.sul@enterprisedb.com> Date: Thu, 8 Aug 2024 16:01:33 +0530 Subject: [PATCH v11 09/12] Add simple_ptr_list_destroy() and simple_ptr_list_destroy_deep() API. We didn't have any helper function to destroy SimplePtrList, likely because it wasn't needed before, but it's required in a later patch in this set. I've added two functions for this purpose, inspired by list_free() and list_free_deep(). --- src/fe_utils/simple_list.c | 39 ++++++++++++++++++++++++++++++ src/include/fe_utils/simple_list.h | 2 ++ 2 files changed, 41 insertions(+) diff --git a/src/fe_utils/simple_list.c b/src/fe_utils/simple_list.c index 2d88eb54067..9d218911c31 100644 --- a/src/fe_utils/simple_list.c +++ b/src/fe_utils/simple_list.c @@ -173,3 +173,42 @@ simple_ptr_list_append(SimplePtrList *list, void *ptr) list->head = cell; list->tail = cell; } + +/* + * Destroy a pointer list and optionally the pointed-to element + */ +static void +simple_ptr_list_destroy_private(SimplePtrList *list, bool deep) +{ + SimplePtrListCell *cell; + + cell = list->head; + while (cell != NULL) + { + SimplePtrListCell *next; + + next = cell->next; + if (deep) + pg_free(cell->ptr); + pg_free(cell); + cell = next; + } +} + +/* + * Destroy a pointer list and the pointed-to element + */ +void +simple_ptr_list_destroy_deep(SimplePtrList *list) +{ + simple_ptr_list_destroy_private(list, true); +} + +/* + * Destroy only pointer list and not the pointed-to element + */ +void +simple_ptr_list_destroy(SimplePtrList *list) +{ + simple_ptr_list_destroy_private(list, false); +} diff --git a/src/include/fe_utils/simple_list.h b/src/include/fe_utils/simple_list.h index d42ecded8ed..5b7cbec8a62 100644 --- a/src/include/fe_utils/simple_list.h +++ b/src/include/fe_utils/simple_list.h @@ -66,5 +66,7 @@ extern void simple_string_list_destroy(SimpleStringList *list); extern const char *simple_string_list_not_touched(SimpleStringList *list); extern void simple_ptr_list_append(SimplePtrList *list, void *ptr); +extern void simple_ptr_list_destroy_deep(SimplePtrList *list); +extern void simple_ptr_list_destroy(SimplePtrList *list); #endif /* SIMPLE_LIST_H */ -- 2.18.0
From 792b4f4a19e458de284e062adc16caa0528539bd Mon Sep 17 00:00:00 2001 From: Amul Sul <amul.sul@enterprisedb.com> Date: Thu, 1 Aug 2024 15:47:26 +0530 Subject: [PATCH v11 08/12] Refactor: split verify_control_file. Separated the manifest entry verification code into a new function and introduced the should_verify_control_data() macro, similar to should_verify_checksum(). Like verify_file_checksum(), verify_control_file() is too design to accept the pg_control file patch which will be opened and respective information will be verified. But, in case of tar backup we would be having pg_control file contents instead, that needs to be verified in the same way. For that reason the code that doing the verification is separated into separate function to so that can be reused for the tar backup verification as well. --- src/bin/pg_verifybackup/pg_verifybackup.c | 43 +++++++++++------------ src/bin/pg_verifybackup/pg_verifybackup.h | 14 ++++++++ 2 files changed, 34 insertions(+), 23 deletions(-) diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c index e44d0377cd5..a19f344ea67 100644 --- a/src/bin/pg_verifybackup/pg_verifybackup.c +++ b/src/bin/pg_verifybackup/pg_verifybackup.c @@ -66,8 +66,6 @@ static void verify_backup_directory(verifier_context *context, char *relpath, char *fullpath); static void verify_plain_backup_file(verifier_context *context, char *relpath, char *fullpath); -static void verify_control_file(const char *controlpath, - uint64 manifest_system_identifier); static void report_extra_backup_files(verifier_context *context); static void verify_backup_checksums(verifier_context *context); static void verify_file_checksum(verifier_context *context, @@ -631,14 +629,20 @@ verify_plain_backup_file(verifier_context *context, char *relpath, /* Check the backup manifest entry for this file. */ m = verify_manifest_entry(context, relpath, sb.st_size); - /* - * Validate the manifest system identifier, not available in manifest - * version 1. - */ - if (context->manifest->version != 1 && - strcmp(relpath, "global/pg_control") == 0 && - m != NULL && m->matched && !m->bad) - verify_control_file(fullpath, context->manifest->system_identifier); + /* Validate the pg_control file information */ + if (should_verify_control_data(context->manifest, m)) + { + ControlFileData *control_file; + bool crc_ok; + + pg_log_debug("reading \"%s\"", fullpath); + control_file = get_controlfile_by_exact_path(fullpath, &crc_ok); + + verify_control_data(control_file, fullpath, crc_ok, + context->manifest->system_identifier); + /* Release memory. */ + pfree(control_file); + } /* Update statistics for progress report, if necessary */ if (show_progress && !context->skip_checksums && @@ -687,18 +691,14 @@ verify_manifest_entry(verifier_context *context, char *relpath, int64 filesize) } /* - * Sanity check control file and validate system identifier against manifest - * system identifier. + * Sanity check control file data and validate system identifier against + * manifest system identifier. */ -static void -verify_control_file(const char *controlpath, uint64 manifest_system_identifier) +void +verify_control_data(ControlFileData *control_file, + const char *controlpath, bool crc_ok, + uint64 manifest_system_identifier) { - ControlFileData *control_file; - bool crc_ok; - - pg_log_debug("reading \"%s\"", controlpath); - control_file = get_controlfile_by_exact_path(controlpath, &crc_ok); - /* Control file contents not meaningful if CRC is bad. */ if (!crc_ok) report_fatal_error("%s: CRC is incorrect", controlpath); @@ -714,9 +714,6 @@ verify_control_file(const char *controlpath, uint64 manifest_system_identifier) controlpath, (unsigned long long) manifest_system_identifier, (unsigned long long) control_file->system_identifier); - - /* Release memory. */ - pfree(control_file); } /* diff --git a/src/bin/pg_verifybackup/pg_verifybackup.h b/src/bin/pg_verifybackup/pg_verifybackup.h index fe0ce8a89aa..2901b00870a 100644 --- a/src/bin/pg_verifybackup/pg_verifybackup.h +++ b/src/bin/pg_verifybackup/pg_verifybackup.h @@ -40,6 +40,17 @@ typedef struct manifest_file (((m) != NULL) && ((m)->matched) && !((m)->bad) && \ (((m)->checksum_type) != CHECKSUM_TYPE_NONE)) +/* + * Validate the control file and its system identifier against the manifest + * system identifier. Note that this feature is not available in manifest + * version 1. This validation should only be performed if the manifest entry + * validation has been completed without errors. + */ +#define should_verify_control_data(manifest, m) \ + (((manifest)->version != 1) && \ + ((m) != NULL) && ((m)->matched) && !((m)->bad) && \ + (strcmp((m)->pathname, "global/pg_control") == 0)) + /* * Define a hash table which we can use to store information about the files * mentioned in the backup manifest. @@ -99,6 +110,9 @@ extern manifest_file *verify_manifest_entry(verifier_context *context, extern void verify_checksum(verifier_context *context, manifest_file *m, pg_checksum_context *checksum_ctx, int64 bytes_read); +extern void verify_control_data(ControlFileData *control_file, + const char *controlpath, bool crc_ok, + uint64 manifest_system_identifier); extern void report_backup_error(verifier_context *context, const char *pg_restrict fmt,...) -- 2.18.0
v11-0007-Refactor-split-verify_file_checksum-function.patch
Description: Binary data
v11-0006-Refactor-split-verify_backup_file-function-and-r.patch
Description: Binary data