Hi Amul, thanks for working on this. + file_name_len = strlen(relpath); > + if (file_name_len < file_extn_len || > + strcmp(relpath + file_name_len - file_extn_len, file_extn) != 0) > + { > + if (compress_algorithm == PG_COMPRESSION_NONE) > + report_backup_error(context, > + "\"%s\" is not a valid file, expecting tar file", > + relpath); > + else > + report_backup_error(context, > + "\"%s\" is not a valid file, expecting \"%s\" compressed tar file", > + relpath, > + get_compress_algorithm_name(compress_algorithm)); > + return; > + } >
I believe pg_verifybackup needs to exit after reporting a failure here since it could not figure out a streamer to allocate. Also, v1-0002 removes #include "pqexpbuffer.h" from astreamer.h and adds it to the new .h file and in v1-0004 it reverts the change. So this can be avoided altogether. On Tue, Jul 9, 2024 at 3:24 PM Amul Sul <sula...@gmail.com> wrote: > Hi, > > Currently, pg_verifybackup only works with plain (directory) format > backups. > This proposal aims to support tar-format backups too. We will read the tar > files from start to finish and verify each file inside against the > backup_manifest information, similar to how it verifies plain files. > > We are introducing new options to pg_verifybackup: -F, --format=p|t and -Z, > --compress=METHOD, which allow users to specify the backup format and > compression type, similar to the options available in pg_basebackup. If > these > options are not provided, the backup format and compression type will be > automatically detected. To determine the format, we will search for > PG_VERSION > file in the backup directory — if found, it indicates a plain backup; > otherwise, it > is a tar-format backup. For the compression type, we will check the > extension > of base.tar.xxx file of tar-format backup. Refer to patch 0008 for the > details. > > The main challenge is to structure the code neatly. For plain-format > backups, > we verify bytes directly from the files. For tar-format backups, we read > bytes > from the tar file of the specific file we care about. We need an > abstraction > to handle both formats smoothly, without using many if statements or > special > cases. > > To achieve this goal, we need to reuse existing infrastructure without > duplicating code, and for that, the major work involved here is the code > refactoring. Here is a breakdown of the work: > > 1. BBSTREAMER Rename and Relocate: > BBSTREAMER, currently used in pg_basebackup for reading and decompressing > TAR > files; can also be used for pg_verifybackup. In the future, it could > support > other tools like pg_combinebackup for merging TAR backups without > extraction, > and pg_waldump for verifying WAL files from the tar backup. For that > accessibility, > BBSTREAMER needs to be relocated to a shared directory. > > Moreover, renaming BBSTREAMER to ASTREAMER (short for Archive Streamer) > would > better indicate its general application across multiple tools. Moving it to > src/fe_utils directory is appropriate, given its frontend infrastructure > use. > > 2. pg_verifybackup Code Refactoring: > The existing code for plain backup verification will be split into separate > files or functions, so it can also be reused for tar backup verification. > > 3. Adding TAR Backup Verification: > Finally, patches will be added to implement TAR backup verification, along > with > tests and documentation. > > Patches 0001-0003 focus on renaming and relocating BBSTREAMER, patches > 0004-0007 on splitting the existing verification code, and patches > 0008-0010 on > adding TAR backup verification capabilities, tests, and documentation. The > last > set could be a single patch but is split to make the review easier. > > Please take a look at the attached patches and share your comments, > suggestions, or any ways to enhance them. Your feedback is greatly > appreciated. > > Thank you ! > > -- > Regards, > Amul Sul > EDB: http://www.enterprisedb.com > -- Thanks & Regards, Sravan Velagandula EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company