I have also done a review of the patch and some testing. The patch looks good, and I agree with Robert's comments.
On Wed, Jan 17, 2024 at 8:40 PM Robert Haas <robertmh...@gmail.com> wrote: > > On Wed, Jan 17, 2024 at 6:31 AM Amul Sul <sula...@gmail.com> wrote: > > With the attached patch, the backup manifest will have a new key item as > > "System-Identifier" 64-bit integer whose value is derived from pg_control while > > generating it, and the manifest version bumps to 2. > > > > This helps to identify the correct database server and/or backup for the > > subsequent backup operations. pg_verifybackup validates the manifest system > > identifier against the backup control file and fails if they don’t match. > > Similarly, pg_basebackup increment backup will fail if the manifest system > > identifier does not match with the server system identifier. The > > pg_combinebackup is already a bit smarter -- checks the system identifier from > > the pg_control of all the backups, with this patch the manifest system > > identifier also validated. > > Thanks for working on this. Without this, I think what happens is that > you can potentially take an incremental backup from the "wrong" > server, if the states of the systems are such that all of the other > sanity checks pass. When you run pg_combinebackup, it'll discover the > problem and tell you, but you ideally want to discover such errors at > backup time rather than at restore time. This addresses that. And, > overall, I think it's a pretty good patch. But I nonetheless have a > bunch of comments. > > - The associated value is always the integer 1. > + The associated value is the integer, either 1 or 2. > > is an integer. Beginning in <productname>PostgreSQL</productname> 17, > it is 2; in older versions, it is 1. > > + context.identity_cb = manifest_process_identity; > > I'm not really on board with calling the system identifier "identity" > throughout the patch. I think it should just say system_identifier. If > we were going to abbreviate, I'd prefer something like "sysident" that > looks like it's derived from "system identifier" rather than > "identity" which is a different word altogether. But I don't think we > should abbreviate unless doing so creates *ridiculously* long > identifier names. > > +static void > +manifest_process_identity(JsonManifestParseContext *context, > + int manifest_version, > + uint64 manifest_system_identifier) > +{ > + uint64 system_identifier; > + > + /* Manifest system identifier available in version 2 or later */ > + if (manifest_version == 1) > + return; > > I think you've got the wrong idea here. I think this function would > only get called if System-Identifier is present in the manifest, so if > it's a v1 manifest, this would never get called, so this if-statement > would not ever do anything useful. I think what you should do is (1) > if the client supplies a v1 manifest, reject it, because surely that's > from an older server version that doesn't support incremental backup; > but do that when the version is parsed rather than here; and (2) also > detect and reject the case when it's supposedly a v2 manifest but this > is absent. > > (1) should really be done when the version number is parsed, so I > suspect you may need to add manifest_version_cb. > > +static void > +combinebackup_identity_cb(JsonManifestParseContext *context, > + int manifest_version, > + uint64 manifest_system_identifier) > +{ > + parser_context *private_context = context->private_data; > + uint64 system_identifier = private_context->system_identifier; > + > + /* Manifest system identifier available in version 2 or later */ > + if (manifest_version == 1) > + return; > > Very similar to the above case. Just reject a version 1 manifest as > soon as we see the version number. In this function, set a flag > indicating we saw the system identifier; if at the end of parsing that > flag is not set, kaboom. > > - parse_manifest_file(manifest_path, &context.ht, &first_wal_range); > + parse_manifest_file(manifest_path, &context.ht, &first_wal_range, > + context.backup_directory); > > Don't do this! parse_manifest_file() should just record everything > found in the manifest in the context object. Syntax validation should > happen while parsing the manifest (e.g. "CAT/DOG" is not a valid LSN > and we should reject that at this stage) but semantic validation > should happen later (e.g. "0/0" can't be a the correct backup end LSN > but we don't figure that out while parsing, but rather later). I think > you should actually move validation of the system identifier to the > point where the directory walk encounters the control file (and update > the docs and tests to match that decision). Imagine if you wanted to > validate a tar-format backup; then you wouldn't have random access to > the directory. You'd see the manifest file first, and then all the > files in a random order, with one chance to look at each one. > > (This is, in fact, a feature I think we should implement.) > > - if (strcmp(token, "1") != 0) > + parse->manifest_version = atoi(token); > + if (parse->manifest_version != 1 && parse->manifest_version != 2) > json_manifest_parse_failure(parse->context, > "unexpected manifest version"); > > Please either (a) don't do a string-to-integer conversion and just > strcmp() twice or (b) use strtol so that you can check that it > succeeded. I don't want to accept manifest version 1a as 1. > > > +/* > > + * Validate manifest system identifier against the database server > system > > + * identifier. > > + */ > > > > This comment assumes you know what the callback is going to do, but > > you don't. This should be more like the comment for > > json_manifest_finalize_file or json_manifest_finalize_wal_range. This comment caught me off-guard too. After some testing and detailed review I found that this is called by pg_verifybackup and pg_combinebackup both of which do not validate against any running database system. > > -- > Robert Haas > EDB: http://www.enterprisedb.com > > -- Thanks & Regards, Sravan Velagandula EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company