On Fri, Dec 20, 2019 at 9:14 PM Robert Haas <robertmh...@gmail.com> wrote:
> On Fri, Dec 20, 2019 at 8:24 AM Suraj Kharage > <suraj.khar...@enterprisedb.com> wrote: > > Thank you for review comments. > > Thanks for the new version. > > + <term><option>--verify-backup </option></term> > > Whitespace. > > +struct manifesthash_hash *hashtab; > > Uh, I had it in mind that you would nuke this line completely, not > just remove "typedef" from it. You shouldn't need a global variable > here. > > + if (buf == NULL) > > pg_malloc seems to have an internal check such that it never returns > NULL. I don't see anything like this test in other callers. > > The order of operations in create_manifest_hash() seems unusual: > > + fd = open(manifest_path, O_RDONLY, 0); > + if (fstat(fd, &stat)) > + buf = pg_malloc(stat.st_size); > + hashtab = manifesthash_create(1024, NULL); > ... > + entry = manifesthash_insert(hashtab, filename, &found); > ... > + close(fd); > > I would have expected open-fstat-read-close to be consecutive, and the > manifesthash stuff all done afterwards. In fact, it seems like reading > the file could be a separate function. > > + if (strncmp(checksum, "SHA256", 6) == 0) > > This isn't really right; it would give a false match if we had a > checksum algorithm with a name like SHA2560 or SHA256C or > SHA256ExceptWayBetter. The right thing to do is find the colon first, > and then probably overwrite it with '\0' so that you have a string > that you can pass to parse_checksum_algorithm(). > > + /* > + * we don't have checksum type in the header, so need to > + * read through the first file enttry to find the checksum > + * type for the manifest file and initilize the checksum > + * for the manifest file itself. > + */ > > This seems to be proceeding on the assumption that the checksum type > for the manifest itself will always be the same as the checksum type > for the first file in the manifest. I don't think that's the right > approach. I think the manifest should always have a SHA256 checksum, > regardless of what type of checksum is used for the individual files > within the manifest. Since the volume of data in the manifest is > presumably very small compared to the size of the database cluster > itself, I don't think there should be any performance problem there. > Agree, that performance won't be a problem, but that will be bit confusing to the user. As at the start user providing the manifest-checksum (assume that user-provided CRC32C) and at the end, user will find the SHA256 checksum string in the backup_manifest file. Does this also means that irrespective of whether user provided a checksum option or not, we will be always generating the checksum for the backup_manifest file? > + filesize = atol(size); > > Using strtol() would allow for some error checking. > > + * Increase the checksum by its lable length so that we can > + checksum = checksum + checksum_lable_length; > > Spelling. > > + pg_log_error("invalid record found in \"%s\"", manifest_path); > > Error message needs work. > > +VerifyBackup(void) > +create_manifest_hash(char *manifest_path) > +nextLine(char *buf) > > Your function names should be consistent with the surrounding style, > and with each other, as far as possible. Three different conventions > within the same patch and source file seems over the top. > > Also keep in mind that you're not writing code in a vacuum. There's a > whole file of code here, and around that, a whole project. > scan_data_directory() is a good example of a function whose name is > clearly too generic. It's not a general-purpose function for scanning > the data directory; it's specifically a support function for verifying > a backup. Yet, the name gives no hint of this. > > +verify_file(struct dirent *de, char fn[MAXPGPATH], struct stat st, > + char relative_path[MAXPGPATH], manifesthash_hash *hashtab) > > I think I commented on the use of char[] parameters in my previous review. > > + /* Skip backup manifest file. */ > + if (strcmp(de->d_name, "backup_manifest") == 0) > + return; > > Still looks like this will be skipped at any level of the directory > hierarchy, not just the top. And why are we skipping backup_manifest > here bug pg_wal in scan_data_directory? That's a rhetorical question, > because I know the answer: verify_file() is only getting called for > files, so you can't use it to skip directories. But that's not a good > excuse for putting closely-related checks in different parts of the > code. It's just going to result in the checks being inconsistent and > each one having its own bugs that have to be fixed separately from the > other one, as here. Please try to reorganize this code so that it can > be done in a consistent way. > > I think this is related to the way you're traversing the directory > tree, which somehow looks a bit awkward to me. At the top of > scan_data_directory(), you've got code that uses basedir and > subdirpath to construct path and relative_path. I was initially > surprised to see that this was the job of this function, rather than > the caller, but then I thought: well, as long as it makes life easy > for the caller, it's probably fine. However, I notice that the only > non-trivial caller is the scan_data_directory() itself, and it has to > go and construct newsubdirpath from subdirpath and the directory name. > > It seems to me that this would get easier if you defined > scan_data_directory() -- or whatever we end up calling it -- to take > two pathname-related arguments: > > - basepath, which would be $PGDATA and would never change as we > recurse down, so same as what you're now calling basedir > - pathsuffix, which would be an empty string at the top level and at > each recursive level we'd add a slash and then de->d_name. > > So at the top of the function we wouldn't need an if statement, > because you could just do: > > snprintf(path, MAXPGPATH, "%s%s", basedir, pathsuffix); > > And when you recurse you wouldn't need an if statement either, because > you could just do: > > snprintf(newpathsuffix, MAXPGPATH, "%s/%s", pathsuffix, de->d_name); > > What I'd suggest is constructing newpathsuffix right after rejecting > "." and ".." entries, and then you can reject both pg_wal and > backup_manifest, at the top-level only, using symmetric and elegant > code: > > if (strcmp(newpathsuffix, "/pg_wal") == 0 || strcmp(newpathsuffix, > "/backup_manifest") == 0) > continue; > > + record = manifesthash_lookup(hashtab, filename);; > + if (record) > + { > ...long block... > + } > + else > + pg_log_info("file \"%s\" is present in backup but not in manifest", > + filename); > > Try to structure the code in such a way that you minimize unnecessary > indentation. For example, in this case, you could instead write: > > if (record == NULL) > { > pg_log_info(...) > return; > } > > and the result would be that everything inside that long if-block is > now at the top level of the function and indented one level less. And > I think if you look at this function you'll see a way that you can > save a *second* level of indentation for much of that code. Please > check the rest of the patch for similar cases, too. > > +static char * > +nextLine(char *buf) > +{ > + while (*buf != '\0' && *buf != '\n') > + buf = buf + 1; > + > + return buf + 1; > +} > > I'm pretty sure that my previous review mentioned the importance of > protecting against buffer overruns here. > > +static char * > +nextWord(char *line) > +{ > + while (*line != '\0' && *line != '\t' && *line != '\n') > + line = line + 1; > + > + return line + 1; > +} > > Same problem here. > > In both cases, ++ is more idiomatic. > > -- > Robert Haas > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company > -- Rushabh Lathia