On Fri, Apr 19, 2024 at 11:47 AM Robert Haas <robertmh...@gmail.com> wrote:
> Hmm, that's an interesting perspective. I've always been very
> skeptical of doing verification only around missing files and not
> anything else. I figured that wouldn't be particularly meaningful, and
> that's pretty much the only kind of validation that's even
> theoretically possible without a bunch of extra overhead, since we
> compute checksums on entire files rather than, say, individual blocks.
> And you could really only do it for the final backup in the chain,
> because you should end up accessing all of those files, but the same
> is not true for the predecessor backups. So it's a very weak form of
> verification.
>
> But I looked into it and I think you're correct that, if you restrict
> the scope in the way that you suggest, we can do it without much
> additional code, or much additional run-time. The cost is basically
> that, instead of only looking for a backup_manifest entry when we
> think we can reuse its checksum, we need to do a lookup for every
> single file in the final input directory. Then, after processing all
> such files, we need to iterate over the hash table one more time and
> see what files were never touched. That seems like an acceptably low
> cost to me. So, here's a patch.
>
> I do think there's some chance that this will encourage people to
> believe that pg_combinebackup is better at finding problems than it
> really is or ever will be, and I also question whether it's right to
> keep changing stuff after feature freeze. But I have a feeling most
> people here are going to think this is worth including in 17. Let's
> see what others say.

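For readers following along without the patch open, the check described in the quoted message (look up every file in the final input directory, then sweep the table for entries that were never touched) can be sketched roughly as below. This is only an illustration in Python, not the patch itself, which is C code inside pg_combinebackup. The function name `find_missing_files`, the plain dict standing in for the manifest hash table, and the choice to ignore files that have no manifest entry are all assumptions made for the example.

```python
import os


def find_missing_files(manifest_paths, backup_dir):
    """Return manifest entries with no matching file under backup_dir.

    Hypothetical helper: manifest_paths stands in for the file names
    listed in the final backup's backup_manifest.
    """
    # Stand-in for the manifest hash table: path -> "was it touched?"
    touched = {path: False for path in manifest_paths}

    # Pass 1: do a lookup for every file in the final input directory.
    for root, _dirs, files in os.walk(backup_dir):
        for name in files:
            rel = os.path.relpath(os.path.join(root, name), backup_dir)
            rel = rel.replace(os.sep, "/")  # manifest-style separators
            if rel in touched:
                touched[rel] = True
            # Files without a manifest entry are ignored in this sketch.

    # Pass 2: iterate over the table once more; anything never touched
    # is listed in the manifest but absent from the directory.
    return sorted(path for path, seen in touched.items() if not seen)
```

As the quoted message notes, this only detects files that are missing outright from the final backup in the chain. It says nothing about file contents or about the predecessor backups.
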
There was no hue and cry to include this in v17 and I think that ship has sailed at this point, but we could still choose to include this as an enhancement for v18 if people want it. I think David's probably in favor of that (but I'm not 100% sure) and I have mixed feelings about it (explained above) so what I'd really like is some other opinions on whether this idea is good, bad, or indifferent.

Here is a rebased version of the patch. No other changes since v1.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

v2-0001-pg_combinebackup-Detect-missing-files-when-possib.patch