On Wed, Mar 25, 2020 at 4:54 PM Stephen Frost <sfr...@snowman.net> wrote:
> > That's a fair argument, but I think the other relevant principle is
> > that we try to give people useful defaults for things. I think that
> > checksums are a sufficiently useful thing that having the default be
> > not to do it doesn't make sense. I had the impression that you and
> > David were in agreement on that point, actually.
>
> I agree with wanting to have useful defaults and that checksums should
> be included by default, and I'm alright even with letting people pick
> what algorithms they'd like to have too. The construct here is made odd
> because we've got this idea that "no checksum" is an option, which is
> actually something that I don't particularly like, but that's what's
> making this particular syntax weird. I don't suppose you'd be open to
> the idea of just dropping that though..? There wouldn't be any issue
> with this syntax if we just always had checksums included when a
> manifest is requested. :)
>
> Somehow, I don't think I'm going to win that argument.
Well, it's not a crazy idea. At some point, I had the idea that you were always going to get a manifest, and therefore you should at least have the option of not checksumming to avoid the overhead. But, as things stand now, you can suppress the manifest altogether, so that you can still take a backup even if you've got no disk space to spool the manifest on the master. So, if you really want no overhead from manifests, just don't have a manifest. And if you are OK with some overhead, why not at least have a CRC-32C checksum, which is, after all, pretty cheap?

Now, on the other hand, I don't have any strong evidence that the manifest-without-checksums mode is useless. You can still use it to verify that you have the correct files and that those files have the expected sizes. And, verifying those things is very cheap, because you only need to stat() each file, not open and read them all. True, you can do those things by using pg_validatebackup -s. But, you'd still incur the (admittedly fairly low) overhead of computing checksums that you don't intend to use.

This is where I feel like I'm trying to make decisions in a vacuum. If we had a few more people weighing in on the thread on this point, I'd be happy to go with whatever the consensus was. If most people think having both --no-manifest (suppressing the manifest completely) and --manifest-checksums=none (suppressing only the checksums) is useless and confusing, then sure, let's rip the latter one out. If most people like the flexibility, let's keep it: it's already implemented and tested. But I hate to base the decision on what one or two people think.

> > Well, the 512MB "limit" for CRC-32C means only that for certain very
> > specific types of errors, detection is not guaranteed above that file
> > size. So if you have a single flipped bit, for example, and the file
> > size is greater than 512MB, then CRC-32C has only a 99.9999999767169%
> > chance of detecting the error, whereas if the file size is less than
> > 512MB, it is 100% certain, because of the design of the algorithm. But
> > nine nines is plenty, and neither SHA nor our page-level checksums
> > provide guaranteed error detection properties anyway.
>
> Right, so we know that CRC-32C has an upper-bound of 512MB to be useful
> for exactly what it's designed to be useful for, but we also know that
> we're going to have larger files- at least 1GB ones, and quite possibly
> larger, so why are we choosing this?
>
> At the least, wouldn't it make sense to consider a larger CRC, one whose
> limit is above the size of commonly expected files, if we're going to
> use a CRC?

I mean, you're just repeating the same argument here, and it's just not valid. Regardless of the file size, the chances of a false checksum match are literally less than one in a billion. There is every reason to believe that users will be happy with a low-overhead method that has a 99.9999999+% chance of detecting corrupt files. I do agree that a 64-bit CRC would probably not be much more expensive and would improve the probability of detecting errors even further, but I wanted to restrict this patch to using infrastructure we already have.
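Just to spell out the arithmetic behind "less than one in a billion": if you assume that an error falling outside the classes CRC-32C is guaranteed to catch is equally likely to produce any of the 2^32 possible checksum values, the numbers come out like this (throwaway illustration, not part of the patch):

#include <stdio.h>

int
main(void)
{
	double		miss = 1.0 / 4294967296.0;	/* 1 / 2^32 */

	/* chance that corruption goes undetected: ~2.3e-10 */
	printf("chance of a false match: %.3g\n", miss);
	/* chance that corruption is detected: ~99.99999998%% */
	printf("chance of detection: %.8f%%\n", 100.0 * (1.0 - miss));
	return 0;
}

That's where the 99.9999999+% figure above comes from.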
The choices there are the various SHA functions (so I supported those), MD5 (which I deliberately omitted, for reasons I hope you'll be the first to agree with), CRC-32C (which is fast), a couple of other CRC-32 variants (which I omitted because they seemed redundant and one of them only ever existed in PostgreSQL because of a coding mistake), and the hacked-up version of FNV that we use for page-level checksums (which is only 16 bits and seems to have no advantages for this purpose).

> > I'm not sure why the fact that it's a 40-year-old algorithm is
> > relevant. There are many 40-year-old algorithms that are very good.
>
> Sure there are, but there probably wasn't a lot of thought about
> GB-sized files, and this doesn't really seem to be the direction people
> are going in for larger objects. s3, as an example, uses sha256.
> Google, it seems, suggests folks use "HighwayHash" (from their crc32c
> github repo- https://github.com/google/crc32c). Most CRC uses seem to
> be for much smaller data sets.

Again, I really want to stick with infrastructure we already have. Trying to find a hash function that will please everybody is a hole with no bottom, or more to the point, a bikeshed in need of painting. There are TONS of great hash functions out there on the Internet, and as previous discussions on pgsql-hackers will attest, as soon as you go down that road, somebody will say "well, what about xxhash" or whatever, and then you spend the rest of your life trying to figure out which hash function we could commit that is fast and secure and doesn't have copyright or patent problems. There have been multiple efforts to introduce such hash functions in the past, and I think basically all of those have crashed into a brick wall.

I don't think that's because introducing new hash functions is a bad idea. I think that there are various reasons why it might be a good idea. For instance, HighwayHash purports to be a cryptographic hash function that is fast enough to replace non-cryptographic hash functions. It's easy to see why someone might want that here. For example, it would be entirely reasonable to copy the backup manifest onto a USB key and store it in a vault. Later, if you get the USB key back out of the vault and validate it against the backup, you pretty much know that none of the data files have been tampered with, provided that you used a cryptographic hash. So, SHA is a good option for people who have a USB key and a vault, and a faster cryptographic hash might be even better. I don't have any desire to block such proposals, and I would be thrilled if this work inspires other people to add such options. However, I also don't want this patch to get blocked by an interminable argument about which hash functions we ought to use. The ones we have in core now are good enough for a start, and more can be added later.

> Sure, there's a good chance we'll need newer algorithms in the future, I
> don't doubt that. On the other hand, if crc32c, or CRC whatever, was
> the perfect answer and no one will ever need something better, then
> what's with folks like Google suggesting something else..?

I have never said that CRC was the perfect answer, and the reason why Google is suggesting something different is that they wanted a fast hash (not SHA) that still has cryptographic properties. What I have said is that using CRC-32C by default means that there is very little downside as compared with current releases. Backups will not get slower, and error detection will get better.
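To make "infrastructure we already have" a bit more concrete, here is a rough sketch (an illustration only, not the code path the patch actually uses) of what a per-file CRC-32C amounts to with the existing support in src/include/port/pg_crc32c.h: a single cheap pass over the data, linking against libpgport for the CRC implementation.

#include "postgres_fe.h"
#include "port/pg_crc32c.h"

/* Sketch: compute a CRC-32C over an already-opened file. */
static pg_crc32c
checksum_file_crc32c(FILE *fp)
{
	char		buf[8192];
	size_t		nread;
	pg_crc32c	crc;

	INIT_CRC32C(crc);
	while ((nread = fread(buf, 1, sizeof(buf), fp)) > 0)
		COMP_CRC32C(crc, buf, nread);
	FIN_CRC32C(crc);

	return crc;
}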
If you pick any other default from the menu of options currently available, then either backups get noticeably slower, or we get less error detection capability than CRC-32C gives us.

> As for folks who are that close to the edge on their backup timing that
> they can't have it slow down- chances are pretty darn good that they're
> not far from ending up needing to find a better solution than
> pg_basebackup anyway. Or they don't need to generate a manifest (or, I
> suppose, they could have one but not have checksums..).

40-50% is a lot more than "if you were on the edge."

> > Well, that'd be wrong, though. It's true that backup_manifest won't
> > have an entry in the manifest, and neither will WAL files, but
> > postgresql.auto.conf will. We'll just skip complaining about it if the
> > checksum doesn't match or whatever. The server generates manifest
> > entries for everything, and the client decides not to pay attention to
> > some of them because it knows that pg_basebackup may have made certain
> > changes that were not known to the server.
>
> Ok, but it's also wrong to say that the backup_label is excluded from
> verification.

The docs don't say that backup_label is excluded from verification. They do say that backup_manifest is excluded from verification *against the manifest*, because it is. I'm not sure if you're honestly confused here or if we're just devolving into arguing for the sake of argument, but right now the code looks like this:

simple_string_list_append(&context.ignore_list, "backup_manifest");
simple_string_list_append(&context.ignore_list, "pg_wal");
simple_string_list_append(&context.ignore_list, "postgresql.auto.conf");
simple_string_list_append(&context.ignore_list, "recovery.signal");
simple_string_list_append(&context.ignore_list, "standby.signal");

Notice that this is the same list of files mentioned in the documentation. Now let's suppose we remove the first of those lines of code, so that backup_manifest is not in the exclude list by default. Now let's try to validate a backup:

[rhaas pgsql]$ src/bin/pg_validatebackup/pg_validatebackup ~/pgslave
pg_validatebackup: error: "backup_manifest" is present on disk but not in the manifest

Oops. If you read that error carefully, you can see that the complaint is 100% valid. backup_manifest is indeed present on disk, but not in the manifest. However, because this situation is expected and known not to be a problem, the right thing to do is suppress the error. That is why it is in the ignore_list by default. The documentation is attempting to explain this. If it's unclear, we should try to make it better, but it is absolutely NOT saying that there is no internal validation of the backup_manifest. In fact, the previous paragraph tries to explain that:

+ <application>pg_validatebackup</application> reads the manifest file of a
+ backup, verifies the manifest against its own internal checksum, and then

It is, however, saying, and *entirely correctly*, that pg_validatebackup will not check the backup_manifest file against the backup_manifest. If it did, it would find that it's not there. It would then emit an error message like the one above even though there's no problem with the backup.

> I fail to see the usefulness of a tool that doesn't actually verify that
> the backup is able to be restored from.
>
> Even pg_basebackup (in both fetch and stream modes...)
> checks that we at least got all the WAL that's needed for the backup
> from the server before considering the backup to be valid and telling
> the user that there was a successful backup. With what you're proposing
> here, we could have someone do a pg_basebackup, get back an ERROR saying
> the backup wasn't valid, and then run pg_validatebackup and be told that
> the backup is valid. I don't get how that's sensible.

I'm sorry that you can't see how that's sensible, but it doesn't mean that it isn't sensible. It is totally unrealistic to expect that any backup verification tool can verify that you won't get an error when trying to use the backup. That would require that the validation tool try to do everything that PostgreSQL will try to do when the backup is used, including running recovery and updating the data files. Anything less than that creates a real possibility that the backup will verify good but fail when used.

This tool has a much narrower purpose, which is to try to verify that we (still) have the files the server sent as part of the backup and that, to the best of our ability to detect such things, they have not been modified. As you know, or should know, the WAL files are not sent as part of the backup, and so are not verified. Other things that would be useful to check are also not verified. It would be fantastic to have more verification tools in the future, but it is difficult to see why anyone would bother trying if an attempt to get the first one committed gets blocked because it does not yet do everything. Very few patches try to do everything, and those that do usually get blocked because, by trying to do too much, they get some of it badly wrong.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company