On 11/22/19 2:01 PM, Robert Haas wrote:
> On Fri, Nov 22, 2019 at 1:10 PM David Steele <da...@pgmasters.net> wrote:
>> Well, the maximum amount of data that can be protected with a 32-bit
>> CRC is 512MB according to all the sources I found (NIST, Wikipedia,
>> etc). I presume that's what we are talking about since I can't find
>> any 64-bit CRC code in core or this patch.
>
> Could you give a more precise citation for this?

See:
https://www.nist.gov/system/files/documents/2017/04/26/lrdc_systems_part2_032713.pdf
and search for "The maximum block size".

https://en.wikipedia.org/wiki/Cyclic_redundancy_check
"The design of the CRC polynomial depends on the maximum total length of
the block to be protected (data + CRC bits)", which I took to mean there
are limits.

Here's another interesting bit from
https://en.wikipedia.org/wiki/Mathematics_of_cyclic_redundancy_checks:
"Because a CRC is based on division, no polynomial can detect errors
consisting of a string of zeroes prepended to the data, or of missing
leading zeroes" -- but it appears to matter which CRC you are using.
There's a variation that works in this case and hopefully we are using
that one.

This paper talks about appropriate block lengths vs. CRC length:
http://users.ece.cmu.edu/~koopman/roses/dsn04/koopman04_crc_poly_embedded.pdf
but it is concerned with network transmission and small block lengths.

> "Typically an n-bit CRC applied to a data block of arbitrary length
> will detect any single error burst not longer than n bits, and the
> fraction of all longer error bursts that it will detect is (1 −
> 2^−n)."

I'm not sure how encouraging I find this -- a four-byte error is not a
lot, and 2^32 is only 4 billion. We have individual users who have
backed up more than 4 billion files over the last few years.
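To put rough numbers on that (a back-of-the-envelope sketch only; it
assumes any corruption longer than the burst length behaves like a
random error with respect to the checksum, which is what the 1 − 2^−n
figure above describes):

# Back-of-the-envelope only: assume corruption beyond the burst length
# is effectively random with respect to the checksum, so a 32-bit CRC
# misses it with probability 2^-32.
n_bits = 32
p_miss = 2.0 ** -n_bits           # ~2.33e-10 per damaged file
files = 4 * 10**9                 # order of what some users have backed up

expected_missed = files * p_miss  # if every one of those files were damaged once
p_at_least_one = 1 - (1 - p_miss) ** files

print(expected_missed)            # ~0.93
print(p_at_least_one)             # ~0.61

Obviously not every file in a backup is corrupt, but it gives a feel for
how little margin 32 bits leaves at that scale.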
>> This is the basic premise of what we call delta restore which can
>> speed up restores by orders of magnitude.
>>
>> Delta restore is the main advantage that made us decide to require
>> SHA1 checksums. In most cases, restore speed is more important than
>> backup speed.
>
> I see your point, but it's not the whole story. We've encountered a
> bunch of cases where the time it took to complete a backup exceeded
> the user's desired backup interval, which is obviously very bad, or
> even more commonly where it exceeded the length of the user's
> "low-usage" period when they could tolerate the extra overhead imposed
> by the backup. A few percentage points is probably not a big deal, but
> a user who has an 8-hour window to get the backup done overnight will
> not be happy if it's taking 6 hours now and we tack 40%-50% on to
> that. So I think that we either have to disable backup checksums by
> default, or figure out a way to get the overhead down to something a
> lot smaller than what current tests are showing -- which we could
> possibly do without changing the algorithm if we can somehow make it a
> lot cheaper, but otherwise I think the choice is between disabling the
> functionality altogether by default and adopting a less-expensive
> algorithm. Maybe someday when delta restore is in core and widely used
> and CPUs are faster, it'll make sense to revise the default, and
> that's cool, but I can't see imposing a big overhead by default to
> enable a feature core doesn't have yet...

OK, I'll buy that. But I *don't* think CRCs should be allowed for deltas
(when we have them) and I *do* think we should caveat their
effectiveness (assuming we can agree on them).

In general the answer to faster backups should be more cores/faster
network/faster disk, not compromising backup integrity. I understand
we'll need to wait until we have parallelism in pg_basebackup to justify
that answer.

Regards,
--
-David
da...@pgmasters.net