On 8 March 2013 03:31, Bruce Momjian <br...@momjian.us> wrote:
> I also see the checksum patch is taking a beating. I wanted to step
> back and ask what percentage of known corruption cases this checksum
> patch will detect. What percentage of these corruptions would
> filesystem checksums have detected?
>
> Also, don't all modern storage drives have built-in checksums, and
> report problems to the system administrator? Does smartctl help report
> storage corruption?
>
> Let me take a guess at answering this --- we have several layers in a
> database server:
>
>   1 storage
>   2 storage controller
>   3 file system
>   4 RAM
>   5 CPU
>
> My guess is that storage checksums only cover layer 1, while our patch
> covers layers 1-3, and probably not 4-5 because we only compute the
> checksum on write.
>
> If that is correct, the open question is what percentage of corruption
> happens in layers 1-3?
Yes, the checksums patch is taking a beating, and so it should. If we
find a reason to reject it, we should.

CPU and RAM error checking are pretty standard now; storage isn't
necessarily the same. The figures we had from the Google paper early in
development showed it was worth checksumming storage, but not memory. I
did originally argue for covering memory as well, but there was
insufficient evidence of utility.

At the moment, we only reject blocks if the header is damaged. That
covers basic sanity checks on about 10 bytes near the start of every
block, and since some errors would still slip through those checks,
let's say they effectively cover just 8 bytes of the block. Checksums
cover the whole block and detect most errors, >99.999%, so we would
detect errors anywhere in the 8192 bytes of the block. That makes
checksums approximately 1000 times (8192/8) better at spotting
corruption than not using them.

Or to put it another way: if you don't use checksums, by the time you
see a single corrupt block header you will on average have lost about
500 blocks/4MB of user data. That doesn't sound too bad, but if your
database has been giving wrong answers during the period those blocks
went bad, you could be looking at a significant number of reads and
writes gone bad, since updates would spread corruption to other rows
and data would be retrieved incorrectly over a long period.
(Back-of-envelope sketches of the coverage point and the loss
arithmetic are at the end of this mail.)

I agree with Robert's comments. This isn't a brilliant design; it's a
brilliant stop-gap until we get a better design. However, that is a
whole chunk of work away, with pg_upgrade handling on-disk page
rewrites, plus some as-yet-undecided redesign of the way hint bits
work. It's a long way off.

There are performance wrinkles also, no question. For some
applications, not losing data is worth the hit. Given that the patch
offers users the choice, I think it's acceptable to look towards
committing it.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
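
P.S. To make the coverage point concrete, here is a minimal sketch of a
whole-block checksum. I'm using a plain Fletcher-16 purely for
illustration (it is not the algorithm in the patch), but it shows why
every one of the 8192 bytes ends up protected, rather than just the few
header bytes the current sanity checks look at:

#include <stdint.h>
#include <stddef.h>

#define BLCKSZ 8192             /* default PostgreSQL block size */

/*
 * Fletcher-16 over the whole page. Every byte feeds the running sums,
 * so a bit flip anywhere in the block changes the result with very
 * high probability.
 */
static uint16_t
block_checksum(const uint8_t *page)
{
    uint32_t    sum1 = 0;
    uint32_t    sum2 = 0;
    size_t      i;

    for (i = 0; i < BLCKSZ; i++)
    {
        sum1 = (sum1 + page[i]) % 255;
        sum2 = (sum2 + sum1) % 255;
    }
    return (uint16_t) ((sum2 << 8) | sum1);
}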
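
And the loss arithmetic, as a quick program. The constants are the
rough assumptions from above (8 effectively-covered header bytes,
8192-byte blocks, detection landing on average halfway through a run of
corruption), not measurements:

#include <stdio.h>

int main(void)
{
    const double block_size = 8192.0;   /* bytes per block */
    const double header_bytes = 8.0;    /* bytes effectively covered by
                                         * the existing header checks */

    /* Chance a randomly placed corruption is visible to each scheme. */
    double p_header = header_bytes / block_size;    /* ~0.001 */
    double p_checksum = 0.99999;                    /* >99.999% */

    printf("checksums vs header checks: ~%.0fx\n",
           p_checksum / p_header);                  /* prints ~1024x */

    /*
     * Only ~1 corrupt block in 1024 happens to hit the header, and if
     * detection lands on average halfway through a run of corruption,
     * ~512 blocks have already gone bad, i.e. about 4MB of user data.
     */
    double lost_blocks = (1.0 / p_header) / 2.0;
    printf("avg loss before detection: ~%.0f blocks (~%.0f MB)\n",
           lost_blocks,
           lost_blocks * block_size / (1024.0 * 1024.0));
    return 0;
}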