On Sun, Dec 18, 2011 at 11:21 PM, Josh Berkus <j...@agliodbs.com> wrote: > On 12/18/11 5:55 PM, Greg Stark wrote: >> There is another way to look at this problem. Perhaps it's worth >> having a checksum *even if* there are ways for the checksum to be >> spuriously wrong. Obviously having an invalid checksum can't be a >> fatal error then but it might still be useful information. Rright now >> people don't really know if their system can experience torn pages or >> not and having some way of detecting them could be useful. And if you >> have other unexplained symptoms then having checksum errors might be >> enough evidence that the investigation should start with the hardware >> and get the sysadmin looking at hardware logs and running memtest >> sooner. > > Frankly, if I had torn pages, even if it was just hint bits missing, I > would want that to be logged. That's expected if you crash, but if you > start seeing bad CRC warnings when you haven't had a crash? That means > you have a HW problem. > > As long as the CRC checks are by default warnings, then I don't see a > problem with this; it's certainly better than what we have now.
But the scary part is you don't know how long *ago* the crash was. Because a hint-bit-only change w/ a torn-page is a "non event" in PostgreSQL *DESIGN*, on crash recovery, it doesn't do anything to try and "scrub" every page in the database. So you could have a crash, then a recovery, and a couple clean shutdown-restart combinations before you happen to read the "needed" page that was torn in the crash $X [ days | weeks | months ] ago. It's specifically because PostgreSQL was *DESIGNED* to make torn pages a non-event (because WAL/FPW fixes anything that's dangerous), that the whole CRC issue is so complicated... I'll through out a few random thoughts (some repeated) that people who really want the CRC can fight over: 1) Find a way to not bother writing out hint-bit-only-dirty pages.... I know people like Kevin keep recommending a vacuum freeze after a big load to avoid later problems anyways and I think that's probably common in big OLAP shops, and OLTP people are likely to have real changes on the page anyways. Does anybody want to try and measure what type of performance trade-offs we'ld really have on a variety of "normal" (ya, I know, what's normal) workloads? If the page has a real change, it's got a WAL FPW, so we avoid the problem.... 2) If the writer/checksummer knows it's a hint-bit-only-dirty page, can it stuff a "cookie" checksum in it and not bother verifying? Looses a bit of the CRC guarentee, especially around "crashes" which is when we expect a torn page, but avoids the whole "scary! scary! Your database is corrupt!" false-positives in the situation PostgreSQL was specifically desinged to make not scary. #) Anybody investigated putting the CRC in a relation fork, but not right in the data block? If the CRC contains a timestamp, and is WAL logged before the write, at least on reading a block with a wrong checksum, if a warning is emitted, the timestamp could be looked at by whoever is reading the warning and know tht the block was written shortly before the crash $X $PERIODS ago.... The whole "CRC is only a warning" because we "expect to get them if we ever crashed" means that the time when we most want them, we have to assume they are bogus... And to make matters worse, we don't even know when the perioud of "they may be bugus" ends, unless we have a way to methodically force PG through ever buffer in the database after the crash... And then that makes them very hard to consider useful... a. -- Aidan Van Dyk Create like a god, ai...@highrise.ca command like a king, http://www.highrise.ca/ work like a slave. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers