Greetings,

* Ants Aasma (ants.aa...@eesti.ee) wrote:
> On Tue, Jan 24, 2017 at 4:07 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> > Peter Geoghegan <p...@heroku.com> writes:
> >> I thought that checksums went in in part because we thought that there
> >> was some chance that they'd find bugs in Postgres.
> >
> > Not really.  AFAICS the only point is to catch storage-system malfeasance.
>
> This matches my understanding. Actual physical media errors are caught
> by lower level checksums/error correction codes, and memory errors are
> caught by ECC RAM.
Not everyone runs with ECC, sadly.

> Checksums do very little for PostgreSQL bugs, which
> leaves only filesystem and storage firmware bugs. However the latter
> are still reasonably common faults.

Agreed, but in addition to filesystem and storage firmware bugs,
virtualization systems can have bugs as well, and if those bugs hit the
kernel's cache (which is actually the more likely case: that's what the
VM system is going to think it can monkey with, as long as it works with
the kernel) then you can have cases which PG's checksum would likely
catch, since we check the checksum when we read from the kernel's read
cache and calculate the checksum before we push the page to the kernel's
write cache.  (There's a rough sketch of where those checks happen at the
end of this mail.)

> I have seen multiple cases where,
> after reviewing the corruption with a hex editor, the only reasonable
> conclusion was a bug in the storage system. Data shifted around by
> non-page size amounts, non-page aligned extents that are zeroed out,
> etc.

Right, I've seen similar kinds of things happening in the memory of
virtualized systems; things like random chunks of memory suddenly being
zeroed.

> Unfortunately none of those customers had checksums turned on at
> the time. I feel that reacting to such errors with a non-cryptic and
> easily debuggable checksum error is much better than erroring out with
> huge memory allocations, crashing or returning bogus data. Timely
> reaction to data corruption is really important for minimizing data
> loss.

Agreed.  In addition to that, in larger environments where there are
multiple databases involved for the explicit purpose of fail-over, a
system which is going south because of bad memory or storage could be
detected and pulled out, potentially with zero data loss.  Of course, to
minimize data loss, it'd be extremely important for the fail-over system
to identify a checksum error more-or-less immediately and take the bad
node out.

Thanks!

Stephen
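The promised sketch, to make the read/write-cache point concrete.  This is
not the actual bufmgr/smgr code: read_and_verify(), checksum_and_write()
and page_checksum() are hypothetical stand-ins for the real read/write
paths and for pg_checksum_page(), and the placeholder checksum here is not
the real algorithm.

    /*
     * Minimal sketch (NOT actual PostgreSQL code) of where data-page
     * checksums are verified and computed relative to the kernel's cache.
     */
    #include <stdbool.h>
    #include <stdint.h>
    #include <sys/types.h>
    #include <unistd.h>

    #define BLCKSZ 8192

    /*
     * Placeholder checksum; the real one is the FNV-based pg_checksum_page()
     * in src/include/storage/checksum_impl.h.
     */
    static uint16_t
    page_checksum(char *page, uint32_t blkno)
    {
        uint32_t    hash = 2166136261u ^ blkno;

        for (int i = 0; i < BLCKSZ; i++)
        {
            /* skip the stored checksum itself (bytes 8-9 of the header) */
            if (i == 8 || i == 9)
                continue;
            hash = (hash ^ (uint8_t) page[i]) * 16777619u;
        }
        return (uint16_t) (hash % 65535 + 1);   /* never 0, like the real code */
    }

    /*
     * Read path: the page arrives from the kernel's read cache (or disk) and
     * the checksum is verified before the contents are trusted, so corruption
     * introduced below us is reported immediately as a checksum failure.
     */
    bool
    read_and_verify(int fd, uint32_t blkno, char *page)
    {
        if (pread(fd, page, BLCKSZ, (off_t) blkno * BLCKSZ) != BLCKSZ)
            return false;

        /* pd_checksum sits right after the 8-byte pd_lsn in the page header */
        uint16_t    stored = *(uint16_t *) (page + 8);

        return stored == page_checksum(page, blkno);
    }

    /*
     * Write path: the checksum is computed immediately before the page is
     * handed to the kernel's write cache, so anything that scribbles on the
     * page below that point is caught on the next read.
     */
    bool
    checksum_and_write(int fd, uint32_t blkno, char *page)
    {
        *(uint16_t *) (page + 8) = page_checksum(page, blkno);

        return pwrite(fd, page, BLCKSZ, (off_t) blkno * BLCKSZ) == BLCKSZ;
    }

IIRC in the server proper the verification happens in PageIsVerified() when
the buffer manager reads a block in, and PageSetChecksumCopy() /
PageSetChecksumInplace() right before the write, which is why checksums
catch kernel/VM-level scribbling on the page cache but not corruption of a
page while it sits in shared_buffers.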