On Tue, Jan 24, 2017 at 4:07 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Peter Geoghegan <p...@heroku.com> writes:
>> I thought that checksums went in in part because we thought that there
>> was some chance that they'd find bugs in Postgres.
>
> Not really. AFAICS the only point is to catch storage-system malfeasance.
This matches my understanding. Actual physical media errors are caught by lower-level checksums and error-correction codes, and memory errors are caught by ECC RAM. Checksums do very little to catch PostgreSQL bugs, which leaves filesystem and storage firmware bugs. However, the latter are still reasonably common faults.

I have seen multiple cases where, after reviewing the corruption with a hex editor, the only reasonable conclusion was a bug in the storage system: data shifted around by amounts that are not a multiple of the page size, zeroed-out extents that are not page-aligned, etc. Unfortunately, none of those customers had checksums turned on at the time.

I feel that reacting to such errors with a clear, easily debuggable checksum error is much better than erroring out on huge memory allocation requests, crashing, or returning bogus data. Timely reaction to data corruption is really important for minimizing data loss.

Regards,
Ants Aasma

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers