Re: [HACKERS] Checksums by default?

Jim Nasby Sun, 12 Feb 2017 17:31:04 -0800

On 2/10/17 6:38 PM, Tomas Vondra wrote:

> And no, backups may not be a suitable solution - the failure happens on
> a standby, and the page (luckily) is not corrupted on the master. Which
> means that perhaps the standby got corrupted by a WAL, which would
> affect the backups too. I can't verify this, though, because the WAL got
> removed from the archive, already. But it's a possibility.

Possibly related... I've got a customer that periodically has SR replicas
stop in their tracks due to WAL checksum failure. I don't think there's
any hardware correlation (they've seen this on multiple machines).
Studying the code, it occurred to me that if there are any bugs in the
handling of individual WAL record sizes or pointers during SR then you
could get CRC failures. So far every one of these occurrences has been
repairable by replacing the broken WAL file on the replica. I've
requested that next time this happens they save the bad WAL.
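
To make the suspected failure mode concrete, here is a minimal sketch in
Python. It is a toy and not PostgreSQL code: the record layout (4-byte
length, 4-byte checksum, payload) and the names make_record, read_record
and length_bias are all hypothetical, and zlib.crc32 stands in for the
CRC-32C that WAL records actually use. It only shows that intact bytes
can still fail their checksum when the reader gets the record size wrong.

```python
import struct
import zlib


def make_record(payload):
    """Build a toy record: 4-byte length, 4-byte CRC of the payload, payload."""
    return struct.pack("<II", len(payload), zlib.crc32(payload)) + payload


def read_record(buf, offset=0, length_bias=0):
    """Read one toy record and report whether its checksum matches.

    length_bias is a hypothetical knob that simulates a bug in record
    size handling: the reader consumes that many bytes more (or fewer)
    than the header says.
    """
    length, stored_crc = struct.unpack_from("<II", buf, offset)
    start = offset + 8
    payload = buf[start:start + length + length_bias]
    return payload, zlib.crc32(payload) == stored_crc


# Two back-to-back records, standing in for a stream of WAL records.
stream = make_record(b"first record") + make_record(b"second record")

_, ok_correct = read_record(stream)                # size handled correctly
_, ok_short = read_record(stream, length_bias=-1)  # one byte too few
_, ok_long = read_record(stream, length_bias=1)    # one byte too many

print("correct size:", ok_correct)
print("short by one:", ok_short)
print("long by one: ", ok_long)
```

None of the stored bytes are damaged in this sketch; only the reader's
idea of the record boundary is off. That would be consistent with
hardware not being a factor, though it does not show that this is what
actually happens on the customer's replicas.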

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)


--
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
