Should walsender check correctness of WAL records?

Konstantin Knizhnik Thu, 01 Oct 2020 08:39:17 -0700

Hi hackers,

While investigating a customer support case, I found that walsender does not calculate the CRC of WAL records and sends them to replicas without any checks.

As a result, a damaged WAL record causes errors on all replicas:

LOG: incorrect resource manager data checksum in record at 5FB9/D199F7D8

FATAL: terminating walreceiver process due to administrator command

I wonder whether it would be better to detect this problem earlier, on the master?
We could try to recover the damaged WAL record (it is not always possible, but...)

Or at least we could avoid advancing the replication slots, making it possible for the DBA to restore the corrupted WAL segment from the archive and resume replication.

Right now the only choice is to restore the replicas from a base backup, which may take a significant amount of time (for a large database).

And during this time the master will not be protected from failures.

Or is the extra overhead of computing the CRC in walsender assumed to be too high?
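
To make the question concrete, here is a rough sketch in C of the kind of check I mean. WAL records are protected with CRC-32C; everything else below is an assumption made for illustration only: the demo_record layout and the names crc32c_bitwise and demo_record_is_valid are hypothetical and do not correspond to the real XLogRecord layout or to any existing walsender code. The bit-by-bit loop is the slowest possible variant; a table-driven or hardware-assisted CRC would be the relevant one when judging overhead.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Bitwise CRC-32C (Castagnoli), reflected form, polynomial 0x82F63B78.
 * Initial value and final XOR are both 0xFFFFFFFF.
 */
static uint32_t
crc32c_bitwise(const unsigned char *buf, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;

    for (size_t i = 0; i < len; i++)
    {
        crc ^= buf[i];
        for (int bit = 0; bit < 8; bit++)
        {
            /* mask is all ones when the low bit is set, zero otherwise */
            uint32_t mask = (uint32_t) 0 - (crc & 1u);

            crc = (crc >> 1) ^ (0x82F63B78u & mask);
        }
    }
    return crc ^ 0xFFFFFFFFu;
}

/* Hypothetical record: a payload plus the CRC stored when it was written. */
typedef struct demo_record
{
    const unsigned char *payload;
    size_t      len;
    uint32_t    stored_crc;
} demo_record;

/* Returns 1 if the recomputed CRC matches the stored one, 0 otherwise. */
static int
demo_record_is_valid(const demo_record *rec)
{
    return crc32c_bitwise(rec->payload, rec->len) == rec->stored_crc;
}
```

A sender doing such a check could stop before shipping a record for which demo_record_is_valid() returns 0, instead of letting every replica discover the damage on its own.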

Sorry if this question has already been discussed; I failed to find it in the archive.


--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
