Hi,

On 2019-03-19 16:52:08 +0100, Michael Banck wrote:
> On Tuesday, 19.03.2019, at 11:22 -0400, Robert Haas wrote:
> > It's torn pages that I am concerned about - the server is writing and
> > we are reading, and we get a mix of old and new content. We have been
> > quite diligent about protecting ourselves from such risks elsewhere,
> > and checksum verification should not be held to any lesser standard.
>
> If we see a checksum failure on an otherwise correctly read block in
> online mode, we retry the block on the theory that we might have read a
> torn page. If the checksum verification still fails, we compare its LSN
> to the LSN of the current checkpoint and don't mind if it's newer. This
> way, a torn page should not cause a false positive either way I
> think?

False positives, no. But there's plenty of potential for false
negatives. In plenty of clusters a large fraction of the pages is going
to be touched in most checkpoints.

> If it is a genuine storage failure we will see it in the next
> pg_checksums run as its LSN will be older than the checkpoint.

Well, but also, by that time it might be too late to recover things. Or
it might be a backup that you just made, that you later want to recover
from, ...

> The basebackup checksum verification works in the same way.

Shouldn't have been merged that way.

Greetings,

Andres Freund
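
To make the scheme under discussion concrete, here is a minimal sketch of the retry-then-compare-LSN logic as described in the quoted text. It is not the pg_checksums or basebackup code: `Page`, `check_block`, `reads`, and the use of plain integers for LSNs are all hypothetical stand-ins chosen for illustration, and the checksum comparison is reduced to a boolean.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Page:
    lsn: int           # page LSN, modeled as a plain integer (assumption)
    checksum_ok: bool  # stand-in for an actual checksum comparison


def check_block(read_page: Callable[[], Page], checkpoint_lsn: int) -> str:
    """Classify one block as 'ok', 'skipped', or 'corrupt'."""
    page = read_page()
    if page.checksum_ok:
        return "ok"
    # First failure: possibly a torn read, so read the block once more.
    page = read_page()
    if page.checksum_ok:
        return "ok"
    # Still failing: a page newer than the checkpoint is not reported.
    if page.lsn > checkpoint_lsn:
        return "skipped"
    return "corrupt"


def reads(*pages: Page) -> Callable[[], Page]:
    """Hypothetical helper: return successive pages on successive reads."""
    it = iter(pages)
    return lambda: next(it)


if __name__ == "__main__":
    checkpoint = 100
    # Torn read that resolves on retry.
    print(check_block(reads(Page(10, False), Page(10, True)), checkpoint))
    # Persistent failure on a page older than the checkpoint.
    print(check_block(reads(Page(10, False), Page(10, False)), checkpoint))
    # Persistent failure on a page newer than the checkpoint.
    print(check_block(reads(Page(200, False), Page(200, False)), checkpoint))
```

In this sketch the third case returns "skipped" even though both reads failed verification, which is the false negative raised above: a damaged page whose LSN is newer than the checkpoint goes unreported in that run.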