Hi,

The patch mostly copies the verification / retry logic from basebackup.c, but I think it omits a rather important detail that makes it incorrect in the presence of concurrent writes.
The very first thing basebackup does is this:

    startptr = do_pg_start_backup(...);

i.e. it waits for a checkpoint, remembering the LSN. And then when checking a page it does this:

    if (!PageIsNew(page) && PageGetLSN(page) < startptr)
    {
        ... verify the page checksum
    }

Obviously, pg_verify_checksums can't do that easily because it's supposed to run from outside the database instance. But the startptr detail is pretty important, because it supports this retry reasoning:

    /*
     * Retry the block on the first failure.  It's
     * possible that we read the first 4K page of the
     * block just before postgres updated the entire block
     * so it ends up looking torn to us.  We only need to
     * retry once because the LSN should be updated to
     * something we can ignore on the next pass.  If the
     * error happens again then it is a true validation
     * failure.
     */

Imagine the 8kB page as two 4kB pages, with the initial state being [A1,A2] and another process overwriting it with [B1,B2]. If you read the 8kB page, what states can you see?

I don't think POSIX provides any guarantees about atomicity of the write calls (and even if it did, the filesystems on Linux don't seem to), so you may observe either [A1,B2] or [B1,A2], or various inconsistent mixes of the two versions, depending on timing. Well, torn pages ...

Pretty much the only thing you can rely on is that when one process does write([B1,B2]), the other process may first read [A1,B2], but the next read will return [B1,B2] (or possibly newer data, if there was another write). It will not read the "stale" A1 again.

basebackup relies on this somewhat implicitly - on the retry it notices the LSN has changed (thanks to the startptr check) and skips the page entirely. This is pretty important, because the newly read page might be torn in some other way.

pg_verify_checksums apparently ignores this skip logic - on the retry it simply re-reads the page, verifies the checksum and reports an error. That is broken, because the newly read page might be torn again due to a concurrent write.

So IMHO this should do something similar to basebackup - check the page LSN, and if it changed, skip the page (see the sketch at the end of this mail for roughly what I mean).

I'm afraid this requires using the last checkpoint LSN, the way startptr is used in basebackup. In particular, we can't simply remember the LSN from the first read, because we might actually read [B1,A2] on the first try, and then [B1,B2] or [B1,C2] on the retry. (Actually, the page may be torn in various other ways, not necessarily at the 4kB boundary - it might be torn right after the LSN, for example.)

FWIW I also don't understand the purpose of the pg_sleep() call - it does not seem to protect against anything, really.

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
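
PS: To make the suggestion a bit more concrete, here is a rough sketch of the retry logic I have in mind. The verify_block() function and its parameters are made up for illustration (not actual patch code), and I'm assuming the last checkpoint's redo LSN has already been read from pg_control (ControlFile->checkPointCopy.redo, say) - it plays the role of startptr in basebackup:

    #include "postgres_fe.h"

    #include <unistd.h>

    #include "storage/bufpage.h"
    #include "storage/checksum.h"
    #include "storage/checksum_impl.h"

    /*
     * Sketch only: verify one block, retrying once on a checksum failure.
     * Returns false only if the block fails verification twice and its LSN
     * never moved past the last checkpoint.
     */
    static bool
    verify_block(int fd, BlockNumber blkno, XLogRecPtr checkpointLSN)
    {
        char        buf[BLCKSZ];
        PageHeader  header = (PageHeader) buf;
        int         attempt;

        for (attempt = 0; attempt < 2; attempt++)
        {
            if (pread(fd, buf, BLCKSZ, (off_t) blkno * BLCKSZ) != BLCKSZ)
                return false;   /* read error / short read - report separately */

            /* new pages have no checksum yet */
            if (PageIsNew(buf))
                return true;

            /*
             * The page was (re)written after the last checkpoint, so we
             * can't verify it safely - skip it, the same way basebackup
             * skips pages with LSN >= startptr.  On the second pass this
             * also covers the case where the first read was torn.
             */
            if (PageGetLSN(buf) >= checkpointLSN)
                return true;

            if (pg_checksum_page(buf, blkno) == header->pd_checksum)
                return true;

            /* first failure may just be a torn read - re-read and re-check */
        }

        /* failed twice, and the LSN did not advance past the checkpoint */
        return false;
    }

(The real code would of course pass the segment-adjusted block number to pg_checksum_page(), the way the existing scan does IIRC - omitted here to keep the sketch short.)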