Hello Michael,
I'm wondering (possibly again) about the existing early exit if one block
cannot be read on retry: the command should count this as a kind of bad
block, proceed to check the other files, and obviously fail in the end, but
only after having checked everything else and generated a report. I do not think that
this condition warrants a full stop. ISTM that under rare race conditions
(e.g., an unlucky concurrent "drop database" or "drop table") this could
happen when online, although I could not trigger one despite heavy testing,
so I'm possibly mistaken.
This seems like a defensible judgement call either way.
Right now we have a few tests that explicitly check that
pg_verify_checksums fails on broken data ("foo" in the file). Those
would then just get skipped AFAICT, which I think is the worse behaviour,
but if everybody thinks that should be the way to go, we can
drop/adjust those tests and make pg_verify_checksums skip them.
Thoughts?
My point is that it should fail as it does, only not immediately (early
exit), but after having checked everything else. This means avoiding
calling "exit(1)" here and there (lseek, fopen...), taking note that
something bad happened instead, and calling exit only at the end.
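
To make the idea concrete, here is a minimal sketch of the "take note and
exit at the end" pattern. It is not the actual pg_verify_checksums code:
ScanState, scan_one_file and scan_all are hypothetical names made up for
illustration, and the checksum verification itself is left out.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical per-run counters; not taken from pg_verify_checksums. */
typedef struct ScanState
{
	long		files_scanned;
	long		files_failed;
} ScanState;

/*
 * Hypothetical stand-in for checking one file.  On error it reports the
 * problem and returns false instead of calling exit(1).
 */
static bool
scan_one_file(const char *path)
{
	FILE	   *f = fopen(path, "rb");

	if (f == NULL)
	{
		fprintf(stderr, "could not open file \"%s\"\n", path);
		return false;
	}
	/* reading blocks and verifying checksums would go here */
	fclose(f);
	return true;
}

/*
 * Check every file, remembering failures rather than stopping at the
 * first one.  Returns the exit status to use once the scan is complete.
 */
static int
scan_all(const char *const *paths, int npaths, ScanState *state)
{
	for (int i = 0; i < npaths; i++)
	{
		state->files_scanned++;
		if (!scan_one_file(paths[i]))
			state->files_failed++;
	}
	return state->files_failed > 0 ? 1 : 0;
}
```

The caller would print its report from the counters and only then pass
the return value of scan_all to exit(), so the command still fails, but
after having looked at everything.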
--
Fabien.