Hi,

On 2019-02-05 06:57:06 +0100, Fabien COELHO wrote:
> > > > I'm wondering (possibly again) about the existing early exit if one
> > > > block cannot be read on retry: the command should count this as a
> > > > kind of bad block, proceed on checking other files, and obviously
> > > > fail in the end, but having checked everything else and generated a
> > > > report. I do not think that this condition warrants a full stop.
> > > > ISTM that under rare race conditions (eg, an unlucky concurrent
> > > > "drop database" or "drop table") this could happen when online,
> > > > although I could not trigger one despite heavy testing, so I'm
> > > > possibly mistaken.
> > >
> > > This seems like a defensible judgement call either way.
> >
> > Right now we have a few tests that explicitly check that
> > pg_verify_checksums fails on broken data ("foo" in the file). Those
> > would then just get skipped AFAICT, which I think is the worse
> > behaviour, but if everybody thinks that should be the way to go, we
> > can drop/adjust those tests and make pg_verify_checksums skip them.
> >
> > Thoughts?
>
> My point is that it should fail as it does, only not immediately (early
> exit), but after having checked everything else. This means avoiding
> calling "exit(1)" here and there (lseek, fopen...), but taking note that
> something bad happened, and calling exit only at the end.
I can see both as being valuable (one gives you a more complete picture,
the other a quicker answer in scripts). For me that's the point where
it's the prerogative of the author to make that choice.

Greetings,

Andres Freund