On Mon, Oct 4, 2021 at 3:36 PM Mark Dilger <mark.dil...@enterprisedb.com> wrote:
> >> I believe this is a bug in amcheck's btree checking functions.  Peter, can 
> >> you take a look?
> >
> > Why do you say that?
>
> Because REINDEX CONCURRENTLY and the bt_index_parent_check() function seem to 
> have lock upgrade hazards that are unrelated to pg_amcheck.

The problem with that argument is that the bt_index_parent_check()
function isn't doing anything particularly special, apart from
dropping the lock. That has been its behavior for many years now.

> On to pg_amcheck's behavior....
>
> I see no evidence in the OP's complaint that pg_amcheck is misbehaving.  It 
> launches a worker to check each relation, prints for the user's benefit any 
> errors those checks raise, and finally returns 0 if they all pass and 2 
> otherwise.  Since not all relations could be checked, 2 is returned.  
> Returning 0 would be misleading, as it implies everything was checked and 
> passed, and it can't honestly say that.  The return value 2 does not mean 
> that anything failed.  It means that not all checks passed.  When a 2 is 
> returned, the user is expected to read the output and decide what, if 
> anything, they want to do about it.  In this case, the user might decide to 
> wait until the reindex finishes and check again, or they might decide they 
> don't care.
>
> It is true that pg_amcheck is calling bt_index_parent_check() on an invalid 
> index, but so what?  If it chose not to do so, it would still need to print a 
> message about the index being unavailable for checking, and it would still 
> have to return 2.

Why would it have to print such a message? You seem to be presenting
this as if there is some authoritative, precise, relevant definition
of "the relations that pg_amcheck sees". But to me the relevant
details look arbitrary at best.

> It can't return 0, and it is unhelpful to leave the user in the dark about 
> the fact that not all indexes are in the right state for checking.

Why is that unhelpful? More to the point, *why* would this alternative
behavior constitute "leaving the user in the dark"?

What about the case where the pg_class entry isn't visible to our MVCC
snapshot? Why is "skipping" such a relation not just as unhelpful?

> I think this bug report is really a feature request.  The OP appears to want 
> an option to toggle on/off the printing of such information, perhaps with not 
> printing it as the default.

And I don't understand why you think that clearly-accidental
implementation details (really just bugs) should be treated as
axiomatic truths about how pg_amcheck must work. Should we now "fix"
pg_dump so that it matches pg_amcheck?

All of the underlying errors are cases that were clearly intended to
catch user error -- every single one. But apparently pg_amcheck is
incapable of error, by definition. Like HAL 9000.

-- 
Peter Geoghegan


Reply via email to