> On Oct 4, 2021, at 4:45 PM, Peter Geoghegan <p...@bowt.ie> wrote:
>
> I'm guessing that you meant REINDEX CONCURRENTLY.
Yes.
> Since you're talking about the case where it has an error
Sorry, I realized after hitting <send> that you might take it that way, but I
mean the logs generally, not just postgres logs. If somebody runs "reindex
concurrently" on all tables at midnight every night, and they see one morning
in the (non-postgres) logs from the midnight hour weird error messages about
their RAID controller, they may well want to check all their indexes to see if
any of them were corrupted. This is a totally made-up example, but the idea
that a user might want to check their indexes, tables, or both owing to errors
of some kind is not far-fetched.
> , the whole
> index build must have failed. So the user would get exactly what
> they'd expect -- verification of the original index, without any
> hindrance from the new/failed index.
Right, in that case, but not if hardware errors corrupted the index, and
generated logs, without happening to trip up the reindex concurrently statement
itself.
> (Thinks for a moment...)
>
> Actually, I think that we'd only verify the original index, even
> before the error with CONCURRENTLY (though I've not checked that point
> myself).
To get back on track, let me say that I'm not taking the position that what
pg_amcheck currently does is necessarily correct, but just that I'd like to be
careful about what we change, if anything. There are three things that seem to
irritate people:
1) A non-zero exit code means "not everything was checked and passed" rather
than "at least one thing is definitely corrupt".
2) Skipping of indexes is reported to the user with the word 'ERROR' in the
report rather than, say, 'NOTICE'.
3) Deadlocks can occur
I have resisted changing #1 on the theory that `pg_amcheck --all &&
./post_all_checks_pass.sh` should only run the post_all_checks_pass.sh if
indeed all checks have passed, and I'm interpreting skipping an index check as
being contrary to that. But maybe that's wrong of me. I don't know. There is
already sloppiness between the time that pg_amcheck resolves which database
relations are matched by --all, --table, --index, etc. and the time that all
the checks are started, and again between that time and when the last one is
complete. Database objects could be created or dropped during those spans of
time, in which case --all doesn't have quite so well defined a meaning. But
the user running pg_amcheck might also *know* that they aren't running any such
DDL, and therefore expect --all to actually result in everything being checked.
I find it strange that I should do anything about #2 in pg_amcheck, since it's
the function in verify_nbtree that phrases the situation as an error. But I
suppose I can just ignore that and have it print as a notice. I'm genuinely
not trying to give you grief here -- I simply don't like that pg_amcheck is
adding commentary atop what the checking functions are doing. I see a clean
division between what pg_amcheck is doing and what amcheck is doing, and this
feels to me to put that on the wrong side of the divide. If refusing to check
the index because it is not in the requisite state is a notice, then why
wouldn't verify_nbtree raise it as one and return early rather than raising an
error?
I also find it strange that #3 is being attributed to pg_amcheck's choice of
how to call the checking function, because I can't think of any other function
where we require the SQL caller to do anything like what you are requiring here
in order to prevent deadlocks, and also because the docs for the functions
don't say that a deadlock is possible, merely that the function may return with
an error. I was totally content to get an error back, since errors are how the
verify_nbtree functions communicate everything else, and the handler for those
functions is already prepared to deal with the error messages so returned. But
it clearly annoys you that pg_amcheck is doing this, so I'll go forward with
the patch that I already have written which does otherwise.
—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company