> On Oct 6, 2021, at 3:20 PM, Peter Geoghegan <p...@bowt.ie> wrote:
> 
>> I think the disagreements are about something else.
> 
> Informally speaking, you could say that pg_amcheck and amcheck verify
> relations. More formally speaking, both amcheck (whether called by
> pg_amcheck or some other thing) can only prove the presence of
> corruption. They cannot prove its absence. (The amcheck docs have
> always said almost these exact words.)

I totally agree that the amcheck functions cannot prove the absence of 
corruption.

I prefer not to even use language about proving the presence of corruption when 
discussing pg_amcheck.  I have let that slide upthread as a convenient 
short-hand, but I think it doesn't help.  For pg_amcheck to take any view 
whatsoever on whether a btree index is corrupt, it would have to introspect the 
error message that it gets back from bt_index_check().  It doesn't do that, nor 
do I think that it should.  It just prints the contents of the error for the 
user and records that fact and eventually exits with a non-zero exit code.  The 
error might have been something about the command exiting due to the crash of 
another backend, or to do with a deadlock against some other process, or 
whatever, and pg_amcheck has no opinion about whether any of that is to do with 
corruption or not.

> This seems to come up a lot because at various points you seem to be
> concerned about introducing specific imperfections. But it's not like
> your starting point was ever perfection, or ever could be.

From the point of view of detecting corruptions, I agree that it never could 
be.  But I'm not talking about that.  I'm talking about whether pg_amcheck 
issues all the commands that it is supposed to issue.  If I work for Daddy 
Warbucks and he gives me 30 classic cars to take to 10 different mechanics, I 
can do that job perfectly even if the mechanics do less than perfect work.  If 
I leave three cars in the driveway, that's on me.  Likewise, it's not on 
pg_amcheck if the checking functions can't do perfect work, but it is on 
pg_amcheck if it doesn't issue all the expected commands.  But later on in this 
email, it appears we don't have any remaining disagreements about that.  Read 
on....

> I can
> always describe a scenario in which amcheck misses real corruption --
> a scenario which may be very contrived. So the mere fact that some new
> theoretical possibility of corruption is introduced by some action
> does not in itself mean much. We're dealing with that constantly, and
> always will be.

I wish we could stop discussing this.  I really don't think this ticket has 
anything to do with how well or how poorly or how completely the amcheck 
functions work.

> Let's suppose I was to "directly fix amcheck + !indisvalid indexes". I
> don't even know what that means -- I honestly don't have a clue.
> You're focussing on one small piece of code in verify_nbtree.c, that
> seems to punt responsibility, but the fact is that there are deeply
> baked-in reasons why it does so. That's a reflection of how many
> things about the system work, in general. Attributing blame to any one
> small snippet of code (code in verify_nbtree.c, or wherever) just
> isn't helpful.

I think we have agreed that pg_amcheck can filter out invalid indexes.  I don't 
have a problem with that.  I admit that I did have a problem with that 
upthread, but its been a while since I conceded that point so I'd rather not 
have to argue it again.

>> In truth, all the pg_amcheck frontend client can take a view on is whether 
>> it was able to issue all the commands to the backend that it was asked to 
>> issue, and whether any of those commands responded with an error.
> 
> AFAICT that pg_amcheck has to do is follow the amcheck user docs, by
> generalizing from the example SQL query for the B-Tree stuff. And, it
> should separately filter non-temp relations for the heap stuff, for
> the same reasons (exactly the same situation there).

I think we have agreed on that one, too, without me having ever argued it.  I 
posted a patch to filter out the temporary tables already.

>> pg_amcheck -d db1 -d db2 -d db3 --table=mytable
>> 
>> In this case, mytable is a regular table on db1, a temporary table on db2, 
>> and an unlogged table on db3, and db3 is in recovery.
> 
> I don't think that pg_amcheck needs to care about being in recovery,
> at all. I agreed with you about using pg_is_in_recovery() from at one
> point. That was a mistake on my part.

Ok, excellent, that was probably the only thing that had me really hung up.  I 
thought you were still asking for pg_amcheck to filter out the --parent-check 
option when in recovery, but if you're not asking for that, then I might have 
enough to go on now.

>> I thought that we were headed toward a decision where (despite my 
>> discomfort) pg_amcheck would downgrade options as necessary, but now it 
>> sounds like that's not so.  So what should it do
> 
> Downgrade is how you refer to it. I just think of it as making sure
> that pg_amcheck only asks amcheck to verify relations that are
> basically capable of being verified (e.g., not a temp table).

I was using "downgrading" to mean downgrading from bt_index_parent_check() to 
bt_index_check() when pg_is_in_recovery() is true, but you've clarified that 
you're not requesting that downgrade, so I think we've now gotten past the last 
sticking point about that whole issue.

There are other sticking points that don't seem to be things you have taken a 
view on.  Specifically, pg_amcheck complains if a relation pattern doesn't 
match anything, so that

pg_amcheck --table="*acountng*"

will complain if no tables match, giving the user the opportunity to notice 
that they spelled "accounting" wrong.  If there happens to be a table named 
"xyzacountngo", and that matches, too bad.  There isn't any way pg_amcheck can 
be responsible for that.  But if there is a temporary table named 
"xyzacountngo" and that gets skipped because it's a temp table, I don't know 
what feedback the user should get.  That's a thorny user interfaces question, 
not a corruption checking question, and I don't think you need to weigh in 
unless you want to.  I'll most likely go with whatever is the simplest to code 
and/or most similar to what is currently in the tree, because I don't see any 
knock-down arguments one way or the other.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company





Reply via email to