On Tue, Aug 4, 2020 at 9:44 AM Robert Haas <robertmh...@gmail.com> wrote:
> I think it might be possible to distinguish between different types of
> corruption and to separate, at least to some degree, the checking
> associated with each type. I think one can imagine something that
> checks the structure of a btree without regard to the contents. That
> is, it cares that left and right links are consistent with each other
> and with downlinks from the parent level. So it checks things like the
> left link of the page to which my right link points is pointing back
> to me, and that's also the page to which my parent's next downlink
> points.

I think that this kind of phased approach to B-Tree verification is
possible, more or less, but hard to justify. And it seems impossible
to do with only an AccessShareLock.

It's not clear that what you describe is much better than just
checking a bunch of indexes and seeing what patterns emerge. For
example, the involvement of collated text might be a common factor
across indexes. That kind of pattern is the first thing that I look
for, and often the only thing. It also serves to give me an idea of
how messed up things are. There are not that many meaningful degrees
of messed-up with indexes in my experience. The first error really
does tell you most of what you need to know about any given corrupt
index. Kind of like how you can bucket the number of cockroaches in
your home into perhaps three meaningful buckets: 0 cockroaches, at
least 1 cockroach, and lots of cockroaches. (Even there, if you really
care about the distinction between the second and third bucket,
something has gone terribly wrong -- so even three buckets seems like
a lot to me.)

FWIW, current DEBUG1 + DEBUG2 output for amcheck shows you quite a lot
of details about the tree structure. It's a handy way of getting a
sense of what's going on at a high level. For example, if index
corruption is found very early on, that strongly suggests that it's
pretty pervasive.

> Now a second type of checking, which can also be done without regard
> to keys, is checking that the TIDs in the index point to TIDs that are
> on heap pages that actually exist, and that the corresponding items
> are not unused, nor are they tuples which are not the root of a HOT
> chain. Passing a check of this type doesn't prove that the index and
> heap are consistent, but failing it proves that they are inconsistent.
> This kind of check can be done on every leaf index page you can find
> by any means even if it fails the structural checks described above.
> Failure of these checks on one page does not preclude checking the
> same invariants for other pages. Let's call this kind of thing "basic
> index-heap sanity checking."

One real weakness in the current code is our inability to detect index
tuples that are in the correct order and so on, but point to the wrong
thing -- we can detect that if it manifests itself as the absence of
an index tuple that should be in the index (when you use
heapallindexed verification), but we cannot *reliably* detect the
presence of an index tuple that shouldn't be in the index at all
(though in practice it probably mostly gets caught).

The checks on the tree structure itself are excellent with
bt_index_parent_check() following Alexander's commit d114cc53 (which I
thought was really excellent work). But we still have that one
remaining blind spot in verify_nbtree.c, even when you opt in to every
possible type of verification (i.e. bt_index_parent_check() with all
options). I'd much rather fix that, or help with the new heap checker
stuff.

> A fourth type of checking is to verify the index key against the keys
> in the heap tuples to which they point, but only for index tuples that
> passed the basic index-heap sanity checking and where the tuples have
> not been pruned. This can be sensibly done even if the structural
> checks or index-ordering checks have failed.

That's going to require the equivalent of a merge join, which is
terribly expensive relative to such a small benefit.

> Aside from providing a way to usefully continue after errors, this
> would also be useful in certain scenarios where you want to know what
> kind of corruption you have. For example, suppose that I start getting
> wrong answers from index lookups on a particular index. Upon
> investigation, it turns out that my last glibc update changed my OS
> collation definitions for the collation I'm using, and therefore it is
> to be expected that some of my keys may appear to be out of order with
> respect to the new definitions. Now what I really want to know before
> running REINDEX is that this is the only problem I have. It would be
> amazing if I could run the tool and have it give me a list of problems
> so that I could confirm that I have only index-ordering problems, not
> any other kind, and even more amazing if it could tell me the specific
> keys that were affected so that I could understand exactly how the
> sorting behavior changed.

This detail seems really hard. There are probably lots of cases where
the sorting behavior changed but it just didn't affect you, given the
data you had -- it just so happened that you didn't have exactly the
wrong kind of diacritic mark or whatever. After all, revisions to how
strings in a given natural language are supposed to sort are likely to
be relatively rare and relatively obscure (even among people that
speak the language in question). Also, the task of figuring out if the
tuple to the left or the right is in the wrong order seems kind of
daunting.

Meanwhile, a simple smoke test covering many indexes probably gives
you a fairly meaningful idea of the extent of the damage, without
requiring that we do any hard engineering work.

> I'm speaking here with fairly limited knowledge of the details of how
> all this actually works and, again, I'm not trying to suggest that you
> or anyone is obligated to do any work on this, or that it would be
> easy to accomplish or worth the time it took. I'm just trying to
> sketch out what I see as maybe being theoretically possible, and why I
> think it would be useful if it did.

I don't think that your relatively limited knowledge of the B-Tree
code is an issue here -- your intuitions seem pretty reasonable. I
appreciate your perspective here. Corruption detection presents us
with some odd qualitative questions of the kind that are just awkward
to discuss. Discouraging perspectives that don't quite match my own
would be quite counterproductive.

That having been said, I suspect that this is a huge task for a small
benefit. It's exceptionally hard to test because you have lots of
non-trivial code that only gets used in circumstances that by
definition should never happen. If users really needed to recover the
data in the index then maybe it would happen -- but they don't.

The biggest problem that amcheck currently has is that it isn't used
enough, because it isn't positioned as a general purpose tool at all.
I'm hoping that the work from Mark helps with that.

--
Peter Geoghegan


Reply via email to