On Thu, Nov 19, 2020 at 9:06 AM Robert Haas <robertmh...@gmail.com> wrote:
> I'm also not sure if these descriptions are clear enough, but it may
> also be hard to do a good job in a brief space. Still, comparing this
> to the documentation of heapallindexed makes me rather nervous. This
> is only trying to verify that the index contains all the tuples in the
> heap, not that the values in the heap and index tuples actually match.
That's a good point. As things stand, heapallindexed verification does
not notice when there are extra index tuples in the index that are in
some way inconsistent with the heap. Hopefully this isn't too much of a
problem in practice, because the presence of extra spurious tuples gets
detected by the index structure verification process. But in general
that might not happen.

Ideally, heapallindexed verification would verify a 1:1 correspondence.
It doesn't do that right now, but it could.

This could work by having two bloom filters -- one for the heap,
another for the index. The implementation would initially look for
index tuples that ought to be in the index but are absent, just like
today. But at the end it would modify the index bloom filter by &='ing
it with the complement of the heap bloom filter. If any bits are left
set in the index bloom filter, we go back through the index once more
and locate index tuples that still have at least some matching bits in
the index bloom filter (we cannot expect all of the bits from each of
the hash functions used by the bloom filter to still be matches). From
here we can do some kind of lookup for the maybe-not-matching index
tuples that we locate: make sure that they point to an LP_DEAD line
pointer in the heap, or something like that, and make sure that they
have the same values as the heap tuple if it's still retrievable (i.e.
if we haven't pruned the heap tuple away already).
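To make that a bit more concrete, here is a throwaway sketch of the
two-filter idea. To be clear, this is not amcheck code: the filter is a
bare bitset with two seeded FNV-1a hashes, the "tuples" are just short
strings standing in for normalized index tuples, and every name in it
(ToyBloom, bloom_subtract, and so on) is made up for illustration. The
real thing would presumably want lib/bloomfilter.h to grow some way of
combining two filters first.

    /*
     * Toy sketch only -- not the real amcheck code.  The "index tuples" are
     * plain strings, and the filter is a bare bitset with two seeded hashes
     * rather than lib/bloomfilter.h (which, as far as I know, has no way to
     * combine two filters today).
     */
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    #define FILTER_WORDS 1024   /* 64K bits; size from reltuples in reality */
    #define FILTER_BITS  (FILTER_WORDS * 64)

    typedef struct ToyBloom
    {
        uint64_t    words[FILTER_WORDS];
    } ToyBloom;

    /* FNV-1a with a caller-supplied seed; two seeds stand in for k hashes */
    static uint64_t
    toy_hash(const char *key, uint64_t seed)
    {
        uint64_t    h = seed;

        for (; *key; key++)
            h = (h ^ (unsigned char) *key) * 1099511628211ULL;
        return h;
    }

    #define BIT1(key)   (toy_hash((key), 14695981039346656037ULL) % FILTER_BITS)
    #define BIT2(key)   (toy_hash((key), 1099511628211ULL) % FILTER_BITS)
    #define SETBIT(f, b)    ((f)->words[(b) / 64] |= UINT64_C(1) << ((b) % 64))
    #define GETBIT(f, b)    (((f)->words[(b) / 64] >> ((b) % 64)) & 1)

    static void
    bloom_add(ToyBloom *f, const char *key)
    {
        SETBIT(f, BIT1(key));
        SETBIT(f, BIT2(key));
    }

    /* the "&= with the complement" step: clear index bits the heap explains */
    static void
    bloom_subtract(ToyBloom *indexf, const ToyBloom *heapf)
    {
        for (int i = 0; i < FILTER_WORDS; i++)
            indexf->words[i] &= ~heapf->words[i];
    }

    static bool
    bloom_is_empty(const ToyBloom *f)
    {
        for (int i = 0; i < FILTER_WORDS; i++)
            if (f->words[i] != 0)
                return false;
        return true;
    }

    /*
     * Did at least one of the key's bits survive the subtraction?  We can't
     * demand that all of them survive, because unrelated heap entries may
     * have cleared some; conversely, a bad tuple whose bits were all cleared
     * by collisions is missed, which is the usual bloom filter trade-off.
     */
    static bool
    bloom_maybe_unaccounted(const ToyBloom *f, const char *key)
    {
        return GETBIT(f, BIT1(key)) || GETBIT(f, BIT2(key));
    }

    int
    main(void)
    {
        static ToyBloom heapf, indexf;  /* static, so zero-initialized */
        const char *heap[] = {"(0,1) foo", "(0,2) bar"};
        const char *idx[] = {"(0,1) foo", "(0,2) bar", "(0,9) stale"};

        /* first pass: fingerprint heap tuples and index tuples separately */
        for (int i = 0; i < 2; i++)
            bloom_add(&heapf, heap[i]);
        for (int i = 0; i < 3; i++)
            bloom_add(&indexf, idx[i]);

        bloom_subtract(&indexf, &heapf);

        /* second pass over the index, needed only if any bits survived */
        if (!bloom_is_empty(&indexf))
        {
            for (int i = 0; i < 3; i++)
                if (bloom_maybe_unaccounted(&indexf, idx[i]))
                    printf("recheck against heap: %s\n", idx[i]);
        }
        return 0;
    }

Note that the second pass deliberately asks for "at least one surviving
bit" rather than all of them, since unrelated heap entries can clear
some of an index tuple's bits. Anything it flags still has to be
rechecked against the heap as described above -- an index tuple whose
heap tuple has been pruned away isn't corruption.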
> This to me seems too conservative. The result is that by default we
> check only tables, not indexes. I don't think that's going to be what
> users want. I don't know whether they want the heapallindexed or
> rootdescend behaviors for index checks, but I think they want their
> indexes checked. Happy to hear opinions from actual users on what they
> want; this is just me guessing that you've guessed wrong. :-)

My thoughts on these two options:

* I don't think that users will ever want rootdescend verification.
That option exists now because I wanted to have something that relied
on the uniqueness property of B-Tree indexes following the Postgres 12
work. I didn't add retail index tuple deletion, so it seemed like a
good idea to have something that makes the same assumptions that it
would have to make, to validate the design. Another factor is that
Alexander Korotkov made the basic bt_index_parent_check() tests a lot
better for Postgres 13, which undermined the practical argument for
using rootdescend verification. Finally, note that
bt_index_parent_check() was always supposed to be used only when you
already knew that you had big problems and wanted absolutely thorough
verification, without regard for the costs. That isn't the common case
at all. It would be reasonable to not expose anything from
bt_index_parent_check() at all, or to give it much less prominence. I'm
not really sure what the right balance is here myself, so I'm not
insisting on anything. I'm just telling you what I know about it.

* heapallindexed is kind of expensive, but valuable. The extra check is
probably less likely to help on the second or subsequent index on a
table, though. It might be worth considering an option that uses it
with only one index: preferably the primary key index, failing that
some unique index, and failing that some other index.

> This seems pretty lame to me. Even if the btree checker can't tolerate
> corruption to the extent that the heap checker does, seg faulting
> because of a missing file seems like a bug that we should just fix
> (and probably back-patch). I'm not very convinced by the decision to
> override the user's decision about heapallindexed either.

I strongly agree.

> Maybe I lack
> imagination, but that seems pretty arbitrary. Suppose there's a giant
> index which is missing entries for 5 million heap tuples and also
> there's 1 entry in the table which has an xmin that is less than the
> pg_class.relfrozenxid value by 1. You are proposing that because I have
> the latter problem I don't want you to check for the former one. But
> I, John Q. Smartuser, do not want you to second-guess what I told you
> on the command line that I wanted. :-)

Even if your user is just average, they still have one major advantage
over the architects of pg_amcheck: actual knowledge of the problem in
front of them.

> I think in general you're worrying too much about the possibility of
> this tool causing backend crashes. I think it's good that you wrote
> the heapcheck code in a way that's hardened against that, and I think
> we should try to harden other things as time permits. But I don't
> think that the remote possibility of a crash due to the lack of such
> hardening should dictate the design behavior of this tool. If the
> crash possibilities are not remote, then I think the solution is to
> fix them, rather than cutting out important checks.

I couldn't agree more.

I think that you need to have a kind of epistemic modesty with this
stuff. Okay, we guarantee that the backend won't crash when certain
amcheck functions are run, subject to these caveats. But don't we
always guarantee something like that? And are the specific caveats
actually that different in each case, when you get right down to it?

A guarantee does not exist in a vacuum. It always has implicit
limitations. For example, any guarantee implicitly comes with the
caveat "unless I, the guarantor, am wrong". Normally this doesn't
really matter, because normally we're not concerned about extreme
events that will probably never happen even once. But amcheck is very
much not like that. The chances of the guarantor being the weakest link
are actually rather high. Everyone is better off with a design that
accepts this view of things.

I'm also suspicious of guarantees like this for less philosophical
reasons. It seems to me that this solves our problem rather than the
user's problem. Having data that is so badly corrupted that it's
difficult to avoid segfaults when we perform standard operations on it
is an appalling state of affairs for the user. The segfault itself is
very much not the point at all. We should focus on making the tool as
thorough and as low-overhead as possible. If we have to make the tool
significantly more complicated to avoid extremely unlikely segfaults,
then we're actually doing the user a disservice, because we're
increasing the chances that we, the guarantors, will be the weakest
link (which was already high enough). This smacks of hubris.

I also agree that hardening is a worthwhile exercise here, of course.
We should be holding amcheck to a higher standard when it comes to not
segfaulting with corrupt data.

--
Peter Geoghegan