On Tue, Aug 4, 2020 at 7:59 AM Robert Haas <robertmh...@gmail.com> wrote:
> I think we should try not to imagine anything in particular. Just to
> be clear, I am not trying to knock what you have; I know it was a lot
> of work to create and it's a huge improvement over having nothing. But
> in my mind, a perfect tool would do just what a human being would do
> if investigating manually: assume initially that you know nothing -
> the index might be totally fine, mildly corrupted in a very localized
> way, completely hosed, or anything in between. And it would
> systematically try to track that down by traversing the usable
> pointers that it has until it runs out of things to do. It does not
> seem impossible to build a tool that would allow us to take a big
> index and overwrite a random subset of pages with garbage data and
> have the tool tell us about all the bad pages that are still reachable
> from the root by any path. If you really wanted to go crazy with it,
> you could even try to find the bad pages that are not reachable from
> the root, by doing a pass after the fact over all the pages that you
> didn't otherwise reach. It would be a lot of work to build something
> like that and maybe not the best use of time, but if I got to wave
> tools into existence using my magic wand, I think that would be the
> gold standard.
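(A rough sketch of the traversal Robert describes, over a toy in-memory page
graph: start at the root, follow every usable link, report the reachable
pages that fail verification, then sweep whatever was never reached. The
ToyPage struct and check_from_root() function are invented for illustration;
real B-Tree pages and the real amcheck code look nothing like this.)

/*
 * Toy model of the "check everything reachable from the root" idea.
 * Follow every usable link, flag reachable pages that fail checks,
 * then sweep the pages that were never reached at all.
 */
#include <stdbool.h>
#include <stdio.h>

#define NPAGES      8
#define MAX_LINKS   4
#define INVALID     (-1)

typedef struct ToyPage
{
    bool    corrupt;            /* stand-in for "fails verification" */
    int     links[MAX_LINKS];   /* downlinks/sibling links, INVALID if unused */
} ToyPage;

static ToyPage pages[NPAGES];

static void
check_from_root(int root)
{
    bool    visited[NPAGES] = {false};
    int     stack[NPAGES * MAX_LINKS + 1];
    int     top = 0;

    stack[top++] = root;
    while (top > 0)
    {
        int     blkno = stack[--top];

        if (blkno < 0 || blkno >= NPAGES || visited[blkno])
            continue;
        visited[blkno] = true;

        if (pages[blkno].corrupt)
        {
            printf("page %d is reachable from the root but corrupt\n", blkno);
            continue;           /* don't trust links on a corrupt page */
        }

        for (int i = 0; i < MAX_LINKS; i++)
            stack[top++] = pages[blkno].links[i];
    }

    /* the "go crazy with it" pass: pages the traversal never reached */
    for (int blkno = 0; blkno < NPAGES; blkno++)
    {
        if (!visited[blkno])
            printf("page %d is not reachable from the root\n", blkno);
    }
}

int
main(void)
{
    /* initialize all links to INVALID, then wire up a tiny tree */
    for (int i = 0; i < NPAGES; i++)
        for (int j = 0; j < MAX_LINKS; j++)
            pages[i].links[j] = INVALID;

    pages[0].links[0] = 1;      /* root with two children */
    pages[0].links[1] = 2;
    pages[2].corrupt = true;    /* a reachable bad page */
    /* pages 3..7 are deliberately left unreachable */

    check_from_root(0);
    return 0;
}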
I guess that might be true. With indexes you tend to have redundancy in how
relationships among pages are described: you have siblings whose pointers
must be in agreement (left points to right, right points to left), and it's
not clear which one you should trust when they don't agree. Simple heuristics
don't get you all that far -- I really can't think of a good one, and
detecting corruption should mean detecting truly exceptional cases. I guess
you could build a model based on Bayesian methods, or something like that,
but that is very complicated, and it would only be used when you actually
have corruption -- which is presumably extremely rare in reality. That's very
unappealing as a project.

I have always believed that the big problem is not "known unknowns". Rather,
I think that the problem is "unknown unknowns". I accept that you have a
point, especially when it comes to heap checking, but even there the most
important consideration should be to make corruption detection thorough and
cheap. The vast, vast majority of databases do not have any corruption at any
given time. You're not searching for a needle in a haystack; you're searching
for a needle in many, many haystacks within a field full of haystacks, which
taken together probably contain no needles at all. (OTOH, once you find one
needle all bets are off, and you could very well go on to find a huge number
of them.)

--
Peter Geoghegan
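(For the sibling-pointer point above, a similarly toy sketch of the
cross-check: each page records both neighbors, so the two directions can be
compared, but when they disagree nothing on the pages themselves says which
side to believe. ToySibling and check_sibling_agreement() are again invented
for the example.)

/*
 * Toy sketch of the sibling cross-check: if page A points right to B,
 * then B must point left to A.  A disagreement can be reported, but the
 * pages alone don't say which side is the corrupt one.
 */
#include <stdio.h>

#define NPAGES  4
#define NONE    (-1)

typedef struct ToySibling
{
    int     left;               /* block number of left sibling, or NONE */
    int     right;              /* block number of right sibling, or NONE */
} ToySibling;

static void
check_sibling_agreement(const ToySibling *page, int npages)
{
    for (int blkno = 0; blkno < npages; blkno++)
    {
        int     right = page[blkno].right;

        if (right == NONE)
            continue;
        if (right < 0 || right >= npages)
        {
            printf("page %d: right link %d is out of range\n", blkno, right);
            continue;
        }
        if (page[right].left != blkno)
            printf("page %d points right to page %d, but page %d points left to %d\n",
                   blkno, right, right, page[right].left);
    }
}

int
main(void)
{
    /* a chain 0 <-> 1 <-> 2, with page 2's left link clobbered */
    ToySibling page[NPAGES] = {
        {NONE, 1},
        {0, 2},
        {0, NONE},              /* should be {1, NONE}: which page is wrong? */
        {NONE, NONE}
    };

    check_sibling_agreement(page, NPAGES);
    return 0;
}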