On Tue, Aug 4, 2020 at 7:59 AM Robert Haas <robertmh...@gmail.com> wrote:
> I think we should try not to imagine anything in particular. Just to
> be clear, I am not trying to knock what you have; I know it was a lot
> of work to create and it's a huge improvement over having nothing. But
> in my mind, a perfect tool would do just what a human being would do
> if investigating manually: assume initially that you know nothing -
> the index might be totally fine, mildly corrupted in a very localized
> way, completely hosed, or anything in between. And it would
> systematically try to track that down by traversing the usable
> pointers that it has until it runs out of things to do. It does not
> seem impossible to build a tool that would allow us to take a big
> index and overwrite a random subset of pages with garbage data and
> have the tool tell us about all the bad pages that are still reachable
> from the root by any path. If you really wanted to go crazy with it,
> you could even try to find the bad pages that are not reachable from
> the root, by doing a pass after the fact over all the pages that you
> didn't otherwise reach. It would be a lot of work to build something
> like that and maybe not the best use of time, but if I got to wave
> tools into existence using my magic wand, I think that would be the
> gold standard.
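(A rough sketch of the traversal Robert describes, over a toy in-memory page
graph: start at the root, follow every usable link, report the reachable
pages that fail verification, then sweep whatever was never reached. The
ToyPage struct and check_from_root() function are invented for illustration;
real B-Tree pages and the real amcheck code look nothing like this.)

/*
 * Toy model of the "check everything reachable from the root" idea.
 * Follow every usable link, flag reachable pages that fail checks,
 * then sweep the pages that were never reached at all.
 */
#include <stdbool.h>
#include <stdio.h>

#define NPAGES      8
#define MAX_LINKS   4
#define INVALID     (-1)

typedef struct ToyPage
{
    bool    corrupt;            /* stand-in for "fails verification" */
    int     links[MAX_LINKS];   /* downlinks/sibling links, INVALID if unused */
} ToyPage;

static ToyPage pages[NPAGES];

static void
check_from_root(int root)
{
    bool    visited[NPAGES] = {false};
    int     stack[NPAGES * MAX_LINKS + 1];
    int     top = 0;

    stack[top++] = root;
    while (top > 0)
    {
        int     blkno = stack[--top];

        if (blkno < 0 || blkno >= NPAGES || visited[blkno])
            continue;
        visited[blkno] = true;

        if (pages[blkno].corrupt)
        {
            printf("page %d is reachable from the root but corrupt\n", blkno);
            continue;           /* don't trust links on a corrupt page */
        }

        for (int i = 0; i < MAX_LINKS; i++)
            stack[top++] = pages[blkno].links[i];
    }

    /* the "go crazy with it" pass: pages the traversal never reached */
    for (int blkno = 0; blkno < NPAGES; blkno++)
    {
        if (!visited[blkno])
            printf("page %d is not reachable from the root\n", blkno);
    }
}

int
main(void)
{
    /* initialize all links to INVALID, then wire up a tiny tree */
    for (int i = 0; i < NPAGES; i++)
        for (int j = 0; j < MAX_LINKS; j++)
            pages[i].links[j] = INVALID;

    pages[0].links[0] = 1;      /* root with two children */
    pages[0].links[1] = 2;
    pages[2].corrupt = true;    /* a reachable bad page */
    /* pages 3..7 are deliberately left unreachable */

    check_from_root(0);
    return 0;
}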
I guess that might be true. With indexes you tend to have redundancy in how
relationships among pages are described: you have siblings whose pointers
must be in agreement (left points to right, right points to left), and it's
not clear which one you should trust when they don't agree. Simple heuristics
don't get you all that far -- I really can't think of a good one, and
detecting corruption should mean detecting truly exceptional cases. I guess
you could build a model based on Bayesian methods, or something like that,
but that is very complicated, and it would only be used when you actually
have corruption -- which is presumably extremely rare in reality. That's very
unappealing as a project.

I have always believed that the big problem is not "known unknowns". Rather,
I think that the problem is "unknown unknowns". I accept that you have a
point, especially when it comes to heap checking, but even there the most
important consideration should be to make corruption detection thorough and
cheap. The vast, vast majority of databases do not have any corruption at any
given time. You're not searching for a needle in a haystack; you're searching
for a needle in many, many haystacks within a field full of haystacks, which
taken together probably contain no needles at all. (OTOH, once you find one
needle all bets are off, and you could very well go on to find a huge number
of them.)

--
Peter Geoghegan
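(For the sibling-pointer point above, a similarly toy sketch of the
cross-check: each page records both neighbors, so the two directions can be
compared, but when they disagree nothing on the pages themselves says which
side to believe. ToySibling and check_sibling_agreement() are again invented
for the example.)

/*
 * Toy sketch of the sibling cross-check: if page A points right to B,
 * then B must point left to A.  A disagreement can be reported, but the
 * pages alone don't say which side is the corrupt one.
 */
#include <stdio.h>

#define NPAGES  4
#define NONE    (-1)

typedef struct ToySibling
{
    int     left;               /* block number of left sibling, or NONE */
    int     right;              /* block number of right sibling, or NONE */
} ToySibling;

static void
check_sibling_agreement(const ToySibling *page, int npages)
{
    for (int blkno = 0; blkno < npages; blkno++)
    {
        int     right = page[blkno].right;

        if (right == NONE)
            continue;
        if (right < 0 || right >= npages)
        {
            printf("page %d: right link %d is out of range\n", blkno, right);
            continue;
        }
        if (page[right].left != blkno)
            printf("page %d points right to page %d, but page %d points left to %d\n",
                   blkno, right, right, page[right].left);
    }
}

int
main(void)
{
    /* a chain 0 <-> 1 <-> 2, with page 2's left link clobbered */
    ToySibling page[NPAGES] = {
        {NONE, 1},
        {0, 2},
        {0, NONE},              /* should be {1, NONE}: which page is wrong? */
        {NONE, NONE}
    };

    check_sibling_agreement(page, NPAGES);
    return 0;
}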