On Wed, Mar 27, 2019 at 10:29 AM Heikki Linnakangas <hlinn...@iki.fi> wrote:
> Thanks! I had a look, and refactored it quite a bit.

I'm really happy that other people seem to be driving amcheck in a new
direction, without any prompting from me. It's too important to remain
something that only I have contributed to.

> I found the way the recursion worked confusing. On each internal page,
> it looped through all the child nodes, checking the consistency of the
> downlinks. And then it looped through the children again, to recurse.
> This isn't performance-critical, but visiting every page twice still
> seems strange.

To be fair, that's actually what bt_index_parent_check() does. OTOH, it
has to work in a way that makes it an extension of bt_index_check(),
which is not a problem for gist_index_parent_check().

> In gist_check_page_keys(), if we get into the code to deal with a
> concurrent update, we set 'parent' to point to a tuple on a parent
> page, then unlock it, and continue to look at remaining tuples, using
> the pointer that points to an unlocked buffer.

Why not just copy the page into a local buffer? See my remarks on this
below.

> I came up with the attached, which fixes the above-mentioned things. I
> also replaced the check that each node has only internal or leaf
> children, with a different check that the tree has the same height in
> all branches. That catches more potential problems, and was easier to
> implement after the refactoring. This still needs at least a round of
> fixing typos and tidying up comments, but it's more straightforward
> now, IMHO.

You probably didn't mean to leave this "boom" error behind:

> +	if (GistPageIsDeleted(page))
> +	{
> +		elog(ERROR,"boom");

I see that you have this check for deleted pages:

> +		if (PageGetMaxOffsetNumber(page) > InvalidOffsetNumber)
> +			ereport(ERROR,
> +					(errcode(ERRCODE_INDEX_CORRUPTED),
> +					 errmsg("index \"%s\" has deleted page with tuples",
> +							RelationGetRelationName(rel))));
> +	}

Why not have a similar check for non-deleted pages, whose maxoffset
must be <= MaxIndexTuplesPerPage? (There's a sketch of what I mean
below.)

I see various errors like this one:

> +		if (MAXALIGN(ItemIdGetLength(iid)) !=
> +			MAXALIGN(IndexTupleSize(idxtuple)))
> +			ereport(ERROR,
> +					(errcode(ERRCODE_INDEX_CORRUPTED),
> +					 errmsg("index \"%s\" has inconsistent tuple sizes",
> +							RelationGetRelationName(rel))));

Can't we say which TID is involved here, so we can find the offending
corrupt tuple afterwards? Or at least the block number? And maybe even
the LSN of the page? I think that kind of detail could be added in a
number of places.
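For example, something along these lines. This is only a rough sketch,
borrowing the errdetail_internal() style that verify_nbtree.c already
uses; I'm assuming 'blkno' and 'offset' locals that identify the line
pointer being checked:

		if (MAXALIGN(ItemIdGetLength(iid)) !=
			MAXALIGN(IndexTupleSize(idxtuple)))
			ereport(ERROR,
					(errcode(ERRCODE_INDEX_CORRUPTED),
					 errmsg("index \"%s\" has inconsistent tuple sizes",
							RelationGetRelationName(rel)),
					 /* Say where the corrupt tuple is, plus the page LSN */
					 errdetail_internal("Block=%u offset=%u page lsn=%X/%X.",
										blkno, offset,
										(uint32) (PageGetLSN(page) >> 32),
										(uint32) PageGetLSN(page))));

That way the error output stays consistent with what nbtree
verification already produces.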
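And for the non-deleted case, maybe something as simple as this (again
just a sketch, reusing the 'page' and 'rel' variables the patch already
has in scope at that point):

	/* A non-deleted page can't hold more line pointers than fit */
	if (!GistPageIsDeleted(page) &&
		PageGetMaxOffsetNumber(page) > MaxIndexTuplesPerPage)
		ereport(ERROR,
				(errcode(ERRCODE_INDEX_CORRUPTED),
				 errmsg("index \"%s\" has page with invalid max offset number",
						RelationGetRelationName(rel))));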
I see this stuff that's related to concurrent processes:

> +	/*
> +	 * It's possible that the page was split since we looked at the
> +	 * parent, so that we missed the downlink of the right sibling when
> +	 * we scanned the parent. If so, add the right sibling to the stack
> +	 * now.
> +	 */

> +				/*
> +				 * There was a discrepancy between parent and child
> +				 * tuples. We need to verify it is not a result of a
> +				 * concurrent call of gistplacetopage(). So, lock the
> +				 * parent and try to find the downlink for the current
> +				 * page. It may be missing due to a concurrent page
> +				 * split; this is OK.
> +				 */

Is this really needed? Isn't the ShareLock on the index sufficient? If
so, why?

> +				stack->parenttup = gist_refind_parent(rel, stack->parentblk,
> +													  stack->blkno, strategy);

If the gistplacetopage() stuff is truly necessary, then is it okay to
call gist_refind_parent() with the original buffer lock still held like
this?

I still suspect that we should have something like palloc_btree_page()
for GiST, so that we're always operating on a copy of the page in
palloc()'d memory.
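To be concrete about what I mean, something along these lines
(untested, and I'm guessing at the exact signature; 'strategy' would be
the BufferAccessStrategy that the patch already passes around):

/*
 * Read a GiST page into a palloc()'d copy, so that verification never
 * examines a page while its buffer lock is held.  Modeled on amcheck's
 * palloc_btree_page().
 */
static Page
palloc_gist_page(Relation rel, BlockNumber blkno,
				 BufferAccessStrategy strategy)
{
	Buffer		buffer;
	Page		page;

	page = palloc(BLCKSZ);

	buffer = ReadBufferExtended(rel, MAIN_FORKNUM, blkno, RBM_NORMAL,
								strategy);
	LockBuffer(buffer, GIST_SHARE);

	/* Basic sanity checks, while we still hold the buffer lock */
	gistcheckpage(rel, buffer);

	memcpy(page, BufferGetPage(buffer), BLCKSZ);
	UnlockReleaseBuffer(buffer);

	return page;
}

With that, a buffer lock is only ever held for the duration of the
memcpy(), and all of the actual checks run against a stable copy of
the page.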
Maybe it's worthwhile to do something clever with concurrently holding
buffer locks, but if that's what we're doing here then I would also
expect us to have something weaker than ShareLock as our relation-level
heavyweight lock. And, there should be a prominent explanation of the
theory behind it somewhere.

> What have you been using to test this?

pg_hexedit has full support for GiST. ;-)

--
Peter Geoghegan