On Wed, Mar 27, 2019 at 10:29 AM Heikki Linnakangas <hlinn...@iki.fi> wrote:
> Thanks! I had a look, and refactored it quite a bit.

I'm really happy that other people seem to be driving amcheck in a new
direction, without any prompting from me. It's too important to remain
something that only I have contributed to.

> I found the way the recursion worked confusing. On each internal page,
> it looped through all the child nodes, checking the consistency of the
> downlinks. And then it looped through the children again, to recurse.
> This isn't performance-critical, but visiting every page twice still
> seems strange.

To be fair, that's actually what bt_index_parent_check() does. OTOH, it
has to work in a way that makes it an extension of bt_index_check(),
which is not a problem for gist_index_parent_check().

> In gist_check_page_keys(), if we get into the code to deal with a
> concurrent update, we set 'parent' to point to a tuple on a parent
> page, then unlock it, and continue to look at remaining tuples, using
> the pointer that points to an unlocked buffer.

Why not just copy the page into a local buffer? See my remarks on this
below.

> I came up with the attached, which fixes the above-mentioned things. I
> also replaced the check that each node has only internal or leaf
> children, with a different check that the tree has the same height in
> all branches. That catches more potential problems, and was easier to
> implement after the refactoring. This still needs at least a round of
> fixing typos and tidying up comments, but it's more straightforward
> now, IMHO.

You probably didn't mean to leave this "boom" error behind:

> +	if (GistPageIsDeleted(page))
> +	{
> +		elog(ERROR,"boom");

I see that you have this check for deleted pages:

> +		if (PageGetMaxOffsetNumber(page) > InvalidOffsetNumber)
> +			ereport(ERROR,
> +					(errcode(ERRCODE_INDEX_CORRUPTED),
> +					 errmsg("index \"%s\" has deleted page with tuples",
> +							RelationGetRelationName(rel))));
> +	}

Why not have a similar check for non-deleted pages, whose maxoffset
must be <= MaxIndexTuplesPerPage? (There's a sketch of what I mean
below.)

I see various errors like this one:

> +		if (MAXALIGN(ItemIdGetLength(iid)) !=
> +			MAXALIGN(IndexTupleSize(idxtuple)))
> +			ereport(ERROR,
> +					(errcode(ERRCODE_INDEX_CORRUPTED),
> +					 errmsg("index \"%s\" has inconsistent tuple sizes",
> +							RelationGetRelationName(rel))));

Can't we say which TID is involved here, so we can find the offending
corrupt tuple afterwards? Or at least the block number? And maybe even
the LSN of the page? I think that kind of detail could be added in a
number of places.
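For example, something along these lines. This is only a rough sketch,
borrowing the errdetail_internal() style that verify_nbtree.c already
uses; I'm assuming 'blkno' and 'offset' locals that identify the line
pointer being checked:

		if (MAXALIGN(ItemIdGetLength(iid)) !=
			MAXALIGN(IndexTupleSize(idxtuple)))
			ereport(ERROR,
					(errcode(ERRCODE_INDEX_CORRUPTED),
					 errmsg("index \"%s\" has inconsistent tuple sizes",
							RelationGetRelationName(rel)),
					 /* Say where the corrupt tuple is, plus the page LSN */
					 errdetail_internal("Block=%u offset=%u page lsn=%X/%X.",
										blkno, offset,
										(uint32) (PageGetLSN(page) >> 32),
										(uint32) PageGetLSN(page))));

That way the error output stays consistent with what nbtree
verification already produces.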
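And for the non-deleted case, maybe something as simple as this (again
just a sketch, reusing the 'page' and 'rel' variables the patch already
has in scope at that point):

	/* A non-deleted page can't hold more line pointers than fit */
	if (!GistPageIsDeleted(page) &&
		PageGetMaxOffsetNumber(page) > MaxIndexTuplesPerPage)
		ereport(ERROR,
				(errcode(ERRCODE_INDEX_CORRUPTED),
				 errmsg("index \"%s\" has page with invalid max offset number",
						RelationGetRelationName(rel))));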
I see this stuff that's related to concurrent processes:

> +	/*
> +	 * It's possible that the page was split since we looked at the
> +	 * parent, so that we missed the downlink of the right sibling when
> +	 * we scanned the parent. If so, add the right sibling to the stack
> +	 * now.
> +	 */

> +				/*
> +				 * There was a discrepancy between parent and child
> +				 * tuples. We need to verify it is not a result of a
> +				 * concurrent call of gistplacetopage(). So, lock the
> +				 * parent and try to find the downlink for the current
> +				 * page. It may be missing due to a concurrent page
> +				 * split; this is OK.
> +				 */

Is this really needed? Isn't the ShareLock on the index sufficient? If
so, why?

> +				stack->parenttup = gist_refind_parent(rel, stack->parentblk,
> +													  stack->blkno, strategy);

If the gistplacetopage() stuff is truly necessary, then is it okay to
call gist_refind_parent() with the original buffer lock still held like
this?

I still suspect that we should have something like palloc_btree_page()
for GiST, so that we're always operating on a copy of the page in
palloc()'d memory.
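To be concrete about what I mean, something along these lines
(untested, and I'm guessing at the exact signature; 'strategy' would be
the BufferAccessStrategy that the patch already passes around):

/*
 * Read a GiST page into a palloc()'d copy, so that verification never
 * examines a page while its buffer lock is held.  Modeled on amcheck's
 * palloc_btree_page().
 */
static Page
palloc_gist_page(Relation rel, BlockNumber blkno,
				 BufferAccessStrategy strategy)
{
	Buffer		buffer;
	Page		page;

	page = palloc(BLCKSZ);

	buffer = ReadBufferExtended(rel, MAIN_FORKNUM, blkno, RBM_NORMAL,
								strategy);
	LockBuffer(buffer, GIST_SHARE);

	/* Basic sanity checks, while we still hold the buffer lock */
	gistcheckpage(rel, buffer);

	memcpy(page, BufferGetPage(buffer), BLCKSZ);
	UnlockReleaseBuffer(buffer);

	return page;
}

With that, a buffer lock is only ever held for the duration of the
memcpy(), and all of the actual checks run against a stable copy of
the page.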
Maybe it's worthwhile to do something clever with concurrently holding
buffer locks, but if that's what we're doing here then I would also
expect us to have something weaker than ShareLock as our relation-level
heavyweight lock. And, there should be a prominent explanation of the
theory behind it somewhere.

> What have you been using to test this?

pg_hexedit has full support for GiST. ;-)

--
Peter Geoghegan