On Tue, Jan 1, 2019 at 2:11 AM Andrey Borodin <x4...@yandex-team.ru> wrote: > > This isn't consistent with what you do within verify_nbtree.c, which > > deliberately avoids ever holding more than a single buffer lock at a > > time, on general principle. That isn't necessarily a reason why you > > have to do the same, but it's not clear why you do things that way. > > Why isn't it enough to have a ShareLock on the relation? Maybe this is > > a sign that it would be a good idea to always operate on a palloc()'d > > copy of the page, by introducing something equivalent to > > palloc_btree_page(). (That would also be an opportunity to do very > > basic checks on every page.) > If we unlock parent page before checking child, some insert can adjust tuple > on parent, sneak into child and insert new tuple. > This can trigger false positive. I'll think about it more.
I think that holding a buffer lock on an internal pages for as long as it takes to check all of the child pages is a non-starter. If you can't think of a way of not doing that that's race free with a relation-level AccessShareLock, then a relation-level ShareLock (which will block VACUUM) seems necessary. I think that you should duplicate some of what's in bt_index_check_internal() -- lock the heap table as well as the index, to avoid deadlocks. We might not do this with contrib/pageinspect, but that's not really intended to be used in production in the same way this will be. > > * Maybe gist_index_check() should be called gist_index_parent_check(), > > since it's rather like the existing verification function > > bt_index_parent_check(). > > > > * Alternatively, you could find a way to make your function only need > > an AccessShareLock -- that would make gist_index_check() an > > appropriate name. That would probably require careful thought about > > VACUUM. > > I've changed lock level to AccessShareLock. IMV scan is just as safe as > regular GiST index scan. > There is my patch with VACUUM on CF, it can add deleted pages. I'll update > one of these two patches accordingly, if other is committed. Maybe you should have renamed it to gist_index_parent_check() instead, and kept the ShareLock. I don't think that this business with buffer locks is acceptable. And it is mostly checking parent/child relationships. Right? You're not really able to check GiST tuples against anything other than their parent, unlike with B-Tree (you can only do very simple things, like the IndexTupleSize()/lp_len cross-check). Naming the function gist_index_parent_check() seems like the right way to go, even if you could get away with an AccessShareLock (which I now tend to doubt). It was way too optimistic to suppose that there might be a clever way of locking down race conditions that allowed you to not couple buffer locks, but also use an AccessShareLock. After all, even the B-Tree amcheck code doesn't manage to do this when verifying parent/child relationships (it only does something clever with the sibling page cross-check). > > * Why is it okay to do this?: > > > >> + if (GistTupleIsInvalid(idxtuple)) > >> + ereport(LOG, > >> + (errmsg("index \"%s\" contains an inner tuple marked > >> as invalid", > >> + RelationGetRelationName(rel)), > >> + errdetail("This is caused by an incomplete page split > >> at crash recovery before upgrading to PostgreSQL 9.1."), > >> + errhint("Please REINDEX it."))); > > > > You should probably mention the gistdoinsert() precedent for this. > This invalid tuple will break inserts, but will not affect select. I do not > know, should this be error or warning in amcheck? It should be an error. Breaking inserts is a serious problem. -- Peter Geoghegan