On Thu, Feb 22, 2018 at 10:26 AM, Masahiko Sawada <sawada.m...@gmail.com> wrote:
> On Thu, Feb 22, 2018 at 8:28 AM, Peter Geoghegan <p...@bowt.ie> wrote:
>> On Wed, Feb 21, 2018 at 3:02 PM, R, Siva <sivas...@amazon.com> wrote:
>>> Did you mean pin on the metapage buffer during ginInsertCleanup and not lock
>>> during addition of tuples to the accumulator? The exclusive lock on metapage
>>> buffer is released after reading/locking head of pending list and before we
>>> process pages/add tuples to the accumulator in ginInsertCleanup [1].
>>
>> AFAICT, nobody ever holds just a pin on the metapage as some kind of
>> interlock (since nobody else ever acquires a "super exclusive lock" on
>> the metapage -- if anybody else ever did that, then simply holding a
>> pin might make sense as a way of blocking the "super exclusive" lock
>> acquisition). Maybe you're thinking of the root page of posting trees?
>>
>> I think that Sawada-san simply means that holding an ExclusiveLock on
>> the metapage makes writers block each other, and concurrent VACUUMs.
>> At least, for as long as they're in ginInsertCleanup().
>
> Yes, but I realized my previous mail was wrong, sorry. Insertion to
> pending list doesn't acquire ExclusiveLock on metapage. So we can
> insert tuples to pending list while cleaning up.
>
Sorry for the very late response. FWIW, I've looked at this again.

I think the situation Siva reported in the first mail could happen before commit 3b2787e; that is, gin indexes had a data corruption bug. I've reproduced the situation with PostgreSQL 10.1 and observed that a gin index can become corrupted. However, gingetbitmap (fortunately?) returned correct results even when the gin index was corrupted.

The minimal case I reproduced is one where each of two gin entries has two pointers to the same TID, as follows:

  gin-entry 1    gin-entry 2
  (1, 147)       (1, 147)
  (1, 147)       (1, 147)

I got the above corrupted state after executing all the steps Siva described in the first mail. In both entries, the first TID points to an already-vacuumed item pointer (the tuple was inserted, deleted and vacuumed), whereas the second points to a live item pointer on the heap.

In entryGetItem, because we check advancePast, the second TID is not returned in either the posting list case or the posting tree case. Also, even in the partial match case, entryGetItem can return a correct result because TIDBitmap eliminates the duplicates. So the corrupted gin index actually returned correct results, and no assertion failure was raised either.

I'm not sure how you figured out this duplicated item pointer issue, but what I learned from investigating it is that gin indexes can return correct results without any assertion failure even when they are somewhat corrupted. So having amcheck support for gin indexes might help catch part of the problem.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center