On 28/03/18 19:53, Teodor Sigaev wrote:
Hi!
I slightly modified the test to cleanly demonstrate the difference between
fastupdate on and off. I also added CheckForSerializableConflictIn() to the
fastupdate-off code path, but only when the pending list is non-empty.
The next question I see: why don't we lock entry leaf pages in some cases?
Why should we?
As I understand it, a scan should lock every page it visits, but currently that
is true only for the posting tree. It seems it should also lock pages in the
entry tree, because concurrent processes could add new entries that a partial
search, for example, could match. BTW, partial search (see collectMatchBitmap())
locks the entry tree correctly, but regular startScanEntry() doesn't lock the
entry page in the posting tree case, only in the posting list case.
I think this needs some high-level comments or README to explain how the
locking works. It seems pretty ad hoc at the moment. And incorrect.
1. Why do we lock all posting tree pages, even though they all represent
the same value? Isn't it enough to lock the root of the posting tree?
2. Why do we lock any posting tree pages at all, if we lock the entry
tree page anyway? Isn't the lock on the entry tree page sufficient to
cover the key value?
3. Why do we *not* lock the entry leaf page, if there is no match? We
still need a lock to remember that we probed for that value and there
was no match, so that we conflict with a tuple that might be inserted later.
At least #3 is a bug. The attached patch adds an isolation test that
demonstrates it. #1 and #2 are weird, and cause unnecessary locking, so
I think we should fix those too, even if they don't lead to incorrect
results.
Remember, the purpose of predicate locks is to lock key ranges, not
physical pages or tuples in the index. We use leaf pages as a handy
shortcut for "any key value that would belong on this page", but it is
just an implementation detail.
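
To make the key-range idea concrete, here is a minimal toy model in C. It is not PostgreSQL code: ToyLeafPage, toy_find_leaf(), toy_scan() and toy_insert_conflicts() are hypothetical names invented for this sketch, and a real predicate lock is tracked per transaction rather than as a flag on the page. It only shows why a scan that finds no match must still lock the leaf page where the key would have been.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model, not PostgreSQL code: each leaf page owns a half-open key range. */
typedef struct ToyLeafPage
{
    int  lo;                /* smallest key that belongs on this page */
    int  hi;                /* first key that belongs on the next page */
    bool predicate_locked;  /* set once a serializable scan probed this page */
} ToyLeafPage;

/* Find the page where 'key' lives, or would live if it were inserted. */
static ToyLeafPage *
toy_find_leaf(ToyLeafPage *pages, size_t npages, int key)
{
    for (size_t i = 0; i < npages; i++)
    {
        if (key >= pages[i].lo && key < pages[i].hi)
            return &pages[i];
    }
    return NULL;
}

/*
 * A scan locks the page that covers the key, whether or not the key is
 * actually stored there. The lock stands for the key range, gaps included.
 */
static void
toy_scan(ToyLeafPage *pages, size_t npages, int key)
{
    ToyLeafPage *leaf = toy_find_leaf(pages, npages, key);

    if (leaf != NULL)
        leaf->predicate_locked = true;
}

/* An insert conflicts if an earlier scan locked the page the key lands on. */
static bool
toy_insert_conflicts(ToyLeafPage *pages, size_t npages, int key)
{
    ToyLeafPage *leaf = toy_find_leaf(pages, npages, key);

    return leaf != NULL && leaf->predicate_locked;
}
```

In this model, a scan for a key that is not present still makes a later insert of that key report a conflict. The same lock also conflicts with inserts of other keys that land on the same page, which is the coarse-grained false positive that page-level locking accepts.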
I took a stab at fixing those issues, as well as the bug when fastupdate
is turned on concurrently. Does the attached patch look sane to you?
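
As a rough illustration of the metapage interlock, consider the following toy model. It is a sketch under simplifying assumptions, not the patch itself: ToyIndex, TOY_METAPAGE, toy_meta_scan() and toy_meta_insert_conflicts() are hypothetical names, and predicate locks are reduced to one flag per page. Every scan locks the metapage in addition to the leaf page it probed, and an insert that goes to the pending list checks the metapage.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_METAPAGE  0     /* hypothetical block number of the metapage */
#define TOY_MAX_PAGES 16    /* leaf pages are numbered 1..15 in this toy */

/* Toy model, not PostgreSQL code: one predicate-lock flag per page. */
typedef struct ToyIndex
{
    bool fastupdate;              /* do inserts go to the pending list? */
    bool locked[TOY_MAX_PAGES];   /* predicate lock held on page i? */
} ToyIndex;

/*
 * Every scan locks the metapage, which stands for "I scanned the pending
 * list", plus the leaf page covering the key it searched for.
 */
static void
toy_meta_scan(ToyIndex *idx, int leaf_page)
{
    idx->locked[TOY_METAPAGE] = true;
    idx->locked[leaf_page] = true;
}

/*
 * A pending-list insert could belong anywhere in the tree, so it checks the
 * metapage. A regular insert checks the leaf page it modifies.
 */
static bool
toy_meta_insert_conflicts(const ToyIndex *idx, int leaf_page)
{
    if (idx->fastupdate)
        return idx->locked[TOY_METAPAGE];
    return idx->locked[leaf_page];
}
```

Because the scan takes the metapage lock regardless of the fastupdate setting at scan time, switching fastupdate on afterwards cannot hide a conflict. The price is that with fastupdate on, every insert conflicts with every earlier scan in this model.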
- Heikki
From b1ccb28fa8249d644382fd0b9c2a6ab94f6395e7 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakan...@iki.fi>
Date: Mon, 9 Apr 2018 13:31:42 +0300
Subject: [PATCH 1/1] Re-think predicate locking on GIN indexes.
The principle behind the locking was not very well thought-out, and not
documented. Add a section in the README to explain how it's supposed to
work, and change the code so that it actually works that way.
This fixes two bugs:
1. If fast update was turned on concurrently, subsequent inserts to the
pending list would not conflict with predicate locks that were acquired
earlier, on entry pages. The included 'predicate-gin-fastupdate' test
demonstrates that. To fix, make all scans acquire a predicate lock on
the metapage. That lock represents a scan of the pending list, whether
or not there is a pending list at the moment. Forget about the
optimization to skip locking/checking for locks, when fastupdate=off.
Maybe some of that was safe, but I couldn't convince myself of it, so
better to rip it out and keep things simple.
2. If a scan finds no match, it still needs to lock the entry page. The
point of predicate locks is to lock the gaps between values, whether
or not there is a match. The included 'predicate-gin-nomatch' test
tests that case.
In addition to those two bug fixes, this removes some unnecessary locking,
following the principle laid out in the README. Because all items in
a posting tree have the same key value, a lock on the posting tree root is
enough to cover all the items. (With a very large posting tree, it would
possibly be better to lock the posting tree leaf pages instead, so that a
"skip scan" with a query like "A & B" could avoid an unnecessary conflict
if a new tuple is inserted with A but !B. But let's keep this simple.)
Also, some spelling and whitespace fixes.
---
src/backend/access/gin/README | 34 ++++++
src/backend/access/gin/ginbtree.c | 13 ++-
src/backend/access/gin/gindatapage.c | 25 +++--
src/backend/access/gin/ginfast.c | 8 ++
src/backend/access/gin/ginget.c | 116 ++++++++++-----------
src/backend/access/gin/gininsert.c | 34 ++----
src/backend/access/gin/ginutil.c | 7 --
src/backend/access/gin/ginvacuum.c | 1 -
src/backend/access/gist/gist.c | 2 +-
src/backend/storage/lmgr/README-SSI | 22 ++--
src/include/access/gin_private.h | 7 +-
.../expected/predicate-gin-fastupdate.out | 30 ++++++
.../isolation/expected/predicate-gin-nomatch.out | 15 +++
src/test/isolation/expected/predicate-gin.out | 4 +-
src/test/isolation/isolation_schedule | 2 +
.../isolation/specs/predicate-gin-fastupdate.spec | 49 +++++++++
.../isolation/specs/predicate-gin-nomatch.spec | 35 +++++++
src/test/isolation/specs/predicate-gin.spec | 4 +-
18 files changed, 281 insertions(+), 127 deletions(-)
create mode 100644 src/test/isolation/expected/predicate-gin-fastupdate.out
create mode 100644 src/test/isolation/expected/predicate-gin-nomatch.out
create mode 100644 src/test/isolation/specs/predicate-gin-fastupdate.spec
create mode 100644 src/test/isolation/specs/predicate-gin-nomatch.spec
diff --git a/src/backend/access/gin/README b/src/backend/access/gin/README
index 990b5ffa58..cc434b1feb 100644
--- a/src/backend/access/gin/README
+++ b/src/backend/access/gin/README
@@ -331,6 +331,40 @@ page-deletions safe; it stamps the deleted pages with an XID and keeps the
deleted pages around with the right-link intact until all concurrent scans
have finished.)
+Predicate Locking
+-----------------
+
+GIN supports predicate locking, for serializable snapshot isolation.
+Predicate locks represent that a scan has scanned a range of values. They
+are not concerned with physical pages as such, but the logical key values.
+A predicate lock on a page covers the key range that would belong on that
+page, whether or not there are any matching tuples there currently. In other
+words, a predicate lock on an index page covers the "gaps" between the index
+tuples. To minimize false positives, predicate locks are acquired at the
+finest level possible.
+
+* Like in the B-tree index, it is enough to lock only leaf pages, because all
+ insertions happen at the leaf level.
+
+* In an equality search (i.e. not a partial match search), if a key entry has
+ a posting tree, we lock the posting tree root page, to represent a lock on
+ just that key entry. Otherwise, we lock the entry tree page. We also lock
+ the entry tree page if no match is found, to lock the "gap" where the entry
+ would've been, had there been one.
+
+* In a partial match search, we lock all the entry leaf pages that we scan,
+ in addition to locks on posting tree roots, to represent the "gaps" between
+ values.
+
+* In addition to the locks on entry leaf pages and posting tree roots, all
+ scans grab a lock on the metapage. This is to interlock with insertions to
+ the fast update pending list. An insertion to the pending list can really
+ belong anywhere in the tree, and the lock on the metapage represents that.
+
+The interlock for fastupdate pending lists means that with fastupdate=on,
+we effectively always grab a full-index lock, so you could get a lot of false
+positives.
+
Compatibility
-------------
diff --git a/src/backend/access/gin/ginbtree.c b/src/backend/access/gin/ginbtree.c
index 095b1192cb..5bd0c7a560 100644
--- a/src/backend/access/gin/ginbtree.c
+++ b/src/backend/access/gin/ginbtree.c
@@ -80,10 +80,21 @@ ginFindLeafPage(GinBtree btree, bool searchMode, Snapshot snapshot)
stack = (GinBtreeStack *) palloc(sizeof(GinBtreeStack));
stack->blkno = btree->rootBlkno;
- stack->buffer = ReadBuffer(btree->index, btree->rootBlkno);
stack->parent = NULL;
stack->predictNumber = 1;
+ /*
+ * Start from the root page. If the caller had already pinned it, take
+ * advantage of that.
+ */
+ if (BufferIsValid(btree->rootBuffer))
+ {
+ IncrBufferRefCount(btree->rootBuffer);
+ stack->buffer = btree->rootBuffer;
+ }
+ else
+ stack->buffer = ReadBuffer(btree->index, btree->rootBlkno);
+
for (;;)
{
Page page;
diff --git a/src/backend/access/gin/gindatapage.c b/src/backend/access/gin/gindatapage.c
index 642ca1a2c7..837da0720f 100644
--- a/src/backend/access/gin/gindatapage.c
+++ b/src/backend/access/gin/gindatapage.c
@@ -102,6 +102,8 @@ typedef struct
int nitems; /* # of items in 'items', if items != NULL */
} leafSegmentInfo;
+static void ginPrepareDataScan(GinBtree btree, Relation index, BlockNumber rootBlkno,
+ Buffer rootBuffer);
static ItemPointer dataLeafPageGetUncompressed(Page page, int *nitems);
static void dataSplitPageInternal(GinBtree btree, Buffer origbuf,
GinBtreeStack *stack,
@@ -1812,8 +1814,8 @@ createPostingTree(Relation index, ItemPointerData *items, uint32 nitems,
blkno = BufferGetBlockNumber(buffer);
/*
- * Copy a predicate lock from entry tree leaf (containing posting list)
- * to posting tree.
+ * Copy any predicate locks from the entry tree leaf (containing posting
+ * list) to the posting tree.
*/
PredicateLockPageSplit(index, BufferGetBlockNumber(entrybuffer), blkno);
@@ -1840,7 +1842,7 @@ createPostingTree(Relation index, ItemPointerData *items, uint32 nitems,
PageSetLSN(page, recptr);
}
- UnlockReleaseBuffer(buffer);
+ LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
END_CRIT_SECTION();
@@ -1855,22 +1857,26 @@ createPostingTree(Relation index, ItemPointerData *items, uint32 nitems,
*/
if (nitems > nrootitems)
{
- ginInsertItemPointers(index, blkno,
+ ginInsertItemPointers(index, blkno, buffer,
items + nrootitems,
nitems - nrootitems,
buildStats);
}
+ ReleaseBuffer(buffer);
+
return blkno;
}
-void
-ginPrepareDataScan(GinBtree btree, Relation index, BlockNumber rootBlkno)
+static void
+ginPrepareDataScan(GinBtree btree, Relation index, BlockNumber rootBlkno,
+ Buffer rootBuffer)
{
memset(btree, 0, sizeof(GinBtreeData));
btree->index = index;
btree->rootBlkno = rootBlkno;
+ btree->rootBuffer = rootBuffer;
btree->findChildPage = dataLocateItem;
btree->getLeftMostChild = dataGetLeftMostPage;
@@ -1891,7 +1897,7 @@ ginPrepareDataScan(GinBtree btree, Relation index, BlockNumber rootBlkno)
* Inserts array of item pointers, may execute several tree scan (very rare)
*/
void
-ginInsertItemPointers(Relation index, BlockNumber rootBlkno,
+ginInsertItemPointers(Relation index, BlockNumber rootBlkno, Buffer rootBuffer,
ItemPointerData *items, uint32 nitem,
GinStatsData *buildStats)
{
@@ -1899,7 +1905,7 @@ ginInsertItemPointers(Relation index, BlockNumber rootBlkno,
GinBtreeDataLeafInsertData insertdata;
GinBtreeStack *stack;
- ginPrepareDataScan(&btree, index, rootBlkno);
+ ginPrepareDataScan(&btree, index, rootBlkno, rootBuffer);
btree.isBuild = (buildStats != NULL);
insertdata.items = items;
insertdata.nitem = nitem;
@@ -1911,7 +1917,6 @@ ginInsertItemPointers(Relation index, BlockNumber rootBlkno,
btree.itemptr = insertdata.items[insertdata.curitem];
stack = ginFindLeafPage(&btree, false, NULL);
- GinCheckForSerializableConflictIn(btree.index, NULL, stack->buffer);
ginInsertValue(&btree, stack, &insertdata, buildStats);
}
}
@@ -1925,7 +1930,7 @@ ginScanBeginPostingTree(GinBtree btree, Relation index, BlockNumber rootBlkno,
{
GinBtreeStack *stack;
- ginPrepareDataScan(btree, index, rootBlkno);
+ ginPrepareDataScan(btree, index, rootBlkno, InvalidBuffer);
btree->fullScan = true;
diff --git a/src/backend/access/gin/ginfast.c b/src/backend/access/gin/ginfast.c
index 615730b8e5..5f624cf6fa 100644
--- a/src/backend/access/gin/ginfast.c
+++ b/src/backend/access/gin/ginfast.c
@@ -31,6 +31,7 @@
#include "postmaster/autovacuum.h"
#include "storage/indexfsm.h"
#include "storage/lmgr.h"
+#include "storage/predicate.h"
#include "utils/builtins.h"
/* GUC parameter */
@@ -245,6 +246,13 @@ ginHeapTupleFastInsert(GinState *ginstate, GinTupleCollector *collector)
metabuffer = ReadBuffer(index, GIN_METAPAGE_BLKNO);
metapage = BufferGetPage(metabuffer);
+ /*
+ * An insertion to the pending list could logically belong anywhere in
+ * the tree, so it conflicts with all serializable scans. All scans
+ * acquire a predicate lock on the metabuffer to represent that.
+ */
+ CheckForSerializableConflictIn(index, NULL, metabuffer);
+
if (collector->sumsize + collector->ntuples * sizeof(ItemIdData) > GinListPageSize)
{
/*
diff --git a/src/backend/access/gin/ginget.c b/src/backend/access/gin/ginget.c
index 0e984166fa..ef3cd7dbe2 100644
--- a/src/backend/access/gin/ginget.c
+++ b/src/backend/access/gin/ginget.c
@@ -36,20 +36,6 @@ typedef struct pendingPosition
/*
- * Place predicate lock on GIN page if needed.
- */
-static void
-GinPredicateLockPage(Relation index, BlockNumber blkno, Snapshot snapshot)
-{
- /*
- * When fast update is on then no need in locking pages, because we
- * anyway need to lock the whole index.
- */
- if (!GinGetUseFastUpdate(index))
- PredicateLockPage(index, blkno, snapshot);
-}
-
-/*
* Goes to the next page if current offset is outside of bounds
*/
static bool
@@ -68,7 +54,7 @@ moveRightIfItNeeded(GinBtreeData *btree, GinBtreeStack *stack, Snapshot snapshot
stack->buffer = ginStepRight(stack->buffer, btree->index, GIN_SHARE);
stack->blkno = BufferGetBlockNumber(stack->buffer);
stack->off = FirstOffsetNumber;
- GinPredicateLockPage(btree->index, stack->blkno, snapshot);
+ PredicateLockPage(btree->index, stack->blkno, snapshot);
}
return true;
@@ -100,11 +86,6 @@ scanPostingTree(Relation index, GinScanEntry scanEntry,
*/
for (;;)
{
- /*
- * Predicate lock each leaf page in posting tree
- */
- GinPredicateLockPage(index, BufferGetBlockNumber(buffer), snapshot);
-
page = BufferGetPage(buffer);
if ((GinPageGetOpaque(page)->flags & GIN_DELETED) == 0)
{
@@ -158,7 +139,7 @@ collectMatchBitmap(GinBtreeData *btree, GinBtreeStack *stack,
* Predicate lock entry leaf page, following pages will be locked by
* moveRightIfItNeeded()
*/
- GinPredicateLockPage(btree->index, stack->buffer, snapshot);
+ PredicateLockPage(btree->index, BufferGetBlockNumber(stack->buffer), snapshot);
for (;;)
{
@@ -253,6 +234,13 @@ collectMatchBitmap(GinBtreeData *btree, GinBtreeStack *stack,
LockBuffer(stack->buffer, GIN_UNLOCK);
+ /*
+ * Acquire predicate lock on the posting tree. We already hold
+ * a lock on the entry page, but insertions to the posting tree
+ * don't check for conflicts on that level.
+ */
+ PredicateLockPage(btree->index, rootPostingTree, snapshot);
+
/* Collect all the TIDs in this entry's posting tree */
scanPostingTree(btree->index, scanEntry, rootPostingTree,
snapshot);
@@ -400,10 +388,6 @@ restartScanEntry:
{
IndexTuple itup = (IndexTuple) PageGetItem(page, PageGetItemId(page, stackEntry->off));
- /* Predicate lock visited entry leaf page */
- GinPredicateLockPage(ginstate->index,
- BufferGetBlockNumber(stackEntry->buffer), snapshot);
-
if (GinIsPostingTree(itup))
{
BlockNumber rootPostingTree = GinGetPostingTree(itup);
@@ -412,6 +396,13 @@ restartScanEntry:
ItemPointerData minItem;
/*
+ * This is an equality scan, so lock the root of the posting tree.
+ * It represents a lock on the exact key value, and covers all the
+ * items in the posting tree.
+ */
+ PredicateLockPage(ginstate->index, rootPostingTree, snapshot);
+
+ /*
* We should unlock entry page before touching posting tree to
* prevent deadlocks with vacuum processes. Because entry is never
* deleted from page and posting tree is never reduced to the
@@ -426,12 +417,6 @@ restartScanEntry:
entry->buffer = stack->buffer;
/*
- * Predicate lock visited posting tree page, following pages
- * will be locked by moveRightIfItNeeded or entryLoadMoreItems
- */
- GinPredicateLockPage(ginstate->index, BufferGetBlockNumber(entry->buffer), snapshot);
-
- /*
* We keep buffer pinned because we need to prevent deletion of
* page during scan. See GIN's vacuum implementation. RefCount is
* increased to keep buffer pinned after freeGinBtreeStack() call.
@@ -452,15 +437,38 @@ restartScanEntry:
freeGinBtreeStack(stack);
entry->isFinished = false;
}
- else if (GinGetNPosting(itup) > 0)
+ else
{
- entry->list = ginReadTuple(ginstate, entry->attnum, itup,
- &entry->nlist);
- entry->predictNumberResult = entry->nlist;
+ /*
+ * Lock the entry leaf page. This is more coarse-grained than
+ * necessary, because it will conflict with any insertions that
+ * land on the same leaf page, not only the exact key we searched
+ * for. But locking an individual tuple would require updating
+ * that lock whenever it moves because of insertions or vacuums,
+ * which seems too complicated.
+ */
+ PredicateLockPage(ginstate->index,
+ BufferGetBlockNumber(stackEntry->buffer),
+ snapshot);
+ if (GinGetNPosting(itup) > 0)
+ {
+ entry->list = ginReadTuple(ginstate, entry->attnum, itup,
+ &entry->nlist);
+ entry->predictNumberResult = entry->nlist;
- entry->isFinished = false;
+ entry->isFinished = false;
+ }
}
}
+ else
+ {
+ /*
+ * No entry found. Predicate lock the leaf page, to lock the place
+ * where the entry would've been, had there been one.
+ */
+ PredicateLockPage(ginstate->index,
+ BufferGetBlockNumber(stackEntry->buffer), snapshot);
+ }
if (needUnlock)
LockBuffer(stackEntry->buffer, GIN_UNLOCK);
@@ -533,7 +541,7 @@ startScanKey(GinState *ginstate, GinScanOpaque so, GinScanKey key)
for (i = 0; i < key->nentries - 1; i++)
{
- /* Pass all entries <= i as false, and the rest as MAYBE */
+ /* Pass all entries <= i as FALSE, and the rest as MAYBE */
for (j = 0; j <= i; j++)
key->entryRes[entryIndexes[j]] = GIN_FALSE;
for (j = i + 1; j < key->nentries; j++)
@@ -673,8 +681,6 @@ entryLoadMoreItems(GinState *ginstate, GinScanEntry entry,
entry->btree.fullScan = false;
stack = ginFindLeafPage(&entry->btree, true, snapshot);
- GinPredicateLockPage(ginstate->index, BufferGetBlockNumber(stack->buffer), snapshot);
-
/* we don't need the stack, just the buffer. */
entry->buffer = stack->buffer;
IncrBufferRefCount(entry->buffer);
@@ -719,10 +725,6 @@ entryLoadMoreItems(GinState *ginstate, GinScanEntry entry,
entry->buffer = ginStepRight(entry->buffer,
ginstate->index,
GIN_SHARE);
-
- GinPredicateLockPage(ginstate->index, BufferGetBlockNumber(entry->buffer), snapshot);
-
-
page = BufferGetPage(entry->buffer);
}
stepright = true;
@@ -1084,8 +1086,8 @@ keyGetItem(GinState *ginstate, MemoryContext tempCtx, GinScanKey key,
* lossy page even when none of the other entries match.
*
* Our strategy is to call the tri-state consistent function, with the
- * lossy-page entries set to MAYBE, and all the other entries false. If it
- * returns false, none of the lossy items alone are enough for a match, so
+ * lossy-page entries set to MAYBE, and all the other entries FALSE. If it
+ * returns FALSE, none of the lossy items alone are enough for a match, so
* we don't need to return a lossy-page pointer. Otherwise, return a
* lossy-page pointer to indicate that the whole heap page must be
* checked. (On subsequent calls, we'll do nothing until minItem is past
@@ -1746,8 +1748,7 @@ collectMatchesForHeapRow(IndexScanDesc scan, pendingPosition *pos)
}
/*
- * Collect all matched rows from pending list into bitmap. Also function
- * takes PendingLockRelation if it's needed.
+ * Collect all matched rows from pending list into bitmap.
*/
static void
scanPendingInsert(IndexScanDesc scan, TIDBitmap *tbm, int64 *ntids)
@@ -1764,6 +1765,12 @@ scanPendingInsert(IndexScanDesc scan, TIDBitmap *tbm, int64 *ntids)
*ntids = 0;
+ /*
+ * Acquire predicate lock on the metapage, to conflict with any
+ * fastupdate insertions.
+ */
+ PredicateLockPage(scan->indexRelation, GIN_METAPAGE_BLKNO, scan->xs_snapshot);
+
LockBuffer(metabuffer, GIN_SHARE);
page = BufferGetPage(metabuffer);
TestForOldSnapshot(scan->xs_snapshot, scan->indexRelation, page);
@@ -1777,24 +1784,9 @@ scanPendingInsert(IndexScanDesc scan, TIDBitmap *tbm, int64 *ntids)
{
/* No pending list, so proceed with normal scan */
UnlockReleaseBuffer(metabuffer);
-
- /*
- * If fast update is enabled, we acquire a predicate lock on the entire
- * relation as fast update postpones the insertion of tuples into index
- * structure due to which we can't detect rw conflicts.
- */
- if (GinGetUseFastUpdate(scan->indexRelation))
- PredicateLockRelation(scan->indexRelation, scan->xs_snapshot);
-
return;
}
- /*
- * Pending list is not empty, we need to lock the index doesn't despite on
- * fastupdate state
- */
- PredicateLockRelation(scan->indexRelation, scan->xs_snapshot);
-
pos.pendingBuffer = ReadBuffer(scan->indexRelation, blkno);
LockBuffer(pos.pendingBuffer, GIN_SHARE);
pos.firstOffset = FirstOffsetNumber;
diff --git a/src/backend/access/gin/gininsert.c b/src/backend/access/gin/gininsert.c
index ec5eebb848..92c77015a1 100644
--- a/src/backend/access/gin/gininsert.c
+++ b/src/backend/access/gin/gininsert.c
@@ -104,7 +104,7 @@ addItemPointersToLeafTuple(GinState *ginstate,
buffer);
/* Now insert the TIDs-to-be-added into the posting tree */
- ginInsertItemPointers(ginstate->index, postingRoot,
+ ginInsertItemPointers(ginstate->index, postingRoot, InvalidBuffer,
items, nitem,
buildStats);
@@ -207,19 +207,23 @@ ginEntryInsert(GinState *ginstate,
{
/* add entries to existing posting tree */
BlockNumber rootPostingTree = GinGetPostingTree(itup);
+ Buffer rootBuffer;
/* release all stack */
LockBuffer(stack->buffer, GIN_UNLOCK);
freeGinBtreeStack(stack);
/* insert into posting tree */
- ginInsertItemPointers(ginstate->index, rootPostingTree,
+ rootBuffer = ReadBuffer(ginstate->index, rootPostingTree);
+ CheckForSerializableConflictIn(ginstate->index, NULL, rootBuffer);
+ ginInsertItemPointers(ginstate->index, rootPostingTree, rootBuffer,
items, nitem,
buildStats);
+ ReleaseBuffer(rootBuffer);
return;
}
- GinCheckForSerializableConflictIn(btree.index, NULL, stack->buffer);
+ CheckForSerializableConflictIn(ginstate->index, NULL, stack->buffer);
/* modify an existing leaf entry */
itup = addItemPointersToLeafTuple(ginstate, itup,
items, nitem, buildStats, stack->buffer);
@@ -228,7 +232,7 @@ ginEntryInsert(GinState *ginstate,
}
else
{
- GinCheckForSerializableConflictIn(btree.index, NULL, stack->buffer);
+ CheckForSerializableConflictIn(ginstate->index, NULL, stack->buffer);
/* no match, so construct a new leaf entry */
itup = buildFreshLeafTuple(ginstate, attnum, key, category,
items, nitem, buildStats, stack->buffer);
@@ -517,18 +521,6 @@ gininsert(Relation index, Datum *values, bool *isnull,
memset(&collector, 0, sizeof(GinTupleCollector));
- /*
- * With fastupdate on each scan and each insert begin with access to
- * pending list, so it effectively lock entire index. In this case
- * we aquire predicate lock and check for conflicts over index relation,
- * and hope that it will reduce locking overhead.
- *
- * Do not use GinCheckForSerializableConflictIn() here, because
- * it will do nothing (it does actual work only with fastupdate off).
- * Check for conflicts for entire index.
- */
- CheckForSerializableConflictIn(index, NULL, InvalidBuffer);
-
for (i = 0; i < ginstate->origTupdesc->natts; i++)
ginHeapTupleFastCollect(ginstate, &collector,
(OffsetNumber) (i + 1),
@@ -539,16 +531,6 @@ gininsert(Relation index, Datum *values, bool *isnull,
}
else
{
- GinStatsData stats;
-
- /*
- * Fastupdate is off but if pending list isn't empty then we need to
- * check conflicts with PredicateLockRelation in scanPendingInsert().
- */
- ginGetStats(index, &stats);
- if (stats.nPendingPages > 0)
- CheckForSerializableConflictIn(index, NULL, InvalidBuffer);
-
for (i = 0; i < ginstate->origTupdesc->natts; i++)
ginHeapTupleInsert(ginstate, (OffsetNumber) (i + 1),
values[i], isnull[i],
diff --git a/src/backend/access/gin/ginutil.c b/src/backend/access/gin/ginutil.c
index 4367523dd9..0a32182dd7 100644
--- a/src/backend/access/gin/ginutil.c
+++ b/src/backend/access/gin/ginutil.c
@@ -718,10 +718,3 @@ ginUpdateStats(Relation index, const GinStatsData *stats)
END_CRIT_SECTION();
}
-
-void
-GinCheckForSerializableConflictIn(Relation relation, HeapTuple tuple, Buffer buffer)
-{
- if (!GinGetUseFastUpdate(relation))
- CheckForSerializableConflictIn(relation, tuple, buffer);
-}
diff --git a/src/backend/access/gin/ginvacuum.c b/src/backend/access/gin/ginvacuum.c
index dd8e31b872..3104bc12b6 100644
--- a/src/backend/access/gin/ginvacuum.c
+++ b/src/backend/access/gin/ginvacuum.c
@@ -166,7 +166,6 @@ ginDeletePage(GinVacuumState *gvs, BlockNumber deleteBlkno, BlockNumber leftBlkn
START_CRIT_SECTION();
/* Unlink the page by changing left sibling's rightlink */
-
page = BufferGetPage(lBuffer);
GinPageGetOpaque(page)->rightlink = rightlink;
diff --git a/src/backend/access/gist/gist.c b/src/backend/access/gist/gist.c
index 9007d65ad2..048966924d 100644
--- a/src/backend/access/gist/gist.c
+++ b/src/backend/access/gist/gist.c
@@ -1220,7 +1220,7 @@ gistinserttuples(GISTInsertState *state, GISTInsertStack *stack,
bool is_split;
/*
- * Check for any rw conflicts (in serialisation isolation level)
+ * Check for any rw conflicts (in serializable isolation level)
* just before we intend to modify the page
*/
CheckForSerializableConflictIn(state->r, NULL, stack->buffer);
diff --git a/src/backend/storage/lmgr/README-SSI b/src/backend/storage/lmgr/README-SSI
index f2b099d1c9..50d2ecca9d 100644
--- a/src/backend/storage/lmgr/README-SSI
+++ b/src/backend/storage/lmgr/README-SSI
@@ -373,21 +373,22 @@ index *leaf* pages needed to lock the appropriate index range. If,
however, a search discovers that no root page has yet been created, a
predicate lock on the index relation is required.
+ * Like a B-tree, GIN searches acquire predicate locks only on the
+leaf pages of the entry tree. When performing an equality scan, and an
+entry has a posting tree, the posting tree root is locked instead, to
+lock only that key value. However, fastupdate=on postpones the
+insertion of tuples into the index structure by temporarily storing them
+in the pending list. That makes us unable to detect r-w conflicts using
+page-level locks. To cope with that, insertions to the pending list
+conflict with all scans.
+
* GiST searches can determine that there are no matches at any
level of the index, so we acquire predicate lock at each index
level during a GiST search. An index insert at the leaf level can
then be trusted to ripple up to all levels and locations where
conflicting predicate locks may exist. In case there is a page split,
-we need to copy predicate lock from an original page to all new pages.
-
- * GIN searches acquire predicate locks only on the leaf pages
-of entry tree and posting tree. During a page split, a predicate locks are
-copied from the original page to the new page. In the same way predicate locks
-are copied from entry tree leaf page to freshly created posting tree root.
-However, when fast update is enabled, a predicate lock on the whole index
-relation is required. Fast update postpones the insertion of tuples into index
-structure by temporarily storing them into pending list. That makes us unable
-to detect r-w conflicts using page-level locks.
+we need to copy predicate lock from the original page to all the new
+pages.
* Hash index searches acquire predicate locks on the primary
page of a bucket. It acquires a lock on both the old and new buckets
@@ -395,7 +396,6 @@ for scans that happen concurrently with page splits. During a bucket
split, a predicate lock is copied from the primary page of an old
bucket to the primary page of a new bucket.
-
* The effects of page splits, overflows, consolidations, and
removals must be carefully reviewed to ensure that predicate locks
aren't "lost" during those operations, or kept with pages which could
diff --git a/src/include/access/gin_private.h b/src/include/access/gin_private.h
index d1df3033a6..1e2c9dde8b 100644
--- a/src/include/access/gin_private.h
+++ b/src/include/access/gin_private.h
@@ -103,8 +103,6 @@ extern Datum *ginExtractEntries(GinState *ginstate, OffsetNumber attnum,
extern OffsetNumber gintuple_get_attrnum(GinState *ginstate, IndexTuple tuple);
extern Datum gintuple_get_key(GinState *ginstate, IndexTuple tuple,
GinNullCategory *category);
-extern void GinCheckForSerializableConflictIn(Relation relation,
- HeapTuple tuple, Buffer buffer);
/* gininsert.c */
extern IndexBuildResult *ginbuild(Relation heap, Relation index,
@@ -161,6 +159,7 @@ typedef struct GinBtreeData
Relation index;
BlockNumber rootBlkno;
+ Buffer rootBuffer;
GinState *ginstate; /* not valid in a data scan */
bool fullScan;
bool isBuild;
@@ -222,12 +221,12 @@ extern BlockNumber createPostingTree(Relation index,
GinStatsData *buildStats, Buffer entrybuffer);
extern void GinDataPageAddPostingItem(Page page, PostingItem *data, OffsetNumber offset);
extern void GinPageDeletePostingItem(Page page, OffsetNumber offset);
-extern void ginInsertItemPointers(Relation index, BlockNumber rootBlkno,
+extern void ginInsertItemPointers(Relation index,
+ BlockNumber rootBlkno, Buffer rootBuffer,
ItemPointerData *items, uint32 nitem,
GinStatsData *buildStats);
extern GinBtreeStack *ginScanBeginPostingTree(GinBtree btree, Relation index, BlockNumber rootBlkno, Snapshot snapshot);
extern void ginDataFillRoot(GinBtree btree, Page root, BlockNumber lblkno, Page lpage, BlockNumber rblkno, Page rpage);
-extern void ginPrepareDataScan(GinBtree btree, Relation index, BlockNumber rootBlkno);
/*
* This is declared in ginvacuum.c, but is passed between ginVacuumItemPointers
diff --git a/src/test/isolation/expected/predicate-gin-fastupdate.out b/src/test/isolation/expected/predicate-gin-fastupdate.out
new file mode 100644
index 0000000000..7d4fa8e024
--- /dev/null
+++ b/src/test/isolation/expected/predicate-gin-fastupdate.out
@@ -0,0 +1,30 @@
+Parsed test spec with 3 sessions
+
+starting permutation: r1 r2 w1 c1 w2 c2
+step r1: SELECT count(*) FROM gin_tbl WHERE p @> array[1000];
+count
+
+2
+step r2: SELECT * FROM other_tbl;
+id
+
+step w1: INSERT INTO other_tbl VALUES (42);
+step c1: COMMIT;
+step w2: INSERT INTO gin_tbl SELECT array[1000,19001];
+ERROR: could not serialize access due to read/write dependencies among transactions
+step c2: COMMIT;
+
+starting permutation: r1 r2 w1 c1 fastupdate_on w2 c2
+step r1: SELECT count(*) FROM gin_tbl WHERE p @> array[1000];
+count
+
+2
+step r2: SELECT * FROM other_tbl;
+id
+
+step w1: INSERT INTO other_tbl VALUES (42);
+step c1: COMMIT;
+step fastupdate_on: ALTER INDEX ginidx SET (fastupdate = on);
+step w2: INSERT INTO gin_tbl SELECT array[1000,19001];
+ERROR: could not serialize access due to read/write dependencies among transactions
+step c2: COMMIT;
diff --git a/src/test/isolation/expected/predicate-gin-nomatch.out b/src/test/isolation/expected/predicate-gin-nomatch.out
new file mode 100644
index 0000000000..5e733262a4
--- /dev/null
+++ b/src/test/isolation/expected/predicate-gin-nomatch.out
@@ -0,0 +1,15 @@
+Parsed test spec with 2 sessions
+
+starting permutation: r1 r2 w1 c1 w2 c2
+step r1: SELECT count(*) FROM gin_tbl WHERE p @> array[-1];
+count
+
+0
+step r2: SELECT * FROM other_tbl;
+id
+
+step w1: INSERT INTO other_tbl VALUES (42);
+step c1: COMMIT;
+step w2: INSERT INTO gin_tbl SELECT array[-1];
+ERROR: could not serialize access due to read/write dependencies among transactions
+step c2: COMMIT;
diff --git a/src/test/isolation/expected/predicate-gin.out b/src/test/isolation/expected/predicate-gin.out
index 4f5501f6f0..bdf8911923 100644
--- a/src/test/isolation/expected/predicate-gin.out
+++ b/src/test/isolation/expected/predicate-gin.out
@@ -737,8 +737,8 @@ step c2: commit;
starting permutation: fu1 rxy1 rxy2fu wx1 c1 wy2fu c2
step fu1: alter index ginidx set (fastupdate = on);
commit;
- begin isolation level serializable;
- set enable_seqscan=off;
+ begin isolation level serializable;
+ set enable_seqscan=off;
step rxy1: select count(*) from gin_tbl where p @> array[4,5];
count
diff --git a/src/test/isolation/isolation_schedule b/src/test/isolation/isolation_schedule
index 6cb3d07240..5203ad582b 100644
--- a/src/test/isolation/isolation_schedule
+++ b/src/test/isolation/isolation_schedule
@@ -73,6 +73,8 @@ test: vacuum-concurrent-drop
test: predicate-hash
test: predicate-gist
test: predicate-gin
+test: predicate-gin-fastupdate
+test: predicate-gin-nomatch
test: partition-key-update-1
test: partition-key-update-2
test: partition-key-update-3
diff --git a/src/test/isolation/specs/predicate-gin-fastupdate.spec b/src/test/isolation/specs/predicate-gin-fastupdate.spec
new file mode 100644
index 0000000000..04b8036fc5
--- /dev/null
+++ b/src/test/isolation/specs/predicate-gin-fastupdate.spec
@@ -0,0 +1,49 @@
+#
+# Test that predicate locking on a GIN index works correctly, even if
+# fastupdate is turned on concurrently.
+#
+# 0. fastupdate is off
+# 1. Session 's1' acquires predicate lock on page X
+# 2. fastupdate is turned on
+# 3. Session 's2' inserts a new tuple to the pending list
+#
+# This test checks that if the lock acquired in step 1 would conflict with
+# the insert in step 3, we detect that conflict correctly, even if fastupdate
+# was turned on in-between.
+#
+setup
+{
+ create table gin_tbl(p int4[]);
+ insert into gin_tbl select array[g, g*2,g*3] from generate_series(1, 10000) g;
+ insert into gin_tbl select array[4,5,6] from generate_series(10001, 20000) g;
+ create index ginidx on gin_tbl using gin(p) with (fastupdate = off);
+
+ create table other_tbl (id int4);
+}
+
+teardown
+{
+ drop table gin_tbl;
+ drop table other_tbl;
+}
+
+session "s1"
+setup { BEGIN ISOLATION LEVEL SERIALIZABLE; SET enable_seqscan=off; }
+step "r1" { SELECT count(*) FROM gin_tbl WHERE p @> array[1000]; }
+step "w1" { INSERT INTO other_tbl VALUES (42); }
+step "c1" { COMMIT; }
+
+session "s2"
+setup { BEGIN ISOLATION LEVEL SERIALIZABLE; SET enable_seqscan=off; }
+step "r2" { SELECT * FROM other_tbl; }
+step "w2" { INSERT INTO gin_tbl SELECT array[1000,19001]; }
+step "c2" { COMMIT; }
+
+session "s3"
+step "fastupdate_on" { ALTER INDEX ginidx SET (fastupdate = on); }
+
+# This correctly throws serialization failure.
+permutation "r1" "r2" "w1" "c1" "w2" "c2"
+
+# Before the fix, we missed the conflict if fastupdate was turned on in the middle.
+permutation "r1" "r2" "w1" "c1" "fastupdate_on" "w2" "c2"
diff --git a/src/test/isolation/specs/predicate-gin-nomatch.spec b/src/test/isolation/specs/predicate-gin-nomatch.spec
new file mode 100644
index 0000000000..0ad456cb14
--- /dev/null
+++ b/src/test/isolation/specs/predicate-gin-nomatch.spec
@@ -0,0 +1,35 @@
+#
+# Check that GIN index grabs an appropriate lock, even if there is no match.
+#
+setup
+{
+ create table gin_tbl(p int4[]);
+ insert into gin_tbl select array[g, g*2,g*3] from generate_series(1, 10000) g;
+ insert into gin_tbl select array[4,5,6] from generate_series(10001, 20000) g;
+ create index ginidx on gin_tbl using gin(p) with (fastupdate = off);
+
+ create table other_tbl (id int4);
+}
+
+teardown
+{
+ drop table gin_tbl;
+ drop table other_tbl;
+}
+
+session "s1"
+setup { BEGIN ISOLATION LEVEL SERIALIZABLE; SET enable_seqscan=off; }
+# Scan with no match.
+step "r1" { SELECT count(*) FROM gin_tbl WHERE p @> array[-1]; }
+step "w1" { INSERT INTO other_tbl VALUES (42); }
+step "c1" { COMMIT; }
+
+session "s2"
+setup { BEGIN ISOLATION LEVEL SERIALIZABLE; SET enable_seqscan=off; }
+step "r2" { SELECT * FROM other_tbl; }
+# Insert row that would've matched in step "r1"
+step "w2" { INSERT INTO gin_tbl SELECT array[-1]; }
+step "c2" { COMMIT; }
+
+# This should throw serialization failure.
+permutation "r1" "r2" "w1" "c1" "w2" "c2"
diff --git a/src/test/isolation/specs/predicate-gin.spec b/src/test/isolation/specs/predicate-gin.spec
index 9f0cda8057..a967695867 100644
--- a/src/test/isolation/specs/predicate-gin.spec
+++ b/src/test/isolation/specs/predicate-gin.spec
@@ -32,8 +32,8 @@ setup
# enable pending list for a small subset of tests
step "fu1" { alter index ginidx set (fastupdate = on);
commit;
- begin isolation level serializable;
- set enable_seqscan=off; }
+ begin isolation level serializable;
+ set enable_seqscan=off; }
step "rxy1" { select count(*) from gin_tbl where p @> array[4,5]; }
step "wx1" { insert into gin_tbl select g, array[5,6] from generate_series
--
2.11.0