On Sun, Nov 03, 2024 at 10:29:25AM -0800, Noah Misch wrote:
> On Fri, Nov 01, 2024 at 04:38:29PM -0700, Noah Misch wrote:
> > This was a near miss to having a worst-in-years regression in a minor
> > release, so I'm proposing this sequence:
> >
> > - Revert from non-master branches commits 8e7e672 (inplace180, "WAL-log
> >   inplace update before revealing it to other sessions.") and 243e9b4
> >   (inplace160, "For inplace update, send nontransactional invalidations.").
> >
> > - Back-patch inplace230-index_update_stats-io-before-buflock to harden
> >   commit a07e03f (inplace110, "Fix data loss at inplace update after
> >   heap_update()").
> >
> > - Push attached inplace240 to master.
> >
> > - Make the commitfest entry a request for review of v17
> >   inplace160+inplace240.  After some amount of additional review and
> >   master bake time, the reverted patches would return to non-master
> >   branches.
> Pushed as 0bada39.
> I'm attaching the v17 patch that now becomes the commitfest submission

That still applies cleanly.  I'm adding a back-patch of commit f4ece89, which
adds assertions intended to build confidence about the main patch.  The
commitfest entry requests a review of the first patch only, but the second
patch may answer questions that a review of the first would otherwise raise.
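To orient review of the first patch, here is the ordering it establishes in
heapam.c around an inplace update, condensed into a single sketch.  This is
not the patched code itself: WAL record assembly and the xmax/recheck logic
are elided, and "still_ok_to_update" is a stand-in parameter rather than
anything in the patch.

#include "postgres.h"

#include "access/htup.h"
#include "miscadmin.h"
#include "storage/bufmgr.h"
#include "storage/lmgr.h"
#include "utils/inval.h"
#include "utils/rel.h"

/* Condensed flow of heap_inplace_lock() + heap_inplace_update_and_unlock(). */
static void
inplace_update_flow(Relation relation, HeapTuple oldtup, Buffer buffer,
                    bool still_ok_to_update)
{
    /*
     * Queue the shared inval before taking locks; building it later could
     * reach a CatalogCacheInitializeCache() that locks this same buffer.
     */
    CacheInvalidateHeapTupleInplace(relation, oldtup, NULL);
    LockTuple(relation, &oldtup->t_self, InplaceUpdateTupleLock);
    LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);

    if (!still_ok_to_update)
    {
        /* give up: release locks and discard the queued invals */
        LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
        UnlockTuple(relation, &oldtup->t_self, InplaceUpdateTupleLock);
        ForgetInplace_Inval();
        return;
    }

    PreInplace_Inval();         /* unlink relcache init file while errors are still allowed */
    START_CRIT_SECTION();
    /* ... mutate the tuple in place, MarkBufferDirty(), XLogInsert() ... */
    LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
    AtInplace_Inval();          /* send to the shared queue before UnlockTuple() */
    END_CRIT_SECTION();
    UnlockTuple(relation, &oldtup->t_self, InplaceUpdateTupleLock);
    AcceptInvalidationMessages();   /* process our own just-sent invals */
}

The key property is that the invalidation messages reach the shared queue
before UnlockTuple(), which SearchSysCacheLocked1() relies on, and that a
failed attempt discards them via ForgetInplace_Inval().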
Author: Noah Misch <n...@leadboat.com> Commit: Noah Misch <n...@leadboat.com> For inplace update, send nontransactional invalidations. The inplace update survives ROLLBACK. The inval didn't, so another backend's DDL could then update the row without incorporating the inplace update. In the test this fixes, a mix of CREATE INDEX and ALTER TABLE resulted in a table with an index, yet relhasindex=f. That is a source of index corruption. Back-patch to v17 - v13 (all supported versions). This is a back-patch of commits 243e9b40f1b2dd09d6e5bf91ebf6e822a2cd3704 and 0bada39c83a150079567a6e97b1a25a198f30ea3. It reverses commit c1099dd745b0135960895caa8892a1873ac6cbe5, my revert of the original back-patch of 243e9b4. Like the original back-patch, this doesn't change WAL, because these branches use end-of-recovery SIResetAll(). All branches change the ABI of extern function PrepareToInvalidateCacheTuple(). No PGXN extension calls that, and there's no apparent use case in extensions. Reviewed by FIXME, (in earlier versions) Nitin Motiani, and (in earlier versions) Andres Freund. Discussion: https://postgr.es/m/20240523000548.58.nmi...@google.com diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c index bbe64b1..bf2bdac 100644 --- a/src/backend/access/heap/heapam.c +++ b/src/backend/access/heap/heapam.c @@ -6232,6 +6232,17 @@ heap_inplace_lock(Relation relation, Assert(BufferIsValid(buffer)); + /* + * Construct shared cache inval if necessary. Because we pass a tuple + * version without our own inplace changes or inplace changes other + * sessions complete while we wait for locks, inplace update mustn't + * change catcache lookup keys. But we aren't bothering with index + * updates either, so that's true a fortiori. After LockBuffer(), it + * would be too late, because this might reach a + * CatalogCacheInitializeCache() that locks "buffer". + */ + CacheInvalidateHeapTupleInplace(relation, oldtup_ptr, NULL); + LockTuple(relation, &oldtup.t_self, InplaceUpdateTupleLock); LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE); @@ -6327,6 +6338,7 @@ heap_inplace_lock(Relation relation, if (!ret) { UnlockTuple(relation, &oldtup.t_self, InplaceUpdateTupleLock); + ForgetInplace_Inval(); InvalidateCatalogSnapshot(); } return ret; @@ -6355,6 +6367,16 @@ heap_inplace_update_and_unlock(Relation relation, if (oldlen != newlen || htup->t_hoff != tuple->t_data->t_hoff) elog(ERROR, "wrong tuple length"); + /* + * Unlink relcache init files as needed. If unlinking, acquire + * RelCacheInitLock until after associated invalidations. By doing this + * in advance, if we checkpoint and then crash between inplace + * XLogInsert() and inval, we don't rely on StartupXLOG() -> + * RelationCacheInitFileRemove(). That uses elevel==LOG, so replay would + * neglect to PANIC on EIO. + */ + PreInplace_Inval(); + /* NO EREPORT(ERROR) from here till changes are logged */ START_CRIT_SECTION(); @@ -6398,17 +6420,28 @@ heap_inplace_update_and_unlock(Relation relation, PageSetLSN(BufferGetPage(buffer), recptr); } + LockBuffer(buffer, BUFFER_LOCK_UNLOCK); + + /* + * Send invalidations to shared queue. SearchSysCacheLocked1() assumes we + * do this before UnlockTuple(). + * + * If we're mutating a tuple visible only to this transaction, there's an + * equivalent transactional inval from the action that created the tuple, + * and this inval is superfluous. 
+ */ + AtInplace_Inval(); + END_CRIT_SECTION(); + UnlockTuple(relation, &tuple->t_self, InplaceUpdateTupleLock); - heap_inplace_unlock(relation, oldtup, buffer); + AcceptInvalidationMessages(); /* local processing of just-sent inval */ /* - * Send out shared cache inval if necessary. Note that because we only - * pass the new version of the tuple, this mustn't be used for any - * operations that could change catcache lookup keys. But we aren't - * bothering with index updates either, so that's true a fortiori. - * - * XXX ROLLBACK discards the invalidation. See test inplace-inval.spec. + * Queue a transactional inval. The immediate invalidation we just sent + * is the only one known to be necessary. To reduce risk from the + * transition to immediate invalidation, continue sending a transactional + * invalidation like we've long done. Third-party code might rely on it. */ if (!IsBootstrapProcessingMode()) CacheInvalidateHeapTuple(relation, tuple, NULL); @@ -6423,6 +6456,7 @@ heap_inplace_unlock(Relation relation, { LockBuffer(buffer, BUFFER_LOCK_UNLOCK); UnlockTuple(relation, &oldtup->t_self, InplaceUpdateTupleLock); + ForgetInplace_Inval(); } /* diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c index 4cecf63..053a200 100644 --- a/src/backend/access/transam/xact.c +++ b/src/backend/access/transam/xact.c @@ -1358,14 +1358,24 @@ RecordTransactionCommit(void) /* * Transactions without an assigned xid can contain invalidation - * messages (e.g. explicit relcache invalidations or catcache - * invalidations for inplace updates); standbys need to process those. - * We can't emit a commit record without an xid, and we don't want to - * force assigning an xid, because that'd be problematic for e.g. - * vacuum. Hence we emit a bespoke record for the invalidations. We - * don't want to use that in case a commit record is emitted, so they - * happen synchronously with commits (besides not wanting to emit more - * WAL records). + * messages. While inplace updates do this, this is not known to be + * necessary; see comment at inplace CacheInvalidateHeapTuple(). + * Extensions might still rely on this capability, and standbys may + * need to process those invals. We can't emit a commit record + * without an xid, and we don't want to force assigning an xid, + * because that'd be problematic for e.g. vacuum. Hence we emit a + * bespoke record for the invalidations. We don't want to use that in + * case a commit record is emitted, so they happen synchronously with + * commits (besides not wanting to emit more WAL records). + * + * XXX Every known use of this capability is a defect. Since an XID + * isn't controlling visibility of the change that prompted invals, + * other sessions need the inval even if this transactions aborts. + * + * ON COMMIT DELETE ROWS does a nontransactional index_build(), which + * queues a relcache inval, including in transactions without an xid + * that had read the (empty) table. Standbys don't need any ON COMMIT + * DELETE ROWS invals, but we've not done the work to withhold them. 
*/ if (nmsgs != 0) { diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c index 6c47b3c..5a1de05 100644 --- a/src/backend/catalog/index.c +++ b/src/backend/catalog/index.c @@ -2905,12 +2905,19 @@ index_update_stats(Relation rel, if (dirty) { systable_inplace_update_finish(state, tuple); - /* the above sends a cache inval message */ + /* the above sends transactional and immediate cache inval messages */ } else { systable_inplace_update_cancel(state); - /* no need to change tuple, but force relcache inval anyway */ + + /* + * While we didn't change relhasindex, CREATE INDEX needs a + * transactional inval for when the new index's catalog rows become + * visible. Other CREATE INDEX and REINDEX code happens to also queue + * this inval, but keep this in case rare callers rely on this part of + * our API contract. + */ CacheInvalidateRelcacheByTuple(tuple); } diff --git a/src/backend/commands/event_trigger.c b/src/backend/commands/event_trigger.c index 05a6de6..a586d24 100644 --- a/src/backend/commands/event_trigger.c +++ b/src/backend/commands/event_trigger.c @@ -975,11 +975,6 @@ EventTriggerOnLogin(void) * this instead of regular updates serves two purposes. First, * that avoids possible waiting on the row-level lock. Second, * that avoids dealing with TOAST. - * - * Changes made by inplace update may be lost due to - * concurrent normal updates; see inplace-inval.spec. However, - * we are OK with that. The subsequent connections will still - * have a chance to set "dathasloginevt" to false. */ systable_inplace_update_finish(state, tuple); } diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c index 8ec5adf..03f1617 100644 --- a/src/backend/replication/logical/decode.c +++ b/src/backend/replication/logical/decode.c @@ -508,23 +508,19 @@ heap_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf) /* * Inplace updates are only ever performed on catalog tuples and - * can, per definition, not change tuple visibility. Since we - * don't decode catalog tuples, we're not interested in the - * record's contents. + * can, per definition, not change tuple visibility. Inplace + * updates don't affect storage or interpretation of table rows, + * so they don't affect logicalrep_write_tuple() outcomes. Hence, + * we don't process invalidations from the original operation. If + * inplace updates did affect those things, invalidations wouldn't + * make it work, since there are no snapshot-specific versions of + * inplace-updated values. Since we also don't decode catalog + * tuples, we're not interested in the record's contents. * - * In-place updates can be used either by XID-bearing transactions - * (e.g. in CREATE INDEX CONCURRENTLY) or by XID-less - * transactions (e.g. VACUUM). In the former case, the commit - * record will include cache invalidations, so we mark the - * transaction as catalog modifying here. Currently that's - * redundant because the commit will do that as well, but once we - * support decoding in-progress relations, this will be important. + * WAL contains likely-unnecessary commit-time invals from the + * CacheInvalidateHeapTuple() call in heap_inplace_update(). + * Excess invalidation is safe. 
*/ - if (!TransactionIdIsValid(xid)) - break; - - (void) SnapBuildProcessChange(builder, xid, buf->origptr); - ReorderBufferXidSetCatalogChanges(ctx->reorder, xid, buf->origptr); break; case XLOG_HEAP_CONFIRM: diff --git a/src/backend/utils/cache/catcache.c b/src/backend/utils/cache/catcache.c index 111d8a2..ea8ca0e 100644 --- a/src/backend/utils/cache/catcache.c +++ b/src/backend/utils/cache/catcache.c @@ -2288,7 +2288,8 @@ void PrepareToInvalidateCacheTuple(Relation relation, HeapTuple tuple, HeapTuple newtuple, - void (*function) (int, uint32, Oid)) + void (*function) (int, uint32, Oid, void *), + void *context) { slist_iter iter; Oid reloid; @@ -2329,7 +2330,7 @@ PrepareToInvalidateCacheTuple(Relation relation, hashvalue = CatalogCacheComputeTupleHashValue(ccp, ccp->cc_nkeys, tuple); dbid = ccp->cc_relisshared ? (Oid) 0 : MyDatabaseId; - (*function) (ccp->id, hashvalue, dbid); + (*function) (ccp->id, hashvalue, dbid, context); if (newtuple) { @@ -2338,7 +2339,7 @@ PrepareToInvalidateCacheTuple(Relation relation, newhashvalue = CatalogCacheComputeTupleHashValue(ccp, ccp->cc_nkeys, newtuple); if (newhashvalue != hashvalue) - (*function) (ccp->id, newhashvalue, dbid); + (*function) (ccp->id, newhashvalue, dbid, context); } } } diff --git a/src/backend/utils/cache/inval.c b/src/backend/utils/cache/inval.c index 603aa41..8e311ca 100644 --- a/src/backend/utils/cache/inval.c +++ b/src/backend/utils/cache/inval.c @@ -94,6 +94,10 @@ * worth trying to avoid sending such inval traffic in the future, if those * problems can be overcome cheaply. * + * When making a nontransactional change to a cacheable object, we must + * likewise send the invalidation immediately, before ending the change's + * critical section. This includes inplace heap updates, relmap, and smgr. + * * When wal_level=logical, write invalidations into WAL at each command end to * support the decoding of the in-progress transactions. See * CommandEndInvalidationMessages. @@ -130,13 +134,15 @@ /* * Pending requests are stored as ready-to-send SharedInvalidationMessages. - * We keep the messages themselves in arrays in TopTransactionContext - * (there are separate arrays for catcache and relcache messages). Control - * information is kept in a chain of TransInvalidationInfo structs, also - * allocated in TopTransactionContext. (We could keep a subtransaction's - * TransInvalidationInfo in its CurTransactionContext; but that's more - * wasteful not less so, since in very many scenarios it'd be the only - * allocation in the subtransaction's CurTransactionContext.) + * We keep the messages themselves in arrays in TopTransactionContext (there + * are separate arrays for catcache and relcache messages). For transactional + * messages, control information is kept in a chain of TransInvalidationInfo + * structs, also allocated in TopTransactionContext. (We could keep a + * subtransaction's TransInvalidationInfo in its CurTransactionContext; but + * that's more wasteful not less so, since in very many scenarios it'd be the + * only allocation in the subtransaction's CurTransactionContext.) For + * inplace update messages, control information appears in an + * InvalidationInfo, allocated in CurrentMemoryContext. * * We can store the message arrays densely, and yet avoid moving data around * within an array, because within any one subtransaction we need only @@ -147,7 +153,9 @@ * struct. 
Similarly, we need distinguish messages of prior subtransactions * from those of the current subtransaction only until the subtransaction * completes, after which we adjust the array indexes in the parent's - * TransInvalidationInfo to include the subtransaction's messages. + * TransInvalidationInfo to include the subtransaction's messages. Inplace + * invalidations don't need a concept of command or subtransaction boundaries, + * since we send them during the WAL insertion critical section. * * The ordering of the individual messages within a command's or * subtransaction's output is not considered significant, although this @@ -200,7 +208,7 @@ typedef struct InvalidationMsgsGroup /*---------------- - * Invalidation messages are divided into two groups: + * Transactional invalidation messages are divided into two groups: * 1) events so far in current command, not yet reflected to caches. * 2) events in previous commands of current transaction; these have * been reflected to local caches, and must be either broadcast to @@ -216,26 +224,36 @@ typedef struct InvalidationMsgsGroup *---------------- */ -typedef struct TransInvalidationInfo +/* fields common to both transactional and inplace invalidation */ +typedef struct InvalidationInfo { - /* Back link to parent transaction's info */ - struct TransInvalidationInfo *parent; - - /* Subtransaction nesting depth */ - int my_level; - /* Events emitted by current command */ InvalidationMsgsGroup CurrentCmdInvalidMsgs; + /* init file must be invalidated? */ + bool RelcacheInitFileInval; +} InvalidationInfo; + +/* subclass adding fields specific to transactional invalidation */ +typedef struct TransInvalidationInfo +{ + /* Base class */ + struct InvalidationInfo ii; + /* Events emitted by previous commands of this (sub)transaction */ InvalidationMsgsGroup PriorCmdInvalidMsgs; - /* init file must be invalidated? */ - bool RelcacheInitFileInval; + /* Back link to parent transaction's info */ + struct TransInvalidationInfo *parent; + + /* Subtransaction nesting depth */ + int my_level; } TransInvalidationInfo; static TransInvalidationInfo *transInvalInfo = NULL; +static InvalidationInfo *inplaceInvalInfo = NULL; + /* GUC storage */ int debug_discard_caches = 0; @@ -543,9 +561,12 @@ ProcessInvalidationMessagesMulti(InvalidationMsgsGroup *group, static void RegisterCatcacheInvalidation(int cacheId, uint32 hashValue, - Oid dbId) + Oid dbId, + void *context) { - AddCatcacheInvalidationMessage(&transInvalInfo->CurrentCmdInvalidMsgs, + InvalidationInfo *info = (InvalidationInfo *) context; + + AddCatcacheInvalidationMessage(&info->CurrentCmdInvalidMsgs, cacheId, hashValue, dbId); } @@ -555,10 +576,9 @@ RegisterCatcacheInvalidation(int cacheId, * Register an invalidation event for all catcache entries from a catalog. */ static void -RegisterCatalogInvalidation(Oid dbId, Oid catId) +RegisterCatalogInvalidation(InvalidationInfo *info, Oid dbId, Oid catId) { - AddCatalogInvalidationMessage(&transInvalInfo->CurrentCmdInvalidMsgs, - dbId, catId); + AddCatalogInvalidationMessage(&info->CurrentCmdInvalidMsgs, dbId, catId); } /* @@ -567,10 +587,9 @@ RegisterCatalogInvalidation(Oid dbId, Oid catId) * As above, but register a relcache invalidation event. 
*/ static void -RegisterRelcacheInvalidation(Oid dbId, Oid relId) +RegisterRelcacheInvalidation(InvalidationInfo *info, Oid dbId, Oid relId) { - AddRelcacheInvalidationMessage(&transInvalInfo->CurrentCmdInvalidMsgs, - dbId, relId); + AddRelcacheInvalidationMessage(&info->CurrentCmdInvalidMsgs, dbId, relId); /* * Most of the time, relcache invalidation is associated with system @@ -587,7 +606,7 @@ RegisterRelcacheInvalidation(Oid dbId, Oid relId) * as well. Also zap when we are invalidating whole relcache. */ if (relId == InvalidOid || RelationIdIsInInitFile(relId)) - transInvalInfo->RelcacheInitFileInval = true; + info->RelcacheInitFileInval = true; } /* @@ -597,24 +616,27 @@ RegisterRelcacheInvalidation(Oid dbId, Oid relId) * Only needed for catalogs that don't have catcaches. */ static void -RegisterSnapshotInvalidation(Oid dbId, Oid relId) +RegisterSnapshotInvalidation(InvalidationInfo *info, Oid dbId, Oid relId) { - AddSnapshotInvalidationMessage(&transInvalInfo->CurrentCmdInvalidMsgs, - dbId, relId); + AddSnapshotInvalidationMessage(&info->CurrentCmdInvalidMsgs, dbId, relId); } /* * PrepareInvalidationState * Initialize inval data for the current (sub)transaction. */ -static void +static InvalidationInfo * PrepareInvalidationState(void) { TransInvalidationInfo *myInfo; + Assert(IsTransactionState()); + /* Can't queue transactional message while collecting inplace messages. */ + Assert(inplaceInvalInfo == NULL); + if (transInvalInfo != NULL && transInvalInfo->my_level == GetCurrentTransactionNestLevel()) - return; + return (InvalidationInfo *) transInvalInfo; myInfo = (TransInvalidationInfo *) MemoryContextAllocZero(TopTransactionContext, @@ -637,7 +659,7 @@ PrepareInvalidationState(void) * counter. This is a convenient place to check for that, as well as * being important to keep management of the message arrays simple. */ - if (NumMessagesInGroup(&transInvalInfo->CurrentCmdInvalidMsgs) != 0) + if (NumMessagesInGroup(&transInvalInfo->ii.CurrentCmdInvalidMsgs) != 0) elog(ERROR, "cannot start a subtransaction when there are unprocessed inval messages"); /* @@ -646,8 +668,8 @@ PrepareInvalidationState(void) * to update them to follow whatever is already in the arrays. */ SetGroupToFollow(&myInfo->PriorCmdInvalidMsgs, - &transInvalInfo->CurrentCmdInvalidMsgs); - SetGroupToFollow(&myInfo->CurrentCmdInvalidMsgs, + &transInvalInfo->ii.CurrentCmdInvalidMsgs); + SetGroupToFollow(&myInfo->ii.CurrentCmdInvalidMsgs, &myInfo->PriorCmdInvalidMsgs); } else @@ -663,6 +685,41 @@ PrepareInvalidationState(void) } transInvalInfo = myInfo; + return (InvalidationInfo *) myInfo; +} + +/* + * PrepareInplaceInvalidationState + * Initialize inval data for an inplace update. + * + * See previous function for more background. + */ +static InvalidationInfo * +PrepareInplaceInvalidationState(void) +{ + InvalidationInfo *myInfo; + + Assert(IsTransactionState()); + /* limit of one inplace update under assembly */ + Assert(inplaceInvalInfo == NULL); + + /* gone after WAL insertion CritSection ends, so use current context */ + myInfo = (InvalidationInfo *) palloc0(sizeof(InvalidationInfo)); + + /* Stash our messages past end of the transactional messages, if any. 
*/ + if (transInvalInfo != NULL) + SetGroupToFollow(&myInfo->CurrentCmdInvalidMsgs, + &transInvalInfo->ii.CurrentCmdInvalidMsgs); + else + { + InvalMessageArrays[CatCacheMsgs].msgs = NULL; + InvalMessageArrays[CatCacheMsgs].maxmsgs = 0; + InvalMessageArrays[RelCacheMsgs].msgs = NULL; + InvalMessageArrays[RelCacheMsgs].maxmsgs = 0; + } + + inplaceInvalInfo = myInfo; + return myInfo; } /* ---------------------------------------------------------------- @@ -902,7 +959,7 @@ xactGetCommittedInvalidationMessages(SharedInvalidationMessage **msgs, * after we send the SI messages. However, we need not do anything unless * we committed. */ - *RelcacheInitFileInval = transInvalInfo->RelcacheInitFileInval; + *RelcacheInitFileInval = transInvalInfo->ii.RelcacheInitFileInval; /* * Collect all the pending messages into a single contiguous array of @@ -913,7 +970,7 @@ xactGetCommittedInvalidationMessages(SharedInvalidationMessage **msgs, * not new ones. */ nummsgs = NumMessagesInGroup(&transInvalInfo->PriorCmdInvalidMsgs) + - NumMessagesInGroup(&transInvalInfo->CurrentCmdInvalidMsgs); + NumMessagesInGroup(&transInvalInfo->ii.CurrentCmdInvalidMsgs); *msgs = msgarray = (SharedInvalidationMessage *) MemoryContextAlloc(CurTransactionContext, @@ -926,7 +983,7 @@ xactGetCommittedInvalidationMessages(SharedInvalidationMessage **msgs, msgs, n * sizeof(SharedInvalidationMessage)), nmsgs += n)); - ProcessMessageSubGroupMulti(&transInvalInfo->CurrentCmdInvalidMsgs, + ProcessMessageSubGroupMulti(&transInvalInfo->ii.CurrentCmdInvalidMsgs, CatCacheMsgs, (memcpy(msgarray + nmsgs, msgs, @@ -938,7 +995,7 @@ xactGetCommittedInvalidationMessages(SharedInvalidationMessage **msgs, msgs, n * sizeof(SharedInvalidationMessage)), nmsgs += n)); - ProcessMessageSubGroupMulti(&transInvalInfo->CurrentCmdInvalidMsgs, + ProcessMessageSubGroupMulti(&transInvalInfo->ii.CurrentCmdInvalidMsgs, RelCacheMsgs, (memcpy(msgarray + nmsgs, msgs, @@ -1024,7 +1081,9 @@ ProcessCommittedInvalidationMessages(SharedInvalidationMessage *msgs, void AtEOXact_Inval(bool isCommit) { - /* Quick exit if no messages */ + inplaceInvalInfo = NULL; + + /* Quick exit if no transactional messages */ if (transInvalInfo == NULL) return; @@ -1038,16 +1097,16 @@ AtEOXact_Inval(bool isCommit) * after we send the SI messages. However, we need not do anything * unless we committed. */ - if (transInvalInfo->RelcacheInitFileInval) + if (transInvalInfo->ii.RelcacheInitFileInval) RelationCacheInitFilePreInvalidate(); AppendInvalidationMessages(&transInvalInfo->PriorCmdInvalidMsgs, - &transInvalInfo->CurrentCmdInvalidMsgs); + &transInvalInfo->ii.CurrentCmdInvalidMsgs); ProcessInvalidationMessagesMulti(&transInvalInfo->PriorCmdInvalidMsgs, SendSharedInvalidMessages); - if (transInvalInfo->RelcacheInitFileInval) + if (transInvalInfo->ii.RelcacheInitFileInval) RelationCacheInitFilePostInvalidate(); } else @@ -1061,6 +1120,56 @@ AtEOXact_Inval(bool isCommit) } /* + * PreInplace_Inval + * Process queued-up invalidation before inplace update critical section. + * + * Tasks belong here if they are safe even if the inplace update does not + * complete. Currently, this just unlinks a cache file, which can fail. The + * sum of this and AtInplace_Inval() mirrors AtEOXact_Inval(isCommit=true). + */ +void +PreInplace_Inval(void) +{ + Assert(CritSectionCount == 0); + + if (inplaceInvalInfo && inplaceInvalInfo->RelcacheInitFileInval) + RelationCacheInitFilePreInvalidate(); +} + +/* + * AtInplace_Inval + * Process queued-up invalidations after inplace update buffer mutation. 
+ */ +void +AtInplace_Inval(void) +{ + Assert(CritSectionCount > 0); + + if (inplaceInvalInfo == NULL) + return; + + ProcessInvalidationMessagesMulti(&inplaceInvalInfo->CurrentCmdInvalidMsgs, + SendSharedInvalidMessages); + + if (inplaceInvalInfo->RelcacheInitFileInval) + RelationCacheInitFilePostInvalidate(); + + inplaceInvalInfo = NULL; +} + +/* + * ForgetInplace_Inval + * Alternative to PreInplace_Inval()+AtInplace_Inval(): discard queued-up + * invalidations. This lets inplace update enumerate invalidations + * optimistically, before locking the buffer. + */ +void +ForgetInplace_Inval(void) +{ + inplaceInvalInfo = NULL; +} + +/* * AtEOSubXact_Inval * Process queued-up invalidation messages at end of subtransaction. * @@ -1082,9 +1191,20 @@ void AtEOSubXact_Inval(bool isCommit) { int my_level; - TransInvalidationInfo *myInfo = transInvalInfo; + TransInvalidationInfo *myInfo; - /* Quick exit if no messages. */ + /* + * Successful inplace update must clear this, but we clear it on abort. + * Inplace updates allocate this in CurrentMemoryContext, which has + * lifespan <= subtransaction lifespan. Hence, don't free it explicitly. + */ + if (isCommit) + Assert(inplaceInvalInfo == NULL); + else + inplaceInvalInfo = NULL; + + /* Quick exit if no transactional messages. */ + myInfo = transInvalInfo; if (myInfo == NULL) return; @@ -1125,12 +1245,12 @@ AtEOSubXact_Inval(bool isCommit) &myInfo->PriorCmdInvalidMsgs); /* Must readjust parent's CurrentCmdInvalidMsgs indexes now */ - SetGroupToFollow(&myInfo->parent->CurrentCmdInvalidMsgs, + SetGroupToFollow(&myInfo->parent->ii.CurrentCmdInvalidMsgs, &myInfo->parent->PriorCmdInvalidMsgs); /* Pending relcache inval becomes parent's problem too */ - if (myInfo->RelcacheInitFileInval) - myInfo->parent->RelcacheInitFileInval = true; + if (myInfo->ii.RelcacheInitFileInval) + myInfo->parent->ii.RelcacheInitFileInval = true; /* Pop the transaction state stack */ transInvalInfo = myInfo->parent; @@ -1177,7 +1297,7 @@ CommandEndInvalidationMessages(void) if (transInvalInfo == NULL) return; - ProcessInvalidationMessages(&transInvalInfo->CurrentCmdInvalidMsgs, + ProcessInvalidationMessages(&transInvalInfo->ii.CurrentCmdInvalidMsgs, LocalExecuteInvalidationMessage); /* WAL Log per-command invalidation messages for wal_level=logical */ @@ -1185,26 +1305,21 @@ CommandEndInvalidationMessages(void) LogLogicalInvalidations(); AppendInvalidationMessages(&transInvalInfo->PriorCmdInvalidMsgs, - &transInvalInfo->CurrentCmdInvalidMsgs); + &transInvalInfo->ii.CurrentCmdInvalidMsgs); } /* - * CacheInvalidateHeapTuple - * Register the given tuple for invalidation at end of command - * (ie, current command is creating or outdating this tuple). - * Also, detect whether a relcache invalidation is implied. - * - * For an insert or delete, tuple is the target tuple and newtuple is NULL. - * For an update, we are called just once, with tuple being the old tuple - * version and newtuple the new version. This allows avoidance of duplicate - * effort during an update. + * CacheInvalidateHeapTupleCommon + * Common logic for end-of-command and inplace variants. 
*/ -void -CacheInvalidateHeapTuple(Relation relation, - HeapTuple tuple, - HeapTuple newtuple) +static void +CacheInvalidateHeapTupleCommon(Relation relation, + HeapTuple tuple, + HeapTuple newtuple, + InvalidationInfo *(*prepare_callback) (void)) { + InvalidationInfo *info; Oid tupleRelId; Oid databaseId; Oid relationId; @@ -1228,11 +1343,8 @@ CacheInvalidateHeapTuple(Relation relation, if (IsToastRelation(relation)) return; - /* - * If we're not prepared to queue invalidation messages for this - * subtransaction level, get ready now. - */ - PrepareInvalidationState(); + /* Allocate any required resources. */ + info = prepare_callback(); /* * First let the catcache do its thing @@ -1241,11 +1353,12 @@ CacheInvalidateHeapTuple(Relation relation, if (RelationInvalidatesSnapshotsOnly(tupleRelId)) { databaseId = IsSharedRelation(tupleRelId) ? InvalidOid : MyDatabaseId; - RegisterSnapshotInvalidation(databaseId, tupleRelId); + RegisterSnapshotInvalidation(info, databaseId, tupleRelId); } else PrepareToInvalidateCacheTuple(relation, tuple, newtuple, - RegisterCatcacheInvalidation); + RegisterCatcacheInvalidation, + (void *) info); /* * Now, is this tuple one of the primary definers of a relcache entry? See @@ -1318,7 +1431,44 @@ CacheInvalidateHeapTuple(Relation relation, /* * Yes. We need to register a relcache invalidation event. */ - RegisterRelcacheInvalidation(databaseId, relationId); + RegisterRelcacheInvalidation(info, databaseId, relationId); +} + +/* + * CacheInvalidateHeapTuple + * Register the given tuple for invalidation at end of command + * (ie, current command is creating or outdating this tuple) and end of + * transaction. Also, detect whether a relcache invalidation is implied. + * + * For an insert or delete, tuple is the target tuple and newtuple is NULL. + * For an update, we are called just once, with tuple being the old tuple + * version and newtuple the new version. This allows avoidance of duplicate + * effort during an update. + */ +void +CacheInvalidateHeapTuple(Relation relation, + HeapTuple tuple, + HeapTuple newtuple) +{ + CacheInvalidateHeapTupleCommon(relation, tuple, newtuple, + PrepareInvalidationState); +} + +/* + * CacheInvalidateHeapTupleInplace + * Register the given tuple for nontransactional invalidation pertaining + * to an inplace update. Also, detect whether a relcache invalidation is + * implied. + * + * Like CacheInvalidateHeapTuple(), but for inplace updates. 
+ */ +void +CacheInvalidateHeapTupleInplace(Relation relation, + HeapTuple tuple, + HeapTuple newtuple) +{ + CacheInvalidateHeapTupleCommon(relation, tuple, newtuple, + PrepareInplaceInvalidationState); } /* @@ -1337,14 +1487,13 @@ CacheInvalidateCatalog(Oid catalogId) { Oid databaseId; - PrepareInvalidationState(); - if (IsSharedRelation(catalogId)) databaseId = InvalidOid; else databaseId = MyDatabaseId; - RegisterCatalogInvalidation(databaseId, catalogId); + RegisterCatalogInvalidation(PrepareInvalidationState(), + databaseId, catalogId); } /* @@ -1362,15 +1511,14 @@ CacheInvalidateRelcache(Relation relation) Oid databaseId; Oid relationId; - PrepareInvalidationState(); - relationId = RelationGetRelid(relation); if (relation->rd_rel->relisshared) databaseId = InvalidOid; else databaseId = MyDatabaseId; - RegisterRelcacheInvalidation(databaseId, relationId); + RegisterRelcacheInvalidation(PrepareInvalidationState(), + databaseId, relationId); } /* @@ -1383,9 +1531,8 @@ CacheInvalidateRelcache(Relation relation) void CacheInvalidateRelcacheAll(void) { - PrepareInvalidationState(); - - RegisterRelcacheInvalidation(InvalidOid, InvalidOid); + RegisterRelcacheInvalidation(PrepareInvalidationState(), + InvalidOid, InvalidOid); } /* @@ -1399,14 +1546,13 @@ CacheInvalidateRelcacheByTuple(HeapTuple classTuple) Oid databaseId; Oid relationId; - PrepareInvalidationState(); - relationId = classtup->oid; if (classtup->relisshared) databaseId = InvalidOid; else databaseId = MyDatabaseId; - RegisterRelcacheInvalidation(databaseId, relationId); + RegisterRelcacheInvalidation(PrepareInvalidationState(), + databaseId, relationId); } /* @@ -1420,8 +1566,6 @@ CacheInvalidateRelcacheByRelid(Oid relid) { HeapTuple tup; - PrepareInvalidationState(); - tup = SearchSysCache1(RELOID, ObjectIdGetDatum(relid)); if (!HeapTupleIsValid(tup)) elog(ERROR, "cache lookup failed for relation %u", relid); @@ -1611,7 +1755,7 @@ LogLogicalInvalidations(void) if (transInvalInfo == NULL) return; - group = &transInvalInfo->CurrentCmdInvalidMsgs; + group = &transInvalInfo->ii.CurrentCmdInvalidMsgs; nmsgs = NumMessagesInGroup(group); if (nmsgs > 0) diff --git a/src/backend/utils/cache/syscache.c b/src/backend/utils/cache/syscache.c index 50c9440..f41b1c2 100644 --- a/src/backend/utils/cache/syscache.c +++ b/src/backend/utils/cache/syscache.c @@ -351,8 +351,7 @@ SearchSysCacheLocked1(int cacheId, /* * If an inplace update just finished, ensure we process the syscache - * inval. XXX this is insufficient: the inplace updater may not yet - * have reached AtEOXact_Inval(). See test at inplace-inval.spec. + * inval. * * If a heap_update() call just released its LOCKTAG_TUPLE, we'll * probably find the old tuple and reach "tuple concurrently updated". 
diff --git a/src/include/utils/catcache.h b/src/include/utils/catcache.h index 3fb9647..8f04bb8 100644 --- a/src/include/utils/catcache.h +++ b/src/include/utils/catcache.h @@ -225,6 +225,7 @@ extern void CatCacheInvalidate(CatCache *cache, uint32 hashValue); extern void PrepareToInvalidateCacheTuple(Relation relation, HeapTuple tuple, HeapTuple newtuple, - void (*function) (int, uint32, Oid)); + void (*function) (int, uint32, Oid, void *), + void *context); #endif /* CATCACHE_H */ diff --git a/src/include/utils/inval.h b/src/include/utils/inval.h index 24695fa..299cd75 100644 --- a/src/include/utils/inval.h +++ b/src/include/utils/inval.h @@ -28,6 +28,10 @@ extern void AcceptInvalidationMessages(void); extern void AtEOXact_Inval(bool isCommit); +extern void PreInplace_Inval(void); +extern void AtInplace_Inval(void); +extern void ForgetInplace_Inval(void); + extern void AtEOSubXact_Inval(bool isCommit); extern void PostPrepare_Inval(void); @@ -37,6 +41,9 @@ extern void CommandEndInvalidationMessages(void); extern void CacheInvalidateHeapTuple(Relation relation, HeapTuple tuple, HeapTuple newtuple); +extern void CacheInvalidateHeapTupleInplace(Relation relation, + HeapTuple tuple, + HeapTuple newtuple); extern void CacheInvalidateCatalog(Oid catalogId); diff --git a/src/test/isolation/expected/inplace-inval.out b/src/test/isolation/expected/inplace-inval.out index e68eca5..c35895a 100644 --- a/src/test/isolation/expected/inplace-inval.out +++ b/src/test/isolation/expected/inplace-inval.out @@ -1,6 +1,6 @@ Parsed test spec with 3 sessions -starting permutation: cachefill3 cir1 cic2 ddl3 +starting permutation: cachefill3 cir1 cic2 ddl3 read1 step cachefill3: TABLE newly_indexed; c - @@ -9,6 +9,14 @@ c step cir1: BEGIN; CREATE INDEX i1 ON newly_indexed (c); ROLLBACK; step cic2: CREATE INDEX i2 ON newly_indexed (c); step ddl3: ALTER TABLE newly_indexed ADD extra int; +step read1: + SELECT relhasindex FROM pg_class WHERE oid = 'newly_indexed'::regclass; + +relhasindex +----------- +t +(1 row) + starting permutation: cir1 cic2 ddl3 read1 step cir1: BEGIN; CREATE INDEX i1 ON newly_indexed (c); ROLLBACK; diff --git a/src/test/isolation/specs/inplace-inval.spec b/src/test/isolation/specs/inplace-inval.spec index 96954fd..b99112d 100644 --- a/src/test/isolation/specs/inplace-inval.spec +++ b/src/test/isolation/specs/inplace-inval.spec @@ -1,7 +1,7 @@ -# If a heap_update() caller retrieves its oldtup from a cache, it's possible -# for that cache entry to predate an inplace update, causing loss of that -# inplace update. This arises because the transaction may abort before -# sending the inplace invalidation message to the shared queue. +# An inplace update had been able to abort before sending the inplace +# invalidation message to the shared queue. If a heap_update() caller then +# retrieved its oldtup from a cache, the heap_update() could revert the +# inplace update. setup { @@ -27,14 +27,12 @@ step cachefill3 { TABLE newly_indexed; } step ddl3 { ALTER TABLE newly_indexed ADD extra int; } -# XXX shows an extant bug. Adding step read1 at the end would usually print -# relhasindex=f (not wanted). This does not reach the unwanted behavior under -# -DCATCACHE_FORCE_RELEASE and friends. 
permutation cachefill3 # populates the pg_class row in the catcache cir1 # sets relhasindex=true; rollback discards cache inval cic2 # sees relhasindex=true, skips changing it (so no inval) ddl3 # cached row as the oldtup of an update, losing relhasindex + read1 # observe damage # without cachefill3, no bug permutation cir1 cic2 ddl3 read1 diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list index 5628b3f..3b57e78 100644 --- a/src/tools/pgindent/typedefs.list +++ b/src/tools/pgindent/typedefs.list @@ -1253,6 +1253,7 @@ Interval IntervalAggState IntoClause InvalMessageArray +InvalidationInfo InvalidationMsgsGroup IpcMemoryId IpcMemoryKey
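On the ABI note in the commit message above: PrepareToInvalidateCacheTuple()
callbacks now receive a pass-through context pointer.  No PGXN extension calls
that function, so the following is purely illustrative; a hypothetical
out-of-core caller (my_catcache_cb, MyInvalState, and my_prepare_invals are
invented names) would adapt roughly like this.

#include "postgres.h"

#include "utils/catcache.h"

/* Hypothetical extension-side bookkeeping; not part of the patch. */
typedef struct MyInvalState
{
    int         ncallbacks;
} MyInvalState;

/*
 * The callback previously took (int cacheId, uint32 hashValue, Oid dbId);
 * it now also receives the context passed to PrepareToInvalidateCacheTuple().
 */
static void
my_catcache_cb(int cacheId, uint32 hashValue, Oid dbId, void *context)
{
    MyInvalState *state = (MyInvalState *) context;

    state->ncallbacks++;        /* e.g. record (cacheId, hashValue, dbId) */
}

static void
my_prepare_invals(Relation relation, HeapTuple tuple, HeapTuple newtuple,
                  MyInvalState *state)
{
    PrepareToInvalidateCacheTuple(relation, tuple, newtuple,
                                  my_catcache_cb, (void *) state);
}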
From: Noah Misch <n...@leadboat.com> Assert lack of hazardous buffer locks before possible catalog read. Commit 0bada39c83a150079567a6e97b1a25a198f30ea3 fixed a bug of this kind, which existed in all branches for six days before detection. While the probability of reaching the trouble was low, the disruption was extreme. No new backends could start, and service restoration needed an immediate shutdown. Hence, add this to catch the next bug like it. The new check in RelationIdGetRelation() suffices to make autovacuum detect the bug in commit 243e9b40f1b2dd09d6e5bf91ebf6e822a2cd3704 that led to commit 0bada39. This also checks in a number of similar places. It replaces each Assert(IsTransactionState()) that pertained to a conditional catalog read. This a back-patch of commit f4ece891fc2f3f96f0571720a1ae30db8030681b to all supported branches, to accompany the back-patch of commits 243e9b4 and 0bada39. For catalog indexes, the bttextcmp() behavior that motivated IsCatalogTextUniqueIndexOid() was v18-specific, so this back-patch doesn't need that or the correction of that from commit 4a4ee0c2c1e53401924101945ac3d517c0a8a559. Reported-by: Alexander Lakhin <exclus...@gmail.com> Discussion: https://postgr.es/m/20250410191830.0e.nmi...@google.com Discussion: https://postgr.es/m/10ec0bc3-5933-1189-6bb8-5dec41145...@gmail.com Backpatch-through: 13-17 diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c index f8d30bf..7daf1be 100644 --- a/src/backend/storage/buffer/bufmgr.c +++ b/src/backend/storage/buffer/bufmgr.c @@ -40,6 +40,9 @@ #include "access/tableam.h" #include "access/xloginsert.h" #include "access/xlogutils.h" +#ifdef USE_ASSERT_CHECKING +#include "catalog/pg_tablespace_d.h" +#endif #include "catalog/storage.h" #include "catalog/storage_xlog.h" #include "executor/instrument.h" @@ -535,6 +538,10 @@ static void RelationCopyStorageUsingBuffer(RelFileLocator srclocator, ForkNumber forkNum, bool permanent); static void AtProcExit_Buffers(int code, Datum arg); static void CheckForBufferLeaks(void); +#ifdef USE_ASSERT_CHECKING +static void AssertNotCatalogBufferLock(LWLock *lock, LWLockMode mode, + void *unused_context); +#endif static int rlocator_comparator(const void *p1, const void *p2); static inline int buffertag_comparator(const BufferTag *ba, const BufferTag *bb); static inline int ckpt_buforder_comparator(const CkptSortItem *a, const CkptSortItem *b); @@ -3647,6 +3654,66 @@ CheckForBufferLeaks(void) #endif } +#ifdef USE_ASSERT_CHECKING +/* + * Check for exclusive-locked catalog buffers. This is the core of + * AssertCouldGetRelation(). + * + * A backend would self-deadlock on LWLocks if the catalog scan read the + * exclusive-locked buffer. The main threat is exclusive-locked buffers of + * catalogs used in relcache, because a catcache search on any catalog may + * build that catalog's relcache entry. We don't have an inventory of + * catalogs relcache uses, so just check buffers of most catalogs. + * + * It's better to minimize waits while holding an exclusive buffer lock, so it + * would be nice to broaden this check not to be catalog-specific. However, + * bttextcmp() accesses pg_collation, and non-core opclasses might similarly + * read tables. That is deadlock-free as long as there's no loop in the + * dependency graph: modifying table A may cause an opclass to read table B, + * but it must not cause a read of table A. 
+ */ +void +AssertBufferLocksPermitCatalogRead(void) +{ + ForEachLWLockHeldByMe(AssertNotCatalogBufferLock, NULL); +} + +static void +AssertNotCatalogBufferLock(LWLock *lock, LWLockMode mode, + void *unused_context) +{ + BufferDesc *bufHdr; + BufferTag tag; + Oid relid; + + if (mode != LW_EXCLUSIVE) + return; + + if (!((BufferDescPadded *) lock > BufferDescriptors && + (BufferDescPadded *) lock < BufferDescriptors + NBuffers)) + return; /* not a buffer lock */ + + bufHdr = (BufferDesc *) + ((char *) lock - offsetof(BufferDesc, content_lock)); + tag = bufHdr->tag; + + /* + * This relNumber==relid assumption holds until a catalog experiences + * VACUUM FULL or similar. After a command like that, relNumber will be + * in the normal (non-catalog) range, and we lose the ability to detect + * hazardous access to that catalog. Calling RelidByRelfilenumber() would + * close that gap, but RelidByRelfilenumber() might then deadlock with a + * held lock. + */ + relid = tag.relNumber; + + Assert(!IsCatalogRelationOid(relid)); + /* Shared rels are always catalogs: detect even after VACUUM FULL. */ + Assert(tag.spcOid != GLOBALTABLESPACE_OID); +} +#endif + + /* * Helper routine to issue warnings when a buffer is unexpectedly pinned */ diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c index 5df62fa..ea83110 100644 --- a/src/backend/storage/lmgr/lwlock.c +++ b/src/backend/storage/lmgr/lwlock.c @@ -1887,6 +1887,21 @@ LWLockReleaseAll(void) /* + * ForEachLWLockHeldByMe - run a callback for each held lock + * + * This is meant as debug support only. + */ +void +ForEachLWLockHeldByMe(void (*callback) (LWLock *, LWLockMode, void *), + void *context) +{ + int i; + + for (i = 0; i < num_held_lwlocks; i++) + callback(held_lwlocks[i].lock, held_lwlocks[i].mode, context); +} + +/* * LWLockHeldByMe - test whether my process holds a lock in any mode * * This is meant as debug support only. diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c index 4c85a01..d2b16d8 100644 --- a/src/backend/utils/adt/pg_locale.c +++ b/src/backend/utils/adt/pg_locale.c @@ -66,6 +66,7 @@ #include "utils/lsyscache.h" #include "utils/memutils.h" #include "utils/pg_locale.h" +#include "utils/relcache.h" #include "utils/syscache.h" #ifdef USE_ICU @@ -1258,6 +1259,8 @@ lookup_collation_cache(Oid collation, bool set_flags) Assert(OidIsValid(collation)); Assert(collation != DEFAULT_COLLATION_OID); + AssertCouldGetRelation(); + if (collation_cache == NULL) { /* First time through, initialize the hash table */ diff --git a/src/backend/utils/cache/catcache.c b/src/backend/utils/cache/catcache.c index aab29f5..3d4228b 100644 --- a/src/backend/utils/cache/catcache.c +++ b/src/backend/utils/cache/catcache.c @@ -1055,11 +1055,40 @@ RehashCatCacheLists(CatCache *cp) } /* + * ConditionalCatalogCacheInitializeCache + * + * Call CatalogCacheInitializeCache() if not yet done. + */ +pg_attribute_always_inline +static void +ConditionalCatalogCacheInitializeCache(CatCache *cache) +{ +#ifdef USE_ASSERT_CHECKING + /* + * TypeCacheRelCallback() runs outside transactions and relies on TYPEOID + * for hashing. This isn't ideal. Since lookup_type_cache() both + * registers the callback and searches TYPEOID, reaching trouble likely + * requires OOM at an unlucky moment. + * + * InvalidateAttoptCacheCallback() runs outside transactions and likewise + * relies on ATTNUM. InitPostgres() initializes ATTNUM, so it's reliable. 
+ */ + if (!(cache->id == TYPEOID || cache->id == ATTNUM) || + IsTransactionState()) + AssertCouldGetRelation(); + else + Assert(cache->cc_tupdesc != NULL); +#endif + + if (unlikely(cache->cc_tupdesc == NULL)) + CatalogCacheInitializeCache(cache); +} + +/* * CatalogCacheInitializeCache * * This function does final initialization of a catcache: obtain the tuple - * descriptor and set up the hash and equality function links. We assume - * that the relcache entry can be opened at this point! + * descriptor and set up the hash and equality function links. */ #ifdef CACHEDEBUG #define CatalogCacheInitializeCache_DEBUG1 \ @@ -1194,8 +1223,7 @@ CatalogCacheInitializeCache(CatCache *cache) void InitCatCachePhase2(CatCache *cache, bool touch_index) { - if (cache->cc_tupdesc == NULL) - CatalogCacheInitializeCache(cache); + ConditionalCatalogCacheInitializeCache(cache); if (touch_index && cache->id != AMOID && @@ -1374,16 +1402,12 @@ SearchCatCacheInternal(CatCache *cache, dlist_head *bucket; CatCTup *ct; - /* Make sure we're in an xact, even if this ends up being a cache hit */ - Assert(IsTransactionState()); - Assert(cache->cc_nkeys == nkeys); /* * one-time startup overhead for each cache */ - if (unlikely(cache->cc_tupdesc == NULL)) - CatalogCacheInitializeCache(cache); + ConditionalCatalogCacheInitializeCache(cache); #ifdef CATCACHE_STATS cache->cc_searches++; @@ -1669,8 +1693,7 @@ GetCatCacheHashValue(CatCache *cache, /* * one-time startup overhead for each cache */ - if (cache->cc_tupdesc == NULL) - CatalogCacheInitializeCache(cache); + ConditionalCatalogCacheInitializeCache(cache); /* * calculate the hash value @@ -1721,8 +1744,7 @@ SearchCatCacheList(CatCache *cache, /* * one-time startup overhead for each cache */ - if (unlikely(cache->cc_tupdesc == NULL)) - CatalogCacheInitializeCache(cache); + ConditionalCatalogCacheInitializeCache(cache); Assert(nkeys > 0 && nkeys < cache->cc_nkeys); @@ -2392,8 +2414,7 @@ PrepareToInvalidateCacheTuple(Relation relation, continue; /* Just in case cache hasn't finished initialization yet... */ - if (ccp->cc_tupdesc == NULL) - CatalogCacheInitializeCache(ccp); + ConditionalCatalogCacheInitializeCache(ccp); hashvalue = CatalogCacheComputeTupleHashValue(ccp, ccp->cc_nkeys, tuple); dbid = ccp->cc_relisshared ? (Oid) 0 : MyDatabaseId; diff --git a/src/backend/utils/cache/inval.c b/src/backend/utils/cache/inval.c index 21ac3c2..82297fa 100644 --- a/src/backend/utils/cache/inval.c +++ b/src/backend/utils/cache/inval.c @@ -631,7 +631,8 @@ PrepareInvalidationState(void) { TransInvalidationInfo *myInfo; - Assert(IsTransactionState()); + /* PrepareToInvalidateCacheTuple() needs relcache */ + AssertCouldGetRelation(); /* Can't queue transactional message while collecting inplace messages. 
*/ Assert(inplaceInvalInfo == NULL); @@ -700,7 +701,7 @@ PrepareInplaceInvalidationState(void) { InvalidationInfo *myInfo; - Assert(IsTransactionState()); + AssertCouldGetRelation(); /* limit of one inplace update under assembly */ Assert(inplaceInvalInfo == NULL); @@ -863,6 +864,12 @@ InvalidateSystemCaches(void) void AcceptInvalidationMessages(void) { +#ifdef USE_ASSERT_CHECKING + /* message handlers shall access catalogs only during transactions */ + if (IsTransactionState()) + AssertCouldGetRelation(); +#endif + ReceiveSharedInvalidMessages(LocalExecuteInvalidationMessage, InvalidateSystemCaches); @@ -1327,6 +1334,9 @@ CacheInvalidateHeapTupleCommon(Relation relation, Oid databaseId; Oid relationId; + /* PrepareToInvalidateCacheTuple() needs relcache */ + AssertCouldGetRelation(); + /* Do nothing during bootstrap */ if (IsBootstrapProcessingMode()) return; diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c index 3f1e8ce..9e0519c 100644 --- a/src/backend/utils/cache/relcache.c +++ b/src/backend/utils/cache/relcache.c @@ -2037,6 +2037,23 @@ formrdesc(const char *relationName, Oid relationReltype, relation->rd_isvalid = true; } +#ifdef USE_ASSERT_CHECKING +/* + * AssertCouldGetRelation + * + * Check safety of calling RelationIdGetRelation(). + * + * In code that reads catalogs in the event of a cache miss, call this + * before checking the cache. + */ +void +AssertCouldGetRelation(void) +{ + Assert(IsTransactionState()); + AssertBufferLocksPermitCatalogRead(); +} +#endif + /* ---------------------------------------------------------------- * Relation Descriptor Lookup Interface @@ -2064,8 +2081,7 @@ RelationIdGetRelation(Oid relationId) { Relation rd; - /* Make sure we're in an xact, even if this ends up being a cache hit */ - Assert(IsTransactionState()); + AssertCouldGetRelation(); /* * first try to find reldesc in the cache diff --git a/src/backend/utils/mb/mbutils.c b/src/backend/utils/mb/mbutils.c index 97a4d69..6af3b8e 100644 --- a/src/backend/utils/mb/mbutils.c +++ b/src/backend/utils/mb/mbutils.c @@ -39,6 +39,7 @@ #include "mb/pg_wchar.h" #include "utils/fmgrprotos.h" #include "utils/memutils.h" +#include "utils/relcache.h" #include "varatt.h" /* @@ -310,7 +311,7 @@ InitializeClientEncoding(void) { Oid utf8_to_server_proc; - Assert(IsTransactionState()); + AssertCouldGetRelation(); utf8_to_server_proc = FindDefaultConversionProc(PG_UTF8, current_server_encoding); diff --git a/src/include/storage/bufmgr.h b/src/include/storage/bufmgr.h index a1e7101..09c43dd 100644 --- a/src/include/storage/bufmgr.h +++ b/src/include/storage/bufmgr.h @@ -255,6 +255,9 @@ extern Buffer ExtendBufferedRelTo(BufferManagerRelation bmr, extern void InitBufferPoolAccess(void); extern void AtEOXact_Buffers(bool isCommit); +#ifdef USE_ASSERT_CHECKING +extern void AssertBufferLocksPermitCatalogRead(void); +#endif extern char *DebugPrintBufferRefcount(Buffer buffer); extern void CheckPointBuffers(int flags); extern BlockNumber BufferGetBlockNumber(Buffer buffer); diff --git a/src/include/storage/lwlock.h b/src/include/storage/lwlock.h index d70e6d3..5435948 100644 --- a/src/include/storage/lwlock.h +++ b/src/include/storage/lwlock.h @@ -129,6 +129,8 @@ extern bool LWLockAcquireOrWait(LWLock *lock, LWLockMode mode); extern void LWLockRelease(LWLock *lock); extern void LWLockReleaseClearVar(LWLock *lock, pg_atomic_uint64 *valptr, uint64 val); extern void LWLockReleaseAll(void); +extern void ForEachLWLockHeldByMe(void (*callback) (LWLock *, LWLockMode, void *), + void *context); 
extern bool LWLockHeldByMe(LWLock *lock); extern bool LWLockAnyHeldByMe(LWLock *lock, int nlocks, size_t stride); extern bool LWLockHeldByMeInMode(LWLock *lock, LWLockMode mode); diff --git a/src/include/utils/relcache.h b/src/include/utils/relcache.h index 18c32ea..6cb829e 100644 --- a/src/include/utils/relcache.h +++ b/src/include/utils/relcache.h @@ -37,6 +37,14 @@ typedef Relation *RelationPtr; /* * Routines to open (lookup) and close a relcache entry */ +#ifdef USE_ASSERT_CHECKING +extern void AssertCouldGetRelation(void); +#else +static inline void +AssertCouldGetRelation(void) +{ +} +#endif extern Relation RelationIdGetRelation(Oid relationId); extern void RelationClose(Relation relation);
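For the second patch, here is a sketch of the calling convention
AssertCouldGetRelation() is designed for, mirroring the pattern the patch
applies in lookup_collation_cache() and RelationIdGetRelation().  Only
AssertCouldGetRelation() comes from the patch; my_cache, load_my_entry, and
my_lookup are hypothetical.

#include "postgres.h"

#include "utils/relcache.h"

static void *my_cache = NULL;           /* hypothetical backend-local cache */

static void *load_my_entry(Oid objid);  /* hypothetical loader that reads catalogs */

static void *
my_lookup(Oid objid)
{
    /*
     * Check before consulting the cache, not only on the miss path, so
     * assert-enabled builds catch hazardous callers (no transaction state,
     * or an exclusive-locked catalog buffer) even when the cache is warm.
     */
    AssertCouldGetRelation();

    if (my_cache == NULL)
        my_cache = load_my_entry(objid);
    return my_cache;
}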
From: Noah Misch <n...@leadboat.com>

diff --git a/NON-MASTER-COMMITFEST-ENTRY b/NON-MASTER-COMMITFEST-ENTRY
new file mode 100644
index 0000000..b942c35
--- /dev/null
+++ b/NON-MASTER-COMMITFEST-ENTRY
@@ -0,0 +1,3 @@
+This commitfest entry proposes to change only non-master branches.  The cfbot
+is specific to the master branch.  Hence, this patch file exists to placate
+the cfbot.  See the other attachments for the real patches needing review.