On Tue, Jun 18, 2024 at 08:23:49AM -0700, Noah Misch wrote:
> On Mon, Jun 17, 2024 at 06:57:30PM -0700, Andres Freund wrote:
> > On 2024-06-17 16:58:54 -0700, Noah Misch wrote:
> > > On Sat, Jun 15, 2024 at 03:37:18PM -0700, Noah Misch wrote:
> > > > On Wed, May 22, 2024 at 05:05:48PM -0700, Noah Misch wrote:
> > > > > https://postgr.es/m/20240512232923.aa.nmi...@google.com wrote:
> > > > > > Separable, nontrivial things not fixed in the attached patch stack:
> > > > > >
> > > > > > - Inplace update uses transactional CacheInvalidateHeapTuple().
> > > > > >   ROLLBACK of CREATE INDEX wrongly discards the inval, leading to
> > > > > >   the relhasindex=t loss still seen in inplace-inval.spec.
> > > > > >   CacheInvalidateRelmap() does this right.
> > > > >
> > > > > I plan to fix that like CacheInvalidateRelmap(): send the inval
> > > > > immediately, inside the critical section.  Send it in
> > > > > heap_xlog_inplace(), too.
> > 
> > I'm worried this might cause its own set of bugs, e.g. if there are any
> > places that, possibly accidentally, rely on the invalidation from the
> > inplace update to also cover separate changes.
> 
> Good point.  I do have index_update_stats() still doing an ideally-superfluous
> relcache update for that reason.  Taking that further, it would be cheap
> insurance to have the inplace update do a transactional inval in addition to
> its immediate inval.  Future master-only work could remove the transactional
> one.  How about that?
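To make the "cheap insurance" shape concrete, here is a toy single-file model (all names hypothetical, none of them PostgreSQL APIs): the inplace update sends an inval that takes effect immediately, and also queues a transactional copy replayed at commit, so code that accidentally relied on the commit-time inval still gets one.

```c
/* Toy model of the belt-and-suspenders scheme: immediate inval plus a
 * queued transactional copy.  Names are illustrative only. */

#define MAX_QUEUED 8

static int cache_generation;        /* bumped whenever an inval lands */
static int queued[MAX_QUEUED];
static int nqueued;

/* Immediate inval: effective right away, the way relmap invals are. */
static void inval_immediate(int relid)
{
    (void) relid;
    cache_generation++;
}

/* Transactional inval: remembered now, replayed at commit. */
static void inval_transactional(int relid)
{
    if (nqueued < MAX_QUEUED)
        queued[nqueued++] = relid;
}

/* The inplace update sends both. */
static void inplace_update_inval(int relid)
{
    inval_immediate(relid);
    inval_transactional(relid);
}

static void at_commit(void)
{
    for (int i = 0; i < nqueued; i++)
        cache_generation++;         /* replay the queued invals */
    nqueued = 0;
}
```

Master-only work could later drop the transactional half without changing the immediate path.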
> 
> > Have you considered instead submitting these invalidations during abort as
> > well?
> 
> I had not.  Hmmm.  If the lock protocol in README.tuplock (after patch
> inplace120) told SearchSysCacheLocked1() to do systable scans instead of
> syscache reads, that could work.  Would need to ensure a PANIC if transaction
> abort doesn't reach the inval submission.  Overall, it would be harder to
> reason about the state of caches, but I suspect the patch would be smaller.
> How should we choose between those strategies?
> 
> > > > > a. Within logical decoding, cease processing invalidations for inplace
> > > >
> > > > I'm attaching the implementation.  This applies atop the v3 patch
> > > > stack from https://postgr.es/m/20240614003549.c2.nmi...@google.com,
> > > > but the threads are mostly orthogonal and intended for independent
> > > > review.  Translating a tuple into inval messages uses more
> > > > infrastructure than relmapper, which needs just a database ID.
> > > > Hence, this ended up more like a miniature of inval.c's
> > > > participation in the transaction commit sequence.
> > > >
> > > > I waffled on whether to back-patch inplace150-inval-durability-atcommit
> > >
> > > That inplace150 patch turned out to be unnecessary.  Contrary to the
> > > "noncritical resource releasing" comment some lines above
> > > AtEOXact_Inval(true), the actual behavior is already to promote ERROR
> > > to PANIC.  An ERROR just before or after sending invals becomes
> > > PANIC, "cannot abort transaction %u, it was already committed".
> > 
> > Relying on that, instead of explicit critical sections, seems fragile to me.
> > IIRC some of the behaviour around errors around transaction commit/abort has
> > changed a bunch of times. Tying correctness into something that could be
> > changed for unrelated reasons doesn't seem great.
> 
> Fair enough.  It could still be a good idea for master, but given I missed a
> bug in inplace150-inval-durability-atcommit-v1.patch far worse than the ones
> $SUBJECT fixes, let's not risk it in back branches.
> 
> > I'm not sure it holds true even today - what if the transaction didn't
> > have an xid? Then RecordTransactionAbort() wouldn't trigger
> >      "cannot abort transaction %u, it was already committed"
> > I think?
> 
> I think that's right.  As the inplace160-inval-durability-inplace-v2.patch
> edits to xact.c say, the concept of invals in XID-less transactions is buggy
> at its core.  Fortunately, after that patch, we use them only for two things
> that could themselves stop with something roughly as simple as the attached.

Now actually attached.

> > > > - Same change, no WAL version bump.  Standby must update before
> > > >   primary.  This is best long-term, but the transition is more
> > > >   disruptive.  I'm leaning toward this one, but the second option
> > > >   isn't bad:
> > 
> > Hm. The inplace record doesn't use the length of the "main data"
> > record segment for anything, from what I can tell. If records by an
> > updated primary were replayed by an old standby, it'd just ignore the
> > additional data, afaict?
> 
> Agreed, but ...
> 
> > I think with the code as-is, the situation with an updated standby
> > replaying an old primary's record would actually be worse - it'd
> > afaict just assume the now-longer record contained valid fields,
> > despite those just pointing into uninitialized memory.  I think the
> > replay routine would have to check the length of the main data and
> > execute the invalidation conditionally.
> 
> I anticipated back branches supporting a new XLOG_HEAP_INPLACE_WITH_INVAL
> alongside the old XLOG_HEAP_INPLACE.  Updated standbys would run both fine,
> and old binaries consuming new WAL would PANIC, "heap_redo: unknown op code".
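That compatibility story can be sketched as a pair of toy redo dispatchers (opcode values and strings are illustrative, not the real XLOG_HEAP_* definitions): the updated routine handles both record types, while the old one falls through to the unknown-opcode PANIC.

```c
/* Hypothetical sketch of old vs. updated heap_redo dispatch. */

#define XLOG_HEAP_INPLACE            0x30
#define XLOG_HEAP_INPLACE_WITH_INVAL 0x40   /* hypothetical new record */

/* "Old" redo routine: knows only the original record type. */
static const char *heap_redo_old(int info)
{
    switch (info)
    {
        case XLOG_HEAP_INPLACE:
            return "replayed inplace";
        default:
            return "PANIC: heap_redo: unknown op code";
    }
}

/* "Updated" redo routine: handles both record types, so an updated
 * standby replays WAL from either an old or an updated primary. */
static const char *heap_redo_new(int info)
{
    switch (info)
    {
        case XLOG_HEAP_INPLACE:
            return "replayed inplace";
        case XLOG_HEAP_INPLACE_WITH_INVAL:
            return "replayed inplace and applied invals";
        default:
            return "PANIC: heap_redo: unknown op code";
    }
}
```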
> 
> > > > - heap_xlog_inplace() could set the shared-inval-queue overflow
> > > >   signal on every backend.  This is more wasteful, but inplace
> > > >   updates might be rare enough (~once per VACUUM) to make it
> > > >   tolerable.
> > 
> > We already set that surprisingly frequently, as
> > a) The size of the sinval queue is small
> > b) If a backend is busy, it does not process catchup interrupts
> >    (i.e. executing queries, waiting for a lock prevents processing)
> > c) There's no deduplication of invals, we often end up sending the
> >    same inval over and over.
> > 
> > So I suspect this might not be too bad, compared to the current badness.
> 
> That is good.  We might be able to do the overflow signal once at end of
> recovery, like RelationCacheInitFileRemove() does for the init file.  That's
> mildly harder to reason about, but it would be cheaper.  Hmmm.
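For readers less familiar with sinval, a toy model of the overflow behavior under discussion (not the real sinval code): a fixed-size shared queue plus a per-backend read position; a backend that falls too far behind is marked "reset" and must discard all cached state, which is what forcing the overflow signal on every backend amounts to.

```c
#include <stdbool.h>

/* Toy model of a shared invalidation queue with overflow. */

#define QUEUE_SIZE 4

typedef struct
{
    int     msgs[QUEUE_SIZE];
    long    head;           /* next write position */
    long    reader_pos;     /* one backend's read position */
    bool    reader_reset;   /* backend must rebuild its caches */
} SinvalModel;

static void queue_inval(SinvalModel *q, int msg)
{
    q->msgs[q->head % QUEUE_SIZE] = msg;
    q->head++;
    /* Writer lapped the lagging reader: force a full cache reset. */
    if (q->head - q->reader_pos > QUEUE_SIZE)
        q->reader_reset = true;
}

/* Returns the number of messages consumed, or -1 if the backend must
 * instead invalidate everything and start over. */
static int reader_catch_up(SinvalModel *q)
{
    int     n = (int) (q->head - q->reader_pos);

    q->reader_pos = q->head;
    if (q->reader_reset)
    {
        q->reader_reset = false;
        return -1;
    }
    return n;
}
```

Setting the overflow signal unconditionally is equivalent to marking every reader reset without consulting its position, which is why its cost depends on how often caches get rebuilt afterward.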
> 
> > At least for core code. I guess there could be extension code triggering
> > inplace updates more frequently? But I'd hope they'd do it not on catalog
> > tables... Except that we wouldn't know that that's the case during replay,
> > it's not contained in the record.
> 
> For what it's worth, from a grep of PGXN, only citus does inplace updates.
> 
> > > > - Use LogStandbyInvalidations() just after XLOG_HEAP_INPLACE.  This
> > > >   isn't correct if one ends recovery between the two records, but
> > > >   you'd need to be unlucky to notice.  Noticing would need a
> > > >   procedure like the following.  A hot standby backend populates a
> > > >   relcache entry, then does DDL on the rel after recovery ends.
> > 
> > Hm. The problematic cases presumably involve an access exclusive lock?
> > If so, could we do LogStandbyInvalidations() *before* logging the WAL
> > record for the inplace update? The invalidations can't be processed by
> > other backends until the exclusive lock has been released, which
> > should avoid the race?
> 
> A lock forces a backend to drain the inval queue before using the locked
> object, but it doesn't stop the backend from draining the queue and
> repopulating cache entries earlier.  For example, pg_describe_object() can
> query many syscaches without locking underlying objects.  Hence, the inval
> system relies on the buffer change getting fully visible to catcache queries
> before the sinval message enters the shared queue.
> 
> Thanks,
> nm
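The ordering rule quoted above can be shown with a toy single-process model (hypothetical names, not the real buffer or sinval code): the shared "buffer" change must be visible before the message enters the queue, because any backend may drain the queue and repopulate its cache at an arbitrary moment, without holding a lock on the object.

```c
/* Toy model of inval ordering: change first, then queue the message. */

static int shared_catalog = 1;      /* stands in for the buffer change */
static int backend_cache = 1;       /* one backend's cached copy */
static int pending_invals;

/* A backend drains the queue and rebuilds from shared state. */
static void drain_and_rebuild(void)
{
    if (pending_invals > 0)
    {
        pending_invals = 0;
        backend_cache = shared_catalog;
    }
}

/* Correct order: make the change visible, then queue the message. */
static void update_correct(int newval)
{
    shared_catalog = newval;
    pending_invals++;
}

/* Broken order: if a backend drains between the two steps, it rebuilds
 * from stale contents and no further message will correct it. */
static void update_broken(int newval)
{
    pending_invals++;
    drain_and_rebuild();            /* simulate an ill-timed drain */
    shared_catalog = newval;
}
```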
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 30285bd..0803e0a 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -1377,11 +1377,7 @@ RecordTransactionCommit(void)
                 * DELETE ROWS invals, but we've not done the work to withhold them.
                 */
                if (nmsgs != 0)
-               {
-                       LogStandbyInvalidations(nmsgs, invalMessages,
-                                                                       RelcacheInitFileInval);
-                       wrote_xlog = true;      /* not strictly necessary */
-               }
+                       elog(ERROR, "cannot commit a transaction with invalidation messages but no xid");
 
                /*
                 * If we didn't create XLOG entries, we're done here; otherwise we
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 7fa80a5..3ef5b15 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -17439,6 +17439,10 @@ PreCommit_on_commit_actions(void)
        List       *oids_to_truncate = NIL;
        List       *oids_to_drop = NIL;
 
+       /* If no XID, we didn't create material to truncate or drop. */
+       if (!TransactionIdIsValid(GetCurrentTransactionIdIfAny()))
+               return;
+
        foreach(l, on_commits)
        {
                OnCommitItem *oc = (OnCommitItem *) lfirst(l);
diff --git a/src/backend/commands/typecmds.c b/src/backend/commands/typecmds.c
index 2a1e713..d87cc85 100644
--- a/src/backend/commands/typecmds.c
+++ b/src/backend/commands/typecmds.c
@@ -2878,7 +2878,8 @@ AlterDomainDropConstraint(List *names, const char *constrName,
         * dependent plans get rebuilt.  Since this command doesn't change the
         * domain's pg_type row, that won't happen automatically; do it manually.
         */
-       CacheInvalidateHeapTuple(rel, tup, NULL);
+       if (found)
+               CacheInvalidateHeapTuple(rel, tup, NULL);
 
        ObjectAddressSet(address, TypeRelationId, domainoid);
 
