Re: Inval reliability, especially for inplace updates

Noah Misch Thu, 31 Oct 2024 13:02:14 -0700

On Thu, Oct 31, 2024 at 05:00:02PM +0300, Alexander Lakhin wrote:
> I've accidentally discovered an incorrect behaviour caused by commit
> 4eac5a1fa. Running this script:


Thanks.  This looks important.

> parallel -j40 --linebuffer --tag .../reproi.sh ::: `seq 40`

This didn't reproduce it for me, at -j20, -j40, or -j80.  I tested at commit
fb7e27a.  At what commit(s) does it reproduce for you?  At what commits, if
any, did your test not reproduce this?

> All three autovacuum workers (1143263, 1143320, 1143403) are also waiting
> for the same buffer lock:
> #5  0x0000561dd715f1fe in PGSemaphoreLock (sema=0x7fed9a817338) at 
> pg_sema.c:327
> #6  0x0000561dd722fe02 in LWLockAcquire (lock=0x7fed9ad9b4e4, mode=LW_SHARED) 
> at lwlock.c:1318
> #7  0x0000561dd71f8423 in LockBuffer (buffer=36, mode=1) at bufmgr.c:4182

Can you share the full backtrace for the autovacuum workers?

This looks like four backends all waiting for BUFFER_LOCK_SHARE on the same
pg_class page.  One backend is in CREATE TABLE, and three are in autovacuum.
There are no other apparent processes that would hold the
BUFFER_LOCK_EXCLUSIVE blocking these four processes.

> Also as a side note, these processes can't be terminated with SIGTERM, I
> have to kill them.

That suggests they're trying to acquire one LWLock while holding another.
I'll recreate your CREATE TABLE stack trace and study its conditions.  It's
not readily clear to me how that would end up holding relevant lwlocks.

Guessing how this happened did lead me to a bad decision in commit a07e03f,
but I expect fixing that bad decision won't fix the hang you captured.  That
commit made index_update_stats() needlessly call RelationGetNumberOfBlocks()
and visibilitymap_count() with a pg_class heap buffer lock held.  Both do I/O,
and the latter can exclusive-lock a visibility map buffer.  The attached patch
corrects that.  Since the hang you captured involved a pg_class heap buffer
lock, I don't think this patch will fix that hang.  The other inplace updaters
are free from similar badness.

Author:     Noah Misch <n...@leadboat.com>
Commit:     Noah Misch <n...@leadboat.com>

    

diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 74d0f30..bef1af6 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -2806,6 +2806,9 @@ index_update_stats(Relation rel,
                                   bool hasindex,
                                   double reltuples)
 {
+       bool            update_stats;
+       BlockNumber relpages;
+       BlockNumber relallvisible;
        Oid                     relid = RelationGetRelid(rel);
        Relation        pg_class;
        ScanKeyData key[1];
@@ -2815,6 +2818,42 @@ index_update_stats(Relation rel,
        bool            dirty;
 
        /*
+        * As a special hack, if we are dealing with an empty table and the
+        * existing reltuples is -1, we leave that alone.  This ensures that
+        * creating an index as part of CREATE TABLE doesn't cause the table to
+        * prematurely look like it's been vacuumed.  The final rd_rel may be
+        * different from rel->rd_rel due to e.g. commit of concurrent GRANT, 
but
+        * the commands that change reltuples take locks conflicting with ours.
+        * (Even if a command changed reltuples under a weaker lock, this 
affects
+        * only statistics for an empty table.)
+        */
+       if (reltuples == 0 && rel->rd_rel->reltuples < 0)
+               reltuples = -1;
+
+       /*
+        * Don't update statistics during binary upgrade, because the indexes 
are
+        * created before the data is moved into place.
+        */
+       update_stats = reltuples >= 0 && !IsBinaryUpgrade;
+
+       /*
+        * Finish I/O and visibility map buffer locks before
+        * systable_inplace_update_begin() locks the pg_class buffer.  The final
+        * rd_rel may be different from rel->rd_rel due to e.g. commit of
+        * concurrent GRANT, but no command changes a relkind from non-index to
+        * index.  (Even if one did, relallvisible doesn't break functionality.)
+        */
+       if (update_stats)
+       {
+               relpages = RelationGetNumberOfBlocks(rel);
+
+               if (rel->rd_rel->relkind != RELKIND_INDEX)
+                       visibilitymap_count(rel, &relallvisible, NULL);
+               else                                    /* don't bother for 
indexes */
+                       relallvisible = 0;
+       }
+
+       /*
         * We always update the pg_class row using a non-transactional,
         * overwrite-in-place update.  There are several reasons for this:
         *
@@ -2858,15 +2897,6 @@ index_update_stats(Relation rel,
        /* Should this be a more comprehensive test? */
        Assert(rd_rel->relkind != RELKIND_PARTITIONED_INDEX);
 
-       /*
-        * As a special hack, if we are dealing with an empty table and the
-        * existing reltuples is -1, we leave that alone.  This ensures that
-        * creating an index as part of CREATE TABLE doesn't cause the table to
-        * prematurely look like it's been vacuumed.
-        */
-       if (reltuples == 0 && rd_rel->reltuples < 0)
-               reltuples = -1;
-
        /* Apply required updates, if any, to copied tuple */
 
        dirty = false;
@@ -2876,20 +2906,8 @@ index_update_stats(Relation rel,
                dirty = true;
        }
 
-       /*
-        * Avoid updating statistics during binary upgrade, because the indexes
-        * are created before the data is moved into place.
-        */
-       if (reltuples >= 0 && !IsBinaryUpgrade)
+       if (update_stats)
        {
-               BlockNumber relpages = RelationGetNumberOfBlocks(rel);
-               BlockNumber relallvisible;
-
-               if (rd_rel->relkind != RELKIND_INDEX)
-                       visibilitymap_count(rel, &relallvisible, NULL);
-               else                                    /* don't bother for 
indexes */
-                       relallvisible = 0;
-
                if (rd_rel->relpages != (int32) relpages)
                {
                        rd_rel->relpages = (int32) relpages;

Re: Inval reliability, especially for inplace updates

Reply via email to