Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

Andres Freund Tue, 30 Apr 2019 10:35:39 -0700

On 2019-04-30 11:51:10 -0400, Tom Lane wrote:
> Andres Freund <and...@anarazel.de> writes:
> > On 2019-04-30 00:50:20 -0400, Tom Lane wrote:
> > I suspect the problem isn't REINDEX INDEX in general, it's REINDEX INDEX
> > over catalog tables modified during reindex.
>
> So far, every one of the failures in the buildfarm looks like the REINDEX
> is deciding that it needs to wait for some other transaction, eg here
>
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prion&dt=2019-04-30%2014%3A43%3A11
>
> the relevant bit of postmaster log is
>
> 2019-04-30 14:44:13.478 UTC [16135:450] pg_regress/create_index LOG:  
> statement: REINDEX TABLE pg_class;
> 2019-04-30 14:44:14.478 UTC [16137:430] pg_regress/create_view LOG:  process 
> 16137 detected deadlock while waiting for AccessShareLock on relation 2662 of 
> database 16384 after 1000.148 ms
> 2019-04-30 14:44:14.478 UTC [16137:431] pg_regress/create_view DETAIL:  
> Process holding the lock: 16135. Wait queue: .
> 2019-04-30 14:44:14.478 UTC [16137:432] pg_regress/create_view STATEMENT:  
> DROP SCHEMA temp_view_test CASCADE;
> 2019-04-30 14:44:14.478 UTC [16137:433] pg_regress/create_view ERROR:  
> deadlock detected
> 2019-04-30 14:44:14.478 UTC [16137:434] pg_regress/create_view DETAIL:  
> Process 16137 waits for AccessShareLock on relation 2662 of database 16384; 
> blocked by process 16135.
>       Process 16135 waits for ShareLock on transaction 2875; blocked by 
> process 16137.
>       Process 16137: DROP SCHEMA temp_view_test CASCADE;
>       Process 16135: REINDEX TABLE pg_class;
> 2019-04-30 14:44:14.478 UTC [16137:435] pg_regress/create_view HINT:  See 
> server log for query details.
> 2019-04-30 14:44:14.478 UTC [16137:436] pg_regress/create_view STATEMENT:  
> DROP SCHEMA temp_view_test CASCADE;
>
> I haven't been able to reproduce this locally yet, but my guess is that
> the REINDEX wants to update some row that was already updated by the
> concurrent transaction, so it has to wait to see if the latter commits
> or not.  And, of course, waiting while holding AccessExclusiveLock on
> any index of pg_class is a Bad Idea (TM).  But I can't quite see why
> we'd be doing something like that during the reindex ...


I've reproduced something similar locally by running "REINDEX INDEX
pg_class_oid_index;" via pgbench. Fails over pretty much immediately.

It's the lock-upgrade problem I theorized about
upthread. ReindexIndex(), via RangeVarCallbackForReindexIndex(), takes a
ShareLock on pg_class, and then goes on to upgrade to RowExclusiveLock
in RelationSetNewRelfilenode(). But at that time another session
obviously can already have the ShareLock and would also want to upgrade.

The same problem exists with reindexing indexes on pg_index.

ReindexTable is also affected. It locks the table with ShareLock, but
then subsidiary routines upgrade to RowExclusiveLock. The way to fix it
would be a bit different than for ReindexIndex(), as the locking happens
via RangeVarGetRelidExtended() directly, rather than in the callback.

There's a somewhat related issue in the new REINDEX CONCURRENTLY. See
https://www.postgresql.org/message-id/20190430151735.wi52sxjvxsjvaxxt%40alap3.anarazel.de

Attached is a *hacky* prototype patch that fixes the issues for me. This
is *not* meant as an actual fix, just a demonstration.

Turns out it's not even sufficient to take a ShareRowExclusive for
pg_class. That prevents issues of concurrent REINDEX INDEX
pg_class_oid_index blowing up, but if one runs REINDEX INDEX
pg_class_oid_index; and REINDEX TABLE pg_class; (or just the latter)
concurrently it still blows up, albeit taking longer to do so.

The problem is that other codepaths expect to be able to hold an
AccessShareLock on pg_class, and multiple pg_class indexes
(e.g. catcache initialization which is easy to hit with -C, [1]). If we
were to want this concurrency safe, I think it requires an AEL on at
least pg_class for reindex (I think ShareRowExclusiveLock might suffice
for pg_index).

I'm not sure it's worth fixing this. It's crummy and somewhat fragile
that we'd have we'd have special locking rules for catalog tables. OTOH,
it really also sucks that a lone REINDEX TABLE pg_class; can deadlock
with another session doing nothing more than establishing a connection.

I guess it's not that common, and can be fixed by users by doing an
explicit BEGIN;LOCK pg_class;REINDEX TABLE pg_class;COMMIT;, but that's
not something anybody will know to do.

Pragmatically I don't think there's a meaningful difference between
holding a ShareLock on pg_class + AEL on one or more indexes, to holding
an AEL on pg_class. Just about every pg_class access is through an
index.

Greetings,

Andres Freund


[1]
#6  0x0000561dac7f9a36 in WaitOnLock (locallock=0x561dae101878, 
owner=0x561dae112ee8) at 
/home/andres/src/postgresql/src/backend/storage/lmgr/lock.c:1768
#7  0x0000561dac7f869e in LockAcquireExtended (locktag=0x7ffd7a128650, 
lockmode=1, sessionLock=false, dontWait=false, reportMemoryError=true,
    locallockp=0x7ffd7a128648) at 
/home/andres/src/postgresql/src/backend/storage/lmgr/lock.c:1050
#8  0x0000561dac7f5c15 in LockRelationOid (relid=2662, lockmode=1) at 
/home/andres/src/postgresql/src/backend/storage/lmgr/lmgr.c:116
#9  0x0000561dac3a3aa2 in relation_open (relationId=2662, lockmode=1) at 
/home/andres/src/postgresql/src/backend/access/common/relation.c:56
#10 0x0000561dac422560 in index_open (relationId=2662, lockmode=1) at 
/home/andres/src/postgresql/src/backend/access/index/indexam.c:156
#11 0x0000561dac421bbe in systable_beginscan (heapRelation=0x561dae14af80, 
indexId=2662, indexOK=true, snapshot=0x561dacd26f80 <CatalogSnapshotData>,
    nkeys=1, key=0x7ffd7a128760) at 
/home/andres/src/postgresql/src/backend/access/index/genam.c:364
#12 0x0000561dac982362 in ScanPgRelation (targetRelId=2663, indexOK=true, 
force_non_historic=false)
    at /home/andres/src/postgresql/src/backend/utils/cache/relcache.c:360
#13 0x0000561dac983b18 in RelationBuildDesc (targetRelId=2663, insertIt=true) 
at /home/andres/src/postgresql/src/backend/utils/cache/relcache.c:1058
#14 0x0000561dac985d24 in RelationIdGetRelation (relationId=2663) at 
/home/andres/src/postgresql/src/backend/utils/cache/relcache.c:2037
#15 0x0000561dac3a3aac in relation_open (relationId=2663, lockmode=1) at 
/home/andres/src/postgresql/src/backend/access/common/relation.c:59
#16 0x0000561dac422560 in index_open (relationId=2663, lockmode=1) at 
/home/andres/src/postgresql/src/backend/access/index/indexam.c:156
#17 0x0000561dac976116 in InitCatCachePhase2 (cache=0x561dae13e400, 
touch_index=true) at 
/home/andres/src/postgresql/src/backend/utils/cache/catcache.c:1050
--Type <RET> for more, q to quit, c to continue without paging--
#18 0x0000561dac990134 in InitCatalogCachePhase2 () at 
/home/andres/src/postgresql/src/backend/utils/cache/syscache.c:1078
#19 0x0000561dac988955 in RelationCacheInitializePhase3 () at 
/home/andres/src/postgresql/src/backend/utils/cache/relcache.c:3960
#20 0x0000561dac9acdac in InitPostgres (in_dbname=0x561dae111320 "postgres", 
dboid=0, username=0x561dae0dbaf8 "andres", useroid=0, out_dbname=0x0,
    override_allow_connections=false) at 
/home/andres/src/postgresql/src/backend/utils/init/postinit.c:1034

diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index cb5d91e7640..d6f34a09646 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -85,8 +85,16 @@ static char *ChooseIndexName(const char *tabname, Oid namespaceId,
 				bool primary, bool isconstraint);
 static char *ChooseIndexNameAddition(List *colnames);
 static List *ChooseIndexColumnNames(List *indexElems);
+
+typedef struct RVCReindexParams
+{
+	Oid heapOid;
+	bool concurrently;
+} RVCReindexParams;
 static void RangeVarCallbackForReindexIndex(const RangeVar *relation,
 								Oid relId, Oid oldRelId, void *arg);
+static void RangeVarCallbackForReindexTable(const RangeVar *relation,
+								Oid relId, Oid oldRelId, void *arg);
 static bool ReindexRelationConcurrently(Oid relationOid, int options);
 static void ReindexPartitionedIndex(Relation parentIdx);
 static void update_relispartition(Oid relationId, bool newval);
@@ -2304,10 +2312,13 @@ void
 ReindexIndex(RangeVar *indexRelation, int options, bool concurrent)
 {
 	Oid			indOid;
-	Oid			heapOid = InvalidOid;
+	RVCReindexParams params;
 	Relation	irel;
 	char		persistence;
 
+	params.heapOid = InvalidOid;
+	params.concurrently = concurrent;
+
 	/*
 	 * Find and lock index, and check permissions on table; use callback to
 	 * obtain lock on table first, to avoid deadlock hazard.  The lock level
@@ -2317,7 +2328,7 @@ ReindexIndex(RangeVar *indexRelation, int options, bool concurrent)
 									  concurrent ? ShareUpdateExclusiveLock : AccessExclusiveLock,
 									  0,
 									  RangeVarCallbackForReindexIndex,
-									  (void *) &heapOid);
+									  (void *) &params);
 
 	/*
 	 * Obtain the current persistence of the existing index.  We already hold
@@ -2340,6 +2351,48 @@ ReindexIndex(RangeVar *indexRelation, int options, bool concurrent)
 		reindex_index(indOid, false, persistence, options);
 }
 
+static LOCKMODE
+table_lockmode_for_reindex(Oid heapOid, bool concurrently)
+{
+	if (concurrently)
+	{
+		/*
+		 * If concurrently were allowed on catalog tables this result would be
+		 * wrong, but it is not. We can't assert or throw an error here
+		 * however, as it's not yet guaranteed that this is the affected
+		 * table.
+		 */
+		return ShareUpdateExclusiveLock;
+	}
+	else if (heapOid == IndexRelationId ||
+		heapOid == RelationRelationId)
+	{
+		/*
+		 * When reindexing indexes for pg_class or pg_index it's not
+		 * sufficient to take a ShareLock on the respective table. While
+		 * reindexing those tables need to be modified, which requires a
+		 * RowExclusiveLock - which presents a form of an upgrade hazard (it
+		 * e.g. conflicts with ShareLock held by a concurrent REINDEX, but
+		 * other sessions could acquire a ShareLock themselves as ShareLock
+		 * doesn't self-conflict).  Thus, only when updating those system
+		 * tables, we have to take a lock that guarantees both a Share and
+		 * RowExclusive can be acquired.
+		 *
+		 * One might think that ShareRowExclusive lock is sufficient to
+		 * prevent such deadlock scenarios, but it's not. At least for
+		 * pg_class there's plenty other codepaths that, reasonably, expect to
+		 * be able to have an AccessShareLock on the table, and acquire
+		 * AccessShareLocks on its indexes.  That then otherwise could
+		 * deadlock with e.g. reindex_relation, which one-by-one locks indexes
+		 * with AccessExclusiveLock.  Therefore we need an
+		 * AccessExclusiveLock.
+		 */
+		return AccessExclusiveLock;
+	}
+	else
+		return ShareLock;
+}
+
 /*
  * Check permissions on table before acquiring relation lock; also lock
  * the heap before the RangeVarGetRelidExtended takes the index lock, to avoid
@@ -2350,7 +2403,7 @@ RangeVarCallbackForReindexIndex(const RangeVar *relation,
 								Oid relId, Oid oldRelId, void *arg)
 {
 	char		relkind;
-	Oid		   *heapOid = (Oid *) arg;
+	RVCReindexParams *params = (RVCReindexParams *) arg;
 
 	/*
 	 * If we previously locked some other index's heap, and the name we're
@@ -2360,8 +2413,10 @@ RangeVarCallbackForReindexIndex(const RangeVar *relation,
 	if (relId != oldRelId && OidIsValid(oldRelId))
 	{
 		/* lock level here should match reindex_index() heap lock */
-		UnlockRelationOid(*heapOid, ShareLock);
-		*heapOid = InvalidOid;
+		UnlockRelationOid(params->heapOid,
+						  table_lockmode_for_reindex(params->heapOid,
+													 params->concurrently));
+		params->heapOid = InvalidOid;
 	}
 
 	/* If the relation does not exist, there's nothing more to do. */
@@ -2394,9 +2449,54 @@ RangeVarCallbackForReindexIndex(const RangeVar *relation,
 		 * isn't valid, it means the index as concurrently dropped, which is
 		 * not a problem for us; just return normally.
 		 */
-		*heapOid = IndexGetRelation(relId, true);
-		if (OidIsValid(*heapOid))
-			LockRelationOid(*heapOid, ShareLock);
+		params->heapOid = IndexGetRelation(relId, true);
+		if (OidIsValid(params->heapOid))
+			LockRelationOid(params->heapOid,
+							table_lockmode_for_reindex(params->heapOid, params->concurrently));
+	}
+}
+
+static void
+RangeVarCallbackForReindexTable(const RangeVar *relation,
+								Oid relId, Oid oldRelId, void *arg)
+{
+	char		relkind;
+	RVCReindexParams *params = (RVCReindexParams *) arg;
+
+	if (relId != oldRelId && OidIsValid(oldRelId))
+	{
+		/* lock level here should match reindex_index() heap lock */
+		UnlockRelationOid(oldRelId,
+						  table_lockmode_for_reindex(oldRelId,
+													 params->concurrently));
+	}
+
+	/* If the relation does not exist, there's nothing more to do. */
+	if (!OidIsValid(relId))
+		return;
+
+	/*
+	 * If the relation does exist, check whether it's an index.  But note that
+	 * the relation might have been dropped between the time we did the name
+	 * lookup and now.  In that case, there's nothing to do.
+	 */
+	relkind = get_rel_relkind(relId);
+	if (!relkind)
+		return;
+	if (relkind != RELKIND_RELATION && relkind != RELKIND_TOASTVALUE &&
+		relkind != RELKIND_MATVIEW && relkind != RELKIND_PARTITIONED_TABLE)
+		ereport(ERROR,
+				(errcode(ERRCODE_WRONG_OBJECT_TYPE),
+				 errmsg("\"%s\" is not a table or materialized view", relation->relname)));
+
+	/* Check permissions */
+	if (!pg_class_ownercheck(relId, GetUserId()))
+		aclcheck_error(ACLCHECK_NOT_OWNER, get_relkind_objtype(get_rel_relkind(relId)), relation->relname);
+
+	if (relId != oldRelId)
+	{
+		LockRelationOid(relId,
+						table_lockmode_for_reindex(relId, params->concurrently));
 	}
 }
 
@@ -2408,13 +2508,24 @@ Oid
 ReindexTable(RangeVar *relation, int options, bool concurrent)
 {
 	Oid			heapOid;
+	RVCReindexParams params;
 	bool		result;
 
-	/* The lock level used here should match reindex_relation(). */
+	params.heapOid = InvalidOid;
+	params.concurrently = concurrent;
+
+	/*
+	 * While we have RangeVarGetRelidExtended only acquire a AccessShareLock,
+	 * RangeVarCallbackForReindexTable acquires the appropriate type of lock
+	 * (which depends on whether the target is a catalog table). Have to
+	 * acquire some lock via RangeVarGetRelidExtended, as it otherwise won't
+	 * do all the necessary concurrency-safe trickery (indicate via flag
+	 * instead?).
+	 */
 	heapOid = RangeVarGetRelidExtended(relation,
-									   concurrent ? ShareUpdateExclusiveLock : ShareLock,
+									   AccessShareLock,
 									   0,
-									   RangeVarCallbackOwnsTable, NULL);
+									   RangeVarCallbackForReindexTable, &params);
 
 	if (concurrent)
 		result = ReindexRelationConcurrently(heapOid, options);

Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

Reply via email to