reindex partitioned indexes: refactor ReindexRelationConcurrently ?

Justin Pryzby Sun, 01 Nov 2020 23:00:33 -0800

This is a new thread about possible refactoring and commit a6642b3ae
("Add support for partitioned tables and indexes in REINDEX")


Currently, ReindexPartitions() calls ReindexMultipleInternal() which loops
around ReindexRelationConcurrently(), similar to what's done with
ReindexMultipleTables()

Contrast that with ReindexTable(), which calls ReindexRelationConcurrently()
with the table's OID, which then handles all indexes at once:

|postgres=# REINDEX TABLE CONCURRENTLY onek;
|DEBUG:  00000: building index "onek_unique1_ccnew" on table "onek" serially
|LOCATION:  index_build, index.c:2853
|DEBUG:  00000: index "onek_unique1_ccnew" can safely use deduplication
|LOCATION:  _bt_allequalimage, nbtutils.c:2742
|DEBUG:  00000: building index "onek_unique2_ccnew" on table "onek" serially
|LOCATION:  index_build, index.c:2853
|DEBUG:  00000: index "onek_unique2_ccnew" can safely use deduplication
|LOCATION:  _bt_allequalimage, nbtutils.c:2742
|DEBUG:  00000: building index "onek_hundred_ccnew" on table "onek" serially
|LOCATION:  index_build, index.c:2853
|DEBUG:  00000: index "onek_hundred_ccnew" can safely use deduplication
|LOCATION:  _bt_allequalimage, nbtutils.c:2742
|DEBUG:  00000: building index "onek_stringu1_ccnew" on table "onek" serially
|LOCATION:  index_build, index.c:2853
|DEBUG:  00000: index "onek_stringu1_ccnew" can safely use deduplication
|LOCATION:  _bt_allequalimage, nbtutils.c:2742
|DEBUG:  00000: validate_index found 1000 heap tuples, 1000 index tuples; 
inserted 0 missing tuples
|LOCATION:  validate_index, index.c:3281
|DEBUG:  00000: validate_index found 1000 heap tuples, 1000 index tuples; 
inserted 0 missing tuples
|LOCATION:  validate_index, index.c:3281
|DEBUG:  00000: validate_index found 1000 heap tuples, 1000 index tuples; 
inserted 0 missing tuples
|LOCATION:  validate_index, index.c:3281
|DEBUG:  00000: validate_index found 1000 heap tuples, 1000 index tuples; 
inserted 0 missing tuples
|LOCATION:  validate_index, index.c:3281
|REINDEX

Should we refactor ReindexRelationConcurrently() into two functions ?  One to
build a list of indexes on a relation, and one to concurrently reindex an
list of indexes.  Then, ReindexPartitions() would make a list of leaf indexes
of a partitioned index, or leaf indexes of partitioned indexes of a partitioned
table, and then reindex those indexes all at once.  For CIC, we could call
that from DefineIndex().

The reason is that concurrent Reindex must wait for longrunning transactions,
and if we call it in a loop, then we wait for longrunning transactions N times.
I can imagine scenarios where it's easy for an DBA to schedule maintenance to
do reindex concurrently and restart processes to allow the reindex to proceed.
But it might be infeasible to restart processes every 5min for 3 hours to allow
reindex to proceed on each partition.

ReindexMultipleTables avoids doing that to avoid deadlocks, which makes great
sense for REINDEX SCHEMA/DATABASE/SYSTEM.  But I wonder if that reasoning
doesn't apply to partitioned tables.

We currently have this:

ReindexIndex()
        => ReindexPartitions
        => ReindexRelationConcurrently
        => reindex_index
ReindexTable()
        => ReindexPartitions
        => ReindexRelationConcurrently
        => reindex_relation
ReindexPartitions()
        => ReindexMultipleInternal()
                => ReindexRelationConcurrently()
                => reindex_relation()
                => reindex_index()

And I'm proposing to consider this:

ReindexIndex()
        => ReindexPartitions
        => ReindexIndexesConcurrently
        => reindex_index
ReindexTable()
        => ReindexPartitions
        => ReindexRelationConcurrently - this would be just a wrapper to 
collect the list of index Oids
        => reindex_relation
ReindexPartitions - this exists mainly to make a list of all leaf indexes on 
all childs of a partitioned table
        => ReindexIndexesConcurrently - this processes all indexes at once
        => ReindexMultipleInternal - this loops around everything
                => ReindexRelationConcurrently()
                => reindex_index()
                => reindex_relation()
ReindexRelationConcurrently
        => ReindexIndexesConcurrently - this is the worker function factored 
out of ReindexRelationConcurrently

I'm not sure if I'm missing any opportunities to simplify...

So then it's processed similar to REINDEX TABLE (rather than REINDEX DATABASE).

|postgres=# REINDEX TABLE CONCURRENTLY hash_parted;
|DEBUG:  building index "hpart0_a_idx_ccnew" on table "hpart0" serially
|DEBUG:  index "hpart0_a_idx_ccnew" can safely use deduplication
|DEBUG:  building index "hpart1_a_idx_ccnew" on table "hpart1" serially
|DEBUG:  index "hpart1_a_idx_ccnew" can safely use deduplication
|DEBUG:  building index "hpart2_a_idx_ccnew" on table "hpart2" serially
|DEBUG:  index "hpart2_a_idx_ccnew" can safely use deduplication
|DEBUG:  building index "hpart3_a_idx_ccnew" on table "hpart3" serially
|DEBUG:  index "hpart3_a_idx_ccnew" can safely use deduplication
|DEBUG:  validate_index found 2489 heap tuples, 2489 index tuples; inserted 0 
missing tuples
|DEBUG:  validate_index found 2527 heap tuples, 2527 index tuples; inserted 0 
missing tuples
|DEBUG:  validate_index found 2530 heap tuples, 2530 index tuples; inserted 0 
missing tuples
|DEBUG:  validate_index found 2453 heap tuples, 2453 index tuples; inserted 0 
missing tuples
|REINDEX

And, if there are multiple indexes:

|postgres=# REINDEX TABLE CONCURRENTLY hash_parted;
|DEBUG:  building index "hpart0_a_idx_ccnew" on table "hpart0" serially
|DEBUG:  index "hpart0_a_idx_ccnew" can safely use deduplication
|DEBUG:  building index "hpart0_a_idx1_ccnew" on table "hpart0" serially
|DEBUG:  index "hpart0_a_idx1_ccnew" can safely use deduplication
|DEBUG:  building index "hpart1_a_idx_ccnew" on table "hpart1" serially
|DEBUG:  index "hpart1_a_idx_ccnew" can safely use deduplication
|DEBUG:  building index "hpart1_a_idx1_ccnew" on table "hpart1" serially
|DEBUG:  index "hpart1_a_idx1_ccnew" can safely use deduplication
|DEBUG:  building index "hpart2_a_idx_ccnew" on table "hpart2" serially
|DEBUG:  index "hpart2_a_idx_ccnew" can safely use deduplication
|DEBUG:  building index "hpart2_a_idx1_ccnew" on table "hpart2" serially
|DEBUG:  index "hpart2_a_idx1_ccnew" can safely use deduplication
|DEBUG:  building index "hpart3_a_idx_ccnew" on table "hpart3" serially
|DEBUG:  index "hpart3_a_idx_ccnew" can safely use deduplication
|DEBUG:  building index "hpart3_a_idx1_ccnew" on table "hpart3" serially
|DEBUG:  index "hpart3_a_idx1_ccnew" can safely use deduplication
|DEBUG:  validate_index found 2489 heap tuples, 2489 index tuples; inserted 0 
missing tuples
|DEBUG:  validate_index found 2489 heap tuples, 2489 index tuples; inserted 0 
missing tuples
|DEBUG:  validate_index found 2527 heap tuples, 2527 index tuples; inserted 0 
missing tuples
|DEBUG:  validate_index found 2527 heap tuples, 2527 index tuples; inserted 0 
missing tuples
|DEBUG:  validate_index found 2530 heap tuples, 2530 index tuples; inserted 0 
missing tuples
|DEBUG:  validate_index found 2530 heap tuples, 2530 index tuples; inserted 0 
missing tuples
|DEBUG:  validate_index found 2453 heap tuples, 2453 index tuples; inserted 0 
missing tuples
|DEBUG:  validate_index found 2453 heap tuples, 2453 index tuples; inserted 0 
missing tuples
|REINDEX

I think the usual scenario is to have 100-1000 partitions, and 1-10 indexes per
partition.  It seems to me that at least all partitions of a given index should
be processed simultaneously.

Also, it occured to me that CIC for regular tables could be simplied to do the
same thing that I'm doing for CIC on a partitioned table: create an INVALID
catalog entry, and then reindex it concurrently.  This seems to work, with the
only difference I see being that REINDEX leaves behind ccnew indexes.

Attached is what I'm thinking of.

>From 6d30c7bf0bf25618e159c937aeb5d3080d5ca825 Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryz...@telsasoft.com>
Date: Sun, 1 Nov 2020 15:46:03 -0600
Subject: [PATCH 1/9] Refactor CIC to rely on REINDEX CONCURRENTLY

---
 src/backend/commands/indexcmds.c           | 187 +--------------------
 src/test/regress/expected/create_index.out |  10 +-
 src/test/regress/sql/create_index.sql      |   4 +-
 3 files changed, 16 insertions(+), 185 deletions(-)

diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 75552c64ed..a3ba24947e 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -526,12 +526,8 @@ DefineIndex(Oid relationId,
 	bits16		constr_flags;
 	int			numberOfAttributes;
 	int			numberOfKeyAttributes;
-	TransactionId limitXmin;
 	ObjectAddress address;
-	LockRelId	heaprelid;
-	LOCKTAG		heaplocktag;
 	LOCKMODE	lockmode;
-	Snapshot	snapshot;
 	int			save_nestlevel = -1;
 	int			i;
 
@@ -1109,6 +1105,11 @@ DefineIndex(Oid relationId,
 		if (pd->nparts != 0)
 			flags |= INDEX_CREATE_INVALID;
 	}
+	else if (concurrent)
+	{
+		/* If concurrent, initially build indexes as "invalid" */
+		flags |= INDEX_CREATE_INVALID;
+	}
 
 	if (stmt->deferrable)
 		constr_flags |= INDEX_CONSTR_CREATE_DEFERRABLE;
@@ -1387,11 +1388,9 @@ DefineIndex(Oid relationId,
 		return address;
 	}
 
+	table_close(rel, NoLock);
 	if (!concurrent)
 	{
-		/* Close the heap and we're done, in the non-concurrent case */
-		table_close(rel, NoLock);
-
 		/* If this is the top-level index, we're done. */
 		if (!OidIsValid(parentIndexId))
 			pgstat_progress_end_command();
@@ -1399,185 +1398,13 @@ DefineIndex(Oid relationId,
 		return address;
 	}
 
-	/* save lockrelid and locktag for below, then close rel */
-	heaprelid = rel->rd_lockInfo.lockRelId;
-	SET_LOCKTAG_RELATION(heaplocktag, heaprelid.dbId, heaprelid.relId);
-	table_close(rel, NoLock);
-
-	/*
-	 * For a concurrent build, it's important to make the catalog entries
-	 * visible to other transactions before we start to build the index. That
-	 * will prevent them from making incompatible HOT updates.  The new index
-	 * will be marked not indisready and not indisvalid, so that no one else
-	 * tries to either insert into it or use it for queries.
-	 *
-	 * We must commit our current transaction so that the index becomes
-	 * visible; then start another.  Note that all the data structures we just
-	 * built are lost in the commit.  The only data we keep past here are the
-	 * relation IDs.
-	 *
-	 * Before committing, get a session-level lock on the table, to ensure
-	 * that neither it nor the index can be dropped before we finish. This
-	 * cannot block, even if someone else is waiting for access, because we
-	 * already have the same lock within our transaction.
-	 *
-	 * Note: we don't currently bother with a session lock on the index,
-	 * because there are no operations that could change its state while we
-	 * hold lock on the parent table.  This might need to change later.
-	 */
-	LockRelationIdForSession(&heaprelid, ShareUpdateExclusiveLock);
-
-	PopActiveSnapshot();
-	CommitTransactionCommand();
-	StartTransactionCommand();
-
 	/*
 	 * The index is now visible, so we can report the OID.
 	 */
 	pgstat_progress_update_param(PROGRESS_CREATEIDX_INDEX_OID,
 								 indexRelationId);
 
-	/*
-	 * Phase 2 of concurrent index build (see comments for validate_index()
-	 * for an overview of how this works)
-	 *
-	 * Now we must wait until no running transaction could have the table open
-	 * with the old list of indexes.  Use ShareLock to consider running
-	 * transactions that hold locks that permit writing to the table.  Note we
-	 * do not need to worry about xacts that open the table for writing after
-	 * this point; they will see the new index when they open it.
-	 *
-	 * Note: the reason we use actual lock acquisition here, rather than just
-	 * checking the ProcArray and sleeping, is that deadlock is possible if
-	 * one of the transactions in question is blocked trying to acquire an
-	 * exclusive lock on our table.  The lock code will detect deadlock and
-	 * error out properly.
-	 */
-	pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
-								 PROGRESS_CREATEIDX_PHASE_WAIT_1);
-	WaitForLockers(heaplocktag, ShareLock, true);
-
-	/*
-	 * At this moment we are sure that there are no transactions with the
-	 * table open for write that don't have this new index in their list of
-	 * indexes.  We have waited out all the existing transactions and any new
-	 * transaction will have the new index in its list, but the index is still
-	 * marked as "not-ready-for-inserts".  The index is consulted while
-	 * deciding HOT-safety though.  This arrangement ensures that no new HOT
-	 * chains can be created where the new tuple and the old tuple in the
-	 * chain have different index keys.
-	 *
-	 * We now take a new snapshot, and build the index using all tuples that
-	 * are visible in this snapshot.  We can be sure that any HOT updates to
-	 * these tuples will be compatible with the index, since any updates made
-	 * by transactions that didn't know about the index are now committed or
-	 * rolled back.  Thus, each visible tuple is either the end of its
-	 * HOT-chain or the extension of the chain is HOT-safe for this index.
-	 */
-
-	/* Set ActiveSnapshot since functions in the indexes may need it */
-	PushActiveSnapshot(GetTransactionSnapshot());
-
-	/* Perform concurrent build of index */
-	index_concurrently_build(relationId, indexRelationId);
-
-	/* we can do away with our snapshot */
-	PopActiveSnapshot();
-
-	/*
-	 * Commit this transaction to make the indisready update visible.
-	 */
-	CommitTransactionCommand();
-	StartTransactionCommand();
-
-	/*
-	 * Phase 3 of concurrent index build
-	 *
-	 * We once again wait until no transaction can have the table open with
-	 * the index marked as read-only for updates.
-	 */
-	pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
-								 PROGRESS_CREATEIDX_PHASE_WAIT_2);
-	WaitForLockers(heaplocktag, ShareLock, true);
-
-	/*
-	 * Now take the "reference snapshot" that will be used by validate_index()
-	 * to filter candidate tuples.  Beware!  There might still be snapshots in
-	 * use that treat some transaction as in-progress that our reference
-	 * snapshot treats as committed.  If such a recently-committed transaction
-	 * deleted tuples in the table, we will not include them in the index; yet
-	 * those transactions which see the deleting one as still-in-progress will
-	 * expect such tuples to be there once we mark the index as valid.
-	 *
-	 * We solve this by waiting for all endangered transactions to exit before
-	 * we mark the index as valid.
-	 *
-	 * We also set ActiveSnapshot to this snap, since functions in indexes may
-	 * need a snapshot.
-	 */
-	snapshot = RegisterSnapshot(GetTransactionSnapshot());
-	PushActiveSnapshot(snapshot);
-
-	/*
-	 * Scan the index and the heap, insert any missing index entries.
-	 */
-	validate_index(relationId, indexRelationId, snapshot);
-
-	/*
-	 * Drop the reference snapshot.  We must do this before waiting out other
-	 * snapshot holders, else we will deadlock against other processes also
-	 * doing CREATE INDEX CONCURRENTLY, which would see our snapshot as one
-	 * they must wait for.  But first, save the snapshot's xmin to use as
-	 * limitXmin for GetCurrentVirtualXIDs().
-	 */
-	limitXmin = snapshot->xmin;
-
-	PopActiveSnapshot();
-	UnregisterSnapshot(snapshot);
-
-	/*
-	 * The snapshot subsystem could still contain registered snapshots that
-	 * are holding back our process's advertised xmin; in particular, if
-	 * default_transaction_isolation = serializable, there is a transaction
-	 * snapshot that is still active.  The CatalogSnapshot is likewise a
-	 * hazard.  To ensure no deadlocks, we must commit and start yet another
-	 * transaction, and do our wait before any snapshot has been taken in it.
-	 */
-	CommitTransactionCommand();
-	StartTransactionCommand();
-
-	/* We should now definitely not be advertising any xmin. */
-	Assert(MyProc->xmin == InvalidTransactionId);
-
-	/*
-	 * The index is now valid in the sense that it contains all currently
-	 * interesting tuples.  But since it might not contain tuples deleted just
-	 * before the reference snap was taken, we have to wait out any
-	 * transactions that might have older snapshots.
-	 */
-	pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
-								 PROGRESS_CREATEIDX_PHASE_WAIT_3);
-	WaitForOlderSnapshots(limitXmin, true);
-
-	/*
-	 * Index can now be marked valid -- update its pg_index entry
-	 */
-	index_set_state_flags(indexRelationId, INDEX_CREATE_SET_VALID);
-
-	/*
-	 * The pg_index update will cause backends (including this one) to update
-	 * relcache entries for the index itself, but we should also send a
-	 * relcache inval on the parent table to force replanning of cached plans.
-	 * Otherwise existing sessions might fail to use the new index where it
-	 * would be useful.  (Note that our earlier commits did not create reasons
-	 * to replan; so relcache flush on the index itself was sufficient.)
-	 */
-	CacheInvalidateRelcacheByRelid(heaprelid.relId);
-
-	/*
-	 * Last thing to do is release the session-level lock on the parent table.
-	 */
-	UnlockRelationIdForSession(&heaprelid, ShareUpdateExclusiveLock);
+	ReindexRelationConcurrently(indexRelationId, REINDEXOPT_CONCURRENTLY);
 
 	pgstat_progress_end_command();
 
diff --git a/src/test/regress/expected/create_index.out b/src/test/regress/expected/create_index.out
index 012c1eb067..47f2236853 100644
--- a/src/test/regress/expected/create_index.out
+++ b/src/test/regress/expected/create_index.out
@@ -1381,12 +1381,13 @@ ERROR:  duplicate key value violates unique constraint "concur_index2"
 DETAIL:  Key (f1)=(b) already exists.
 -- check if constraint is enforced properly at build time
 CREATE UNIQUE INDEX CONCURRENTLY concur_index3 ON concur_heap(f2);
-ERROR:  could not create unique index "concur_index3"
+ERROR:  could not create unique index "concur_index3_ccnew"
 DETAIL:  Key (f2)=(b) is duplicated.
 -- test that expression indexes and partial indexes work concurrently
 CREATE INDEX CONCURRENTLY concur_index4 on concur_heap(f2) WHERE f1='a';
 CREATE INDEX CONCURRENTLY concur_index5 on concur_heap(f2) WHERE f1='x';
 -- here we also check that you can default the index name
+DROP INDEX CONCURRENTLY IF EXISTS "concur_index3_ccnew";
 CREATE INDEX CONCURRENTLY on concur_heap((f2||f1));
 -- You can't do a concurrent index build in a transaction
 BEGIN;
@@ -1467,7 +1468,7 @@ DROP INDEX CONCURRENTLY "concur_index2";				-- works
 DROP INDEX CONCURRENTLY IF EXISTS "concur_index2";		-- notice
 NOTICE:  index "concur_index2" does not exist, skipping
 -- failures
-DROP INDEX CONCURRENTLY "concur_index2", "concur_index3";
+DROP INDEX CONCURRENTLY "concur_index2", "concur_index3_ccnew";
 ERROR:  DROP INDEX CONCURRENTLY does not support dropping multiple objects
 BEGIN;
 DROP INDEX CONCURRENTLY "concur_index5";
@@ -2495,13 +2496,14 @@ CREATE TABLE concur_reindex_tab4 (c1 int);
 INSERT INTO concur_reindex_tab4 VALUES (1), (1), (2);
 -- This trick creates an invalid index.
 CREATE UNIQUE INDEX CONCURRENTLY concur_reindex_ind5 ON concur_reindex_tab4 (c1);
-ERROR:  could not create unique index "concur_reindex_ind5"
+ERROR:  could not create unique index "concur_reindex_ind5_ccnew"
 DETAIL:  Key (c1)=(1) is duplicated.
 -- Reindexing concurrently this index fails with the same failure.
 -- The extra index created is itself invalid, and can be dropped.
 REINDEX INDEX CONCURRENTLY concur_reindex_ind5;
-ERROR:  could not create unique index "concur_reindex_ind5_ccnew"
+ERROR:  could not create unique index "concur_reindex_ind5_ccnew1"
 DETAIL:  Key (c1)=(1) is duplicated.
+DROP INDEX concur_reindex_ind5_ccnew1;
 \d concur_reindex_tab4
         Table "public.concur_reindex_tab4"
  Column |  Type   | Collation | Nullable | Default 
diff --git a/src/test/regress/sql/create_index.sql b/src/test/regress/sql/create_index.sql
index 4e45b18613..9ccb6201ab 100644
--- a/src/test/regress/sql/create_index.sql
+++ b/src/test/regress/sql/create_index.sql
@@ -480,6 +480,7 @@ CREATE UNIQUE INDEX CONCURRENTLY concur_index3 ON concur_heap(f2);
 CREATE INDEX CONCURRENTLY concur_index4 on concur_heap(f2) WHERE f1='a';
 CREATE INDEX CONCURRENTLY concur_index5 on concur_heap(f2) WHERE f1='x';
 -- here we also check that you can default the index name
+DROP INDEX CONCURRENTLY IF EXISTS "concur_index3_ccnew";
 CREATE INDEX CONCURRENTLY on concur_heap((f2||f1));
 
 -- You can't do a concurrent index build in a transaction
@@ -533,7 +534,7 @@ DROP INDEX CONCURRENTLY "concur_index2";				-- works
 DROP INDEX CONCURRENTLY IF EXISTS "concur_index2";		-- notice
 
 -- failures
-DROP INDEX CONCURRENTLY "concur_index2", "concur_index3";
+DROP INDEX CONCURRENTLY "concur_index2", "concur_index3_ccnew";
 BEGIN;
 DROP INDEX CONCURRENTLY "concur_index5";
 ROLLBACK;
@@ -1053,6 +1054,7 @@ CREATE UNIQUE INDEX CONCURRENTLY concur_reindex_ind5 ON concur_reindex_tab4 (c1)
 -- Reindexing concurrently this index fails with the same failure.
 -- The extra index created is itself invalid, and can be dropped.
 REINDEX INDEX CONCURRENTLY concur_reindex_ind5;
+DROP INDEX concur_reindex_ind5_ccnew1;
 \d concur_reindex_tab4
 DROP INDEX concur_reindex_ind5_ccnew;
 -- This makes the previous failure go away, so the index can become valid.
-- 
2.17.0

>From 2cb0d06c709c06f7876cde6a340fc1baf4664fc4 Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryz...@telsasoft.com>
Date: Sun, 1 Nov 2020 21:47:11 -0600
Subject: [PATCH 2/9] Refactor to allow reindexing all index partitions at once

---
 src/backend/commands/indexcmds.c | 369 +++++++++++++++++++------------
 1 file changed, 222 insertions(+), 147 deletions(-)

diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index a3ba24947e..423c5fd78a 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -88,6 +88,8 @@ static List *ChooseIndexColumnNames(List *indexElems);
 static void RangeVarCallbackForReindexIndex(const RangeVar *relation,
 											Oid relId, Oid oldRelId, void *arg);
 static bool ReindexRelationConcurrently(Oid relationOid, int options);
+static List *ReindexIndexesConcurrently(List *indexIds, int options,
+		MemoryContext private_context);
 
 static void ReindexPartitions(Oid relid, int options, bool isTopLevel);
 static void ReindexMultipleInternal(List *relids, int options);
@@ -1404,7 +1406,8 @@ DefineIndex(Oid relationId,
 	pgstat_progress_update_param(PROGRESS_CREATEIDX_INDEX_OID,
 								 indexRelationId);
 
-	ReindexRelationConcurrently(indexRelationId, REINDEXOPT_CONCURRENTLY);
+	ReindexIndexesConcurrently(list_make1_oid(indexRelationId),
+			REINDEXOPT_CONCURRENTLY, CurrentMemoryContext);
 
 	pgstat_progress_end_command();
 
@@ -2296,7 +2299,7 @@ ReindexIndex(RangeVar *indexRelation, int options, bool isTopLevel)
 		ReindexPartitions(indOid, options, isTopLevel);
 	else if ((options & REINDEXOPT_CONCURRENTLY) != 0 &&
 			 persistence != RELPERSISTENCE_TEMP)
-		ReindexRelationConcurrently(indOid, options);
+		ReindexIndexesConcurrently(list_make1_oid(indOid), options, CurrentMemoryContext);
 	else
 		reindex_index(indOid, false, persistence,
 					  options | REINDEXOPT_REPORT_PROGRESS);
@@ -2620,6 +2623,47 @@ reindex_error_callback(void *arg)
 				   errinfo->relnamespace, errinfo->relname);
 }
 
+
+/*
+ * Given a list of index oids, return a list of leaf partitions by removing
+ * any intermediate parents.  heaprels is populated with the corresponding
+ * tables.
+ */
+static List *
+leaf_partitions(List *inhoids, int options, List **heaprels)
+{
+	List		*partitions = NIL;
+	ListCell	*lc;
+
+	foreach(lc, inhoids)
+	{
+		Oid			partoid = lfirst_oid(lc);
+		Oid			tableoid;
+		Relation	table;
+		char		partkind = get_rel_relkind(partoid);
+
+		/*
+		 * This discards partitioned indexes and foreign tables.
+		 */
+		if (!RELKIND_HAS_STORAGE(partkind))
+			continue;
+
+		Assert(partkind == RELKIND_INDEX);
+
+		/* (try to) Open the table, with lock */
+		tableoid = IndexGetRelation(partoid, false);
+		table = table_open(tableoid, ShareLock);
+		table_close(table, NoLock);
+
+		/* Save partition OID in current MemoryContext */
+		partitions = lappend_oid(partitions, partoid);
+		*heaprels = lappend_oid(*heaprels, tableoid);
+	}
+
+	return partitions;
+}
+
+
 /*
  * ReindexPartitions
  *
@@ -2629,11 +2673,13 @@ reindex_error_callback(void *arg)
 static void
 ReindexPartitions(Oid relid, int options, bool isTopLevel)
 {
-	List	   *partitions = NIL;
+	List	   *partitions = NIL,
+			*heaprels = NIL;
 	char		relkind = get_rel_relkind(relid);
 	char	   *relname = get_rel_name(relid);
 	char	   *relnamespace = get_namespace_name(get_rel_namespace(relid));
 	MemoryContext reindex_context;
+	MemoryContext old_context;
 	List	   *inhoids;
 	ListCell   *lc;
 	ErrorContextCallback errcallback;
@@ -2678,33 +2724,42 @@ ReindexPartitions(Oid relid, int options, bool isTopLevel)
 	 * The list of relations to reindex are the physical partitions of the
 	 * tree so discard any partitioned table or index.
 	 */
-	foreach(lc, inhoids)
-	{
-		Oid			partoid = lfirst_oid(lc);
-		char		partkind = get_rel_relkind(partoid);
-		MemoryContext old_context;
 
-		/*
-		 * This discards partitioned tables, partitioned indexes and foreign
-		 * tables.
-		 */
-		if (!RELKIND_HAS_STORAGE(partkind))
-			continue;
-
-		Assert(partkind == RELKIND_INDEX ||
-			   partkind == RELKIND_RELATION);
-
-		/* Save partition OID */
+	if (relkind == RELKIND_PARTITIONED_INDEX)
+	{
 		old_context = MemoryContextSwitchTo(reindex_context);
-		partitions = lappend_oid(partitions, partoid);
+		partitions = leaf_partitions(inhoids, options, &heaprels);
 		MemoryContextSwitchTo(old_context);
+	} else {
+		/* Loop over parent tables */
+		foreach(lc, inhoids)
+		{
+			Oid		partoid = lfirst_oid(lc);
+			Relation parttable;
+			List	*partindexes;
+
+			parttable = table_open(partoid, ShareLock);
+			old_context = MemoryContextSwitchTo(reindex_context);
+			partindexes = RelationGetIndexList(parttable);
+			partindexes = leaf_partitions(partindexes, options, &heaprels);
+			partitions = list_concat(partitions, partindexes);
+			MemoryContextSwitchTo(old_context);
+			table_close(parttable, ShareLock);
+		}
 	}
 
-	/*
-	 * Process each partition listed in a separate transaction.  Note that
-	 * this commits and then starts a new transaction immediately.
-	 */
-	ReindexMultipleInternal(partitions, options);
+	if ((options & REINDEXOPT_CONCURRENTLY) != 0 &&
+		get_rel_persistence(relid) != RELPERSISTENCE_TEMP)
+	{
+		/* Process all indexes in a single loop */
+		ReindexIndexesConcurrently(partitions, options, reindex_context);
+	} else {
+		/*
+		 * Process each partition listed in a separate transaction.  Note that
+		 * this commits and then starts a new transaction immediately.
+		 */
+		ReindexMultipleInternal(partitions, options);
+	}
 
 	/*
 	 * Clean up working storage --- note we must do this after
@@ -2806,11 +2861,10 @@ ReindexMultipleInternal(List *relids, int options)
  * ReindexRelationConcurrently - process REINDEX CONCURRENTLY for given
  * relation OID
  *
- * 'relationOid' can either belong to an index, a table or a materialized
+ * 'relationOid' can either belong to a table or a materialized
  * view.  For tables and materialized views, all its indexes will be rebuilt,
  * excluding invalid indexes and any indexes used in exclusion constraints,
- * but including its associated toast table indexes.  For indexes, the index
- * itself will be rebuilt.
+ * but including its associated toast table indexes.
  *
  * The locks taken on parent tables and involved indexes are kept until the
  * transaction is committed, at which point a session lock is taken on each
@@ -2828,11 +2882,8 @@ ReindexMultipleInternal(List *relids, int options)
 static bool
 ReindexRelationConcurrently(Oid relationOid, int options)
 {
-	List	   *heapRelationIds = NIL;
 	List	   *indexIds = NIL;
 	List	   *newIndexIds = NIL;
-	List	   *relationLocks = NIL;
-	List	   *lockTags = NIL;
 	ListCell   *lc,
 			   *lc2;
 	MemoryContext private_context;
@@ -2841,13 +2892,6 @@ ReindexRelationConcurrently(Oid relationOid, int options)
 	char	   *relationName = NULL;
 	char	   *relationNamespace = NULL;
 	PGRUsage	ru0;
-	const int	progress_index[] = {
-		PROGRESS_CREATEIDX_COMMAND,
-		PROGRESS_CREATEIDX_PHASE,
-		PROGRESS_CREATEIDX_INDEX_OID,
-		PROGRESS_CREATEIDX_ACCESS_METHOD_OID
-	};
-	int64		progress_vals[4];
 
 	/*
 	 * Create a memory context that will survive forced transaction commits we
@@ -2890,14 +2934,6 @@ ReindexRelationConcurrently(Oid relationOid, int options)
 				 */
 				Relation	heapRelation;
 
-				/* Save the list of relation OIDs in private context */
-				oldcontext = MemoryContextSwitchTo(private_context);
-
-				/* Track this relation for session locks */
-				heapRelationIds = lappend_oid(heapRelationIds, relationOid);
-
-				MemoryContextSwitchTo(oldcontext);
-
 				if (IsCatalogRelationOid(relationOid))
 					ereport(ERROR,
 							(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
@@ -2955,14 +2991,6 @@ ReindexRelationConcurrently(Oid relationOid, int options)
 					Relation	toastRelation = table_open(toastOid,
 														   ShareUpdateExclusiveLock);
 
-					/* Save the list of relation OIDs in private context */
-					oldcontext = MemoryContextSwitchTo(private_context);
-
-					/* Track this relation for session locks */
-					heapRelationIds = lappend_oid(heapRelationIds, toastOid);
-
-					MemoryContextSwitchTo(oldcontext);
-
 					foreach(lc2, RelationGetIndexList(toastRelation))
 					{
 						Oid			cellOid = lfirst_oid(lc2);
@@ -2997,66 +3025,8 @@ ReindexRelationConcurrently(Oid relationOid, int options)
 				table_close(heapRelation, NoLock);
 				break;
 			}
-		case RELKIND_INDEX:
-			{
-				Oid			heapId = IndexGetRelation(relationOid,
-													  (options & REINDEXOPT_MISSING_OK) != 0);
-				Relation	heapRelation;
-
-				/* if relation is missing, leave */
-				if (!OidIsValid(heapId))
-					break;
-
-				if (IsCatalogRelationOid(heapId))
-					ereport(ERROR,
-							(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-							 errmsg("cannot reindex system catalogs concurrently")));
-
-				/*
-				 * Don't allow reindex for an invalid index on TOAST table, as
-				 * if rebuilt it would not be possible to drop it.
-				 */
-				if (IsToastNamespace(get_rel_namespace(relationOid)) &&
-					!get_index_isvalid(relationOid))
-					ereport(ERROR,
-							(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-							 errmsg("cannot reindex invalid index on TOAST table concurrently")));
-
-				/*
-				 * Check if parent relation can be locked and if it exists,
-				 * this needs to be done at this stage as the list of indexes
-				 * to rebuild is not complete yet, and REINDEXOPT_MISSING_OK
-				 * should not be used once all the session locks are taken.
-				 */
-				if ((options & REINDEXOPT_MISSING_OK) != 0)
-				{
-					heapRelation = try_table_open(heapId,
-												  ShareUpdateExclusiveLock);
-					/* leave if relation does not exist */
-					if (!heapRelation)
-						break;
-				}
-				else
-					heapRelation = table_open(heapId,
-											  ShareUpdateExclusiveLock);
-				table_close(heapRelation, NoLock);
-
-				/* Save the list of relation OIDs in private context */
-				oldcontext = MemoryContextSwitchTo(private_context);
-
-				/* Track the heap relation of this index for session locks */
-				heapRelationIds = list_make1_oid(heapId);
-
-				/*
-				 * Save the list of relation OIDs in private context.  Note
-				 * that invalid indexes are allowed here.
-				 */
-				indexIds = lappend_oid(indexIds, relationOid);
-
-				MemoryContextSwitchTo(oldcontext);
-				break;
-			}
 
+		case RELKIND_INDEX:
 		case RELKIND_PARTITIONED_TABLE:
 		case RELKIND_PARTITIONED_INDEX:
 		default:
@@ -3080,7 +3050,144 @@ ReindexRelationConcurrently(Oid relationOid, int options)
 		return false;
 	}
 
-	Assert(heapRelationIds != NIL);
+	/* Do the work */
+	newIndexIds = ReindexIndexesConcurrently(indexIds, options, private_context);
+	if (newIndexIds == NIL)
+		return false;
+
+	/* Log what we did */
+	if (options & REINDEXOPT_VERBOSE)
+	{
+		if (relkind == RELKIND_INDEX)
+			ereport(INFO,
+					(errmsg("index \"%s.%s\" was reindexed",
+							relationNamespace, relationName),
+					 errdetail("%s.",
+							   pg_rusage_show(&ru0))));
+		else
+		{
+			foreach(lc, newIndexIds)
+			{
+				Oid			indOid = lfirst_oid(lc);
+
+				ereport(INFO,
+						(errmsg("index \"%s.%s\" was reindexed",
+								get_namespace_name(get_rel_namespace(indOid)),
+								get_rel_name(indOid))));
+				/* Don't show rusage here, since it's not per index. */
+			}
+
+			ereport(INFO,
+					(errmsg("table \"%s.%s\" was reindexed",
+							relationNamespace, relationName),
+					 errdetail("%s.",
+							   pg_rusage_show(&ru0))));
+		}
+	}
+
+	MemoryContextDelete(private_context);
+
+	return true;
+}
+
+/*
+ * Reindex concurrently for an list of index relations
+ * This is called by DefineIndex, ReindexPartitions, and
+ * ReindexRelationConcurrently.
+ * indexIds should be a list of "related" indexes, like all indexes on a
+ * relation, or all partitions of a partitioned index.  Expect deadlocks if
+ * passed unrelated indexes.
+ */
+static List *
+ReindexIndexesConcurrently(List *indexIds, int options,
+		MemoryContext private_context)
+{
+	List		*heapRelationIds = NIL;
+	List	   *newIndexIds = NIL;
+	List	   *relationLocks = NIL;
+	List	   *lockTags = NIL;
+
+	ListCell   *lc,
+			   *lc2;
+
+	MemoryContext oldcontext;
+
+	const int	progress_index[] = {
+		PROGRESS_CREATEIDX_COMMAND,
+		PROGRESS_CREATEIDX_PHASE,
+		PROGRESS_CREATEIDX_INDEX_OID,
+		PROGRESS_CREATEIDX_ACCESS_METHOD_OID
+	};
+	int64		progress_vals[4];
+
+	foreach(lc, indexIds)
+	{
+		Oid			indexrelid = lfirst_oid(lc);
+		Oid			heapId = IndexGetRelation(indexrelid,
+											  (options & REINDEXOPT_MISSING_OK) != 0);
+		Relation	heapRelation;
+
+		/* if relation is missing, leave */
+		if (!OidIsValid(heapId))
+		{
+			indexIds = list_delete_cell(indexIds, lc);
+			continue;
+			// This is the case of either: 1) a single index being reindexed,
+			// where its relation has disappeared; or, 2) a concurrent reindex
+			// of an partitioned index, for which one of the tables has been
+			// dropped.  XXX: can that happen with locks held ??
+		}
+
+		if (IsCatalogRelationOid(heapId))
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					 errmsg("cannot reindex system catalogs concurrently")));
+
+		/*
+		 * Don't allow reindex for an invalid index on TOAST table, as
+		 * if rebuilt it would not be possible to drop it.
+		 */
+		if (IsToastNamespace(get_rel_namespace(indexrelid)) &&
+			!get_index_isvalid(indexrelid))
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					 errmsg("cannot reindex invalid index on TOAST table concurrently")));
+
+		/*
+		 * Check if parent relation can be locked and if it exists,
+		 * this needs to be done at this stage as the list of indexes
+		 * to rebuild is not complete yet, and REINDEXOPT_MISSING_OK
+		 * should not be used once all the session locks are taken.
+		 */
+		if ((options & REINDEXOPT_MISSING_OK) != 0)
+		{
+			heapRelation = try_table_open(heapId,
+										  ShareUpdateExclusiveLock);
+			/* leave if relation does not exist */
+			if (!heapRelation)
+			{
+				indexIds = list_delete_cell(indexIds, lc);
+				continue;
+			}
+		}
+		else
+			heapRelation = table_open(heapId,
+									  ShareUpdateExclusiveLock);
+		table_close(heapRelation, NoLock);
+
+		/* Save the list of relation OIDs in private context */
+		oldcontext = MemoryContextSwitchTo(private_context);
+
+		/* Track the heap relation of this index for session locks */
+		heapRelationIds = lappend_oid(heapRelationIds, heapId);
+
+		/* Note that invalid indexes are allowed here. */
+
+		MemoryContextSwitchTo(oldcontext);
+	}
+
+	if (indexIds == NIL)
+		return NIL;
 
 	/*-----
 	 * Now we have all the indexes we want to process in indexIds.
@@ -3519,41 +3626,9 @@ ReindexRelationConcurrently(Oid relationOid, int options)
 	/* Start a new transaction to finish process properly */
 	StartTransactionCommand();
 
-	/* Log what we did */
-	if (options & REINDEXOPT_VERBOSE)
-	{
-		if (relkind == RELKIND_INDEX)
-			ereport(INFO,
-					(errmsg("index \"%s.%s\" was reindexed",
-							relationNamespace, relationName),
-					 errdetail("%s.",
-							   pg_rusage_show(&ru0))));
-		else
-		{
-			foreach(lc, newIndexIds)
-			{
-				Oid			indOid = lfirst_oid(lc);
-
-				ereport(INFO,
-						(errmsg("index \"%s.%s\" was reindexed",
-								get_namespace_name(get_rel_namespace(indOid)),
-								get_rel_name(indOid))));
-				/* Don't show rusage here, since it's not per index. */
-			}
-
-			ereport(INFO,
-					(errmsg("table \"%s.%s\" was reindexed",
-							relationNamespace, relationName),
-					 errdetail("%s.",
-							   pg_rusage_show(&ru0))));
-		}
-	}
-
-	MemoryContextDelete(private_context);
-
 	pgstat_progress_end_command();
 
-	return true;
+	return newIndexIds;
 }
 
 /*
-- 
2.17.0

>From 8df67550ab48d6bf94b9be96f5f717718ff17d4d Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryz...@telsasoft.com>
Date: Fri, 30 Oct 2020 23:52:31 -0500
Subject: [PATCH 3/9] ReindexPartitions() to set indisvalid..

Something like this should probably have been included in
a6642b3ae060976b42830b7dc8f29ec190ab05e4

See also 71a05b223, which mentioned the absence of any way to validate an
index.
---
 src/backend/commands/indexcmds.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 423c5fd78a..3536eab569 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -2761,6 +2761,24 @@ ReindexPartitions(Oid relid, int options, bool isTopLevel)
 		ReindexMultipleInternal(partitions, options);
 	}
 
+	/*
+	 * If indexes exist on all of the partitioned table's children, and we
+	 * just reindexed them, then we know they're valid, and so can mark the
+	 * parent index as valid.
+	 * This handles the case of CREATE INDEX CONCURRENTLY.
+	 * See also: validatePartitionedIndex().
+	 */
+	if (get_rel_relkind(relid) == RELKIND_PARTITIONED_INDEX
+			&& !get_index_isvalid(relid))
+	{
+		Oid	tableoid = IndexGetRelation(relid, false);
+		List	*child_tables = find_all_inheritors(tableoid, ShareLock, NULL);
+
+		/* Both lists include their parent relation as well as any intermediate partitioned rels */
+		if (list_length(inhoids) == list_length(child_tables))
+			index_set_state_flags(relid, INDEX_CREATE_SET_VALID);
+	}
+
 	/*
 	 * Clean up working storage --- note we must do this after
 	 * StartTransactionCommand, else we might be trying to delete the active
-- 
2.17.0

reindex partitioned indexes: refactor ReindexRelationConcurrently ?

Reply via email to