On Sat, Oct 31, 2020 at 01:31:17AM -0500, Justin Pryzby wrote: > Forking this thread, since the existing CFs have been closed. > https://www.postgresql.org/message-id/flat/20200914143102.GX18552%40telsasoft.com#58b1056488451f8594b0f0ba40996afd > > The strategy is to create catalog entries for all tables with > indisvalid=false, > and then process them like REINDEX CONCURRENTLY. If it's interrupted, it > leaves INVALID indexes, which can be cleaned up with DROP or REINDEX, same as > CIC on a plain table. > > On Sat, Aug 08, 2020 at 01:37:44AM -0500, Justin Pryzby wrote: > > On Mon, Jun 15, 2020 at 09:37:42PM +0900, Michael Paquier wrote: > > > On Mon, Jun 15, 2020 at 08:15:05PM +0800, 李杰(慎追) wrote: > > > > As shown above, an error occurred while creating an index in the second > > > > partition. > > > > It can be clearly seen that the index of the partitioned table is > > > > invalid > > > > and the index of the first partition is normal, the second partition is > > > > invalid, > > > > and the Third Partition index does not exist at all. > > > > > > That's a problem. I really think that we should make the steps of the > > > concurrent operation consistent across all relations, meaning that all > > > the indexes should be created as invalid for all the parts of the > > > partition tree, including partitioned tables as well as their > > > partitions, in the same transaction. Then a second new transaction > > > gets used for the index build, followed by a third one for the > > > validation that switches the indexes to become valid. > > > > Note that the mentioned problem wasn't serious: there was missing index on > > child table, therefor the parent index was invalid, as intended. However I > > agree that it's not nice that the command can fail so easily and leave > > behind > > some indexes created successfully and some failed some not created at all. > > > > But I took your advice initially creating invalid inds. > ... > > That gave me the idea to layer CIC on top of Reindex, since I think it does > > exactly what's needed. > > On Sat, Sep 26, 2020 at 02:56:55PM -0500, Justin Pryzby wrote: > > On Thu, Sep 24, 2020 at 05:11:03PM +0900, Michael Paquier wrote: > > > It would be good also to check if > > > we have a partition index tree that maps partially with a partition > > > table tree (aka no all table partitions have a partition index), where > > > these don't get clustered because there is no index to work on. > > > > This should not happen, since a incomplete partitioned index is "invalid".
@cfbot: rebased over recent changes to indexcmds.c -- Justin
>From 05cd4feaa5a1d77a1ae8e3a167a68b2a21af1fc6 Mon Sep 17 00:00:00 2001 From: Justin Pryzby <pryz...@telsasoft.com> Date: Sat, 6 Jun 2020 17:42:23 -0500 Subject: [PATCH v11 1/9] Allow CREATE INDEX CONCURRENTLY on partitioned table Note, this effectively reverts 050098b14, so take care to not reintroduce the bug it fixed. XXX: does pgstat_progress_update_param() break other commands progress ? --- doc/src/sgml/ref/create_index.sgml | 9 -- src/backend/commands/indexcmds.c | 141 ++++++++++++++++++------- src/test/regress/expected/indexing.out | 60 ++++++++++- src/test/regress/sql/indexing.sql | 18 +++- 4 files changed, 172 insertions(+), 56 deletions(-) diff --git a/doc/src/sgml/ref/create_index.sgml b/doc/src/sgml/ref/create_index.sgml index 29dee5689e..bd4431a3ce 100644 --- a/doc/src/sgml/ref/create_index.sgml +++ b/doc/src/sgml/ref/create_index.sgml @@ -661,15 +661,6 @@ Indexes: cannot. </para> - <para> - Concurrent builds for indexes on partitioned tables are currently not - supported. However, you may concurrently build the index on each - partition individually and then finally create the partitioned index - non-concurrently in order to reduce the time where writes to the - partitioned table will be locked out. In this case, building the - partitioned index is a metadata only operation. - </para> - </refsect2> </refsect1> diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c index ca24620fd0..76219381c1 100644 --- a/src/backend/commands/indexcmds.c +++ b/src/backend/commands/indexcmds.c @@ -68,6 +68,7 @@ /* non-export function prototypes */ +static void reindex_invalid_child_indexes(Oid indexRelationId); static bool CompareOpclassOptions(Datum *opts1, Datum *opts2, int natts); static void CheckPredicate(Expr *predicate); static void ComputeIndexAttrs(IndexInfo *indexInfo, @@ -671,17 +672,6 @@ DefineIndex(Oid relationId, partitioned = rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE; if (partitioned) { - /* - * Note: we check 'stmt->concurrent' rather than 'concurrent', so that - * the error is thrown also for temporary tables. Seems better to be - * consistent, even though we could do it on temporary table because - * we're not actually doing it concurrently. - */ - if (stmt->concurrent) - ereport(ERROR, - (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), - errmsg("cannot create index on partitioned table \"%s\" concurrently", - RelationGetRelationName(rel)))); if (stmt->excludeOpNames) ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), @@ -1119,6 +1109,11 @@ DefineIndex(Oid relationId, if (pd->nparts != 0) flags |= INDEX_CREATE_INVALID; } + else if (concurrent && OidIsValid(parentIndexId)) + { + /* If concurrent, initially build index partitions as "invalid" */ + flags |= INDEX_CREATE_INVALID; + } if (stmt->deferrable) constr_flags |= INDEX_CONSTR_CREATE_DEFERRABLE; @@ -1171,6 +1166,14 @@ DefineIndex(Oid relationId, */ if (!stmt->relation || stmt->relation->inh) { + /* + * Need to close the relation before recursing into children, so + * copy needed data into a longlived context. + */ + + MemoryContext ind_context = AllocSetContextCreate(PortalContext, "CREATE INDEX", + ALLOCSET_DEFAULT_SIZES); + MemoryContext oldcontext = MemoryContextSwitchTo(ind_context); PartitionDesc partdesc = RelationGetPartitionDesc(rel); int nparts = partdesc->nparts; Oid *part_oids = palloc(sizeof(Oid) * nparts); @@ -1178,12 +1181,15 @@ DefineIndex(Oid relationId, TupleDesc parentDesc; Oid *opfamOids; + // If concurrent, maybe this should be done after excluding indexes which already exist ? pgstat_progress_update_param(PROGRESS_CREATEIDX_PARTITIONS_TOTAL, nparts); memcpy(part_oids, partdesc->oids, sizeof(Oid) * nparts); + parentDesc = CreateTupleDescCopy(RelationGetDescr(rel)); + table_close(rel, NoLock); + MemoryContextSwitchTo(oldcontext); - parentDesc = RelationGetDescr(rel); opfamOids = palloc(sizeof(Oid) * numberOfKeyAttributes); for (i = 0; i < numberOfKeyAttributes; i++) opfamOids[i] = get_opclass_family(classObjectId[i]); @@ -1226,10 +1232,12 @@ DefineIndex(Oid relationId, continue; } + oldcontext = MemoryContextSwitchTo(ind_context); childidxs = RelationGetIndexList(childrel); attmap = build_attrmap_by_name(RelationGetDescr(childrel), parentDesc); + MemoryContextSwitchTo(oldcontext); foreach(cell, childidxs) { @@ -1300,10 +1308,14 @@ DefineIndex(Oid relationId, */ if (!found) { - IndexStmt *childStmt = copyObject(stmt); + IndexStmt *childStmt; bool found_whole_row; ListCell *lc; + oldcontext = MemoryContextSwitchTo(ind_context); + childStmt = copyObject(stmt); + MemoryContextSwitchTo(oldcontext); + /* * We can't use the same index name for the child index, * so clear idxname to let the recursive invocation choose @@ -1355,10 +1367,18 @@ DefineIndex(Oid relationId, createdConstraintId, is_alter_table, check_rights, check_not_in_use, skip_build, quiet); + if (concurrent) + { + PopActiveSnapshot(); + PushActiveSnapshot(GetTransactionSnapshot()); + invalidate_parent = true; + } } - pgstat_progress_update_param(PROGRESS_CREATEIDX_PARTITIONS_DONE, - i + 1); + /* For concurrent build, this is a catalog-only stage */ + if (!concurrent) + pgstat_progress_update_param(PROGRESS_CREATEIDX_PARTITIONS_DONE, + i + 1); free_attrmap(attmap); } @@ -1368,51 +1388,42 @@ DefineIndex(Oid relationId, * invalid, this is incorrect, so update our row to invalid too. */ if (invalidate_parent) - { - Relation pg_index = table_open(IndexRelationId, RowExclusiveLock); - HeapTuple tup, - newtup; - - tup = SearchSysCache1(INDEXRELID, - ObjectIdGetDatum(indexRelationId)); - if (!HeapTupleIsValid(tup)) - elog(ERROR, "cache lookup failed for index %u", - indexRelationId); - newtup = heap_copytuple(tup); - ((Form_pg_index) GETSTRUCT(newtup))->indisvalid = false; - CatalogTupleUpdate(pg_index, &tup->t_self, newtup); - ReleaseSysCache(tup); - table_close(pg_index, RowExclusiveLock); - heap_freetuple(newtup); - } - } + index_set_state_flags(indexRelationId, INDEX_DROP_CLEAR_VALID); + } else + table_close(rel, NoLock); /* * Indexes on partitioned tables are not themselves built, so we're * done here. */ - table_close(rel, NoLock); if (!OidIsValid(parentIndexId)) + { + if (concurrent) + reindex_invalid_child_indexes(indexRelationId); + pgstat_progress_end_command(); + } + return address; } - if (!concurrent) + table_close(rel, NoLock); + if (!concurrent || OidIsValid(parentIndexId)) { - /* Close the heap and we're done, in the non-concurrent case */ - table_close(rel, NoLock); + /* + * We're done if this is the top-level index, + * or the catalog-only phase of a partition built concurrently + */ - /* If this is the top-level index, we're done. */ if (!OidIsValid(parentIndexId)) pgstat_progress_end_command(); return address; } - /* save lockrelid and locktag for below, then close rel */ + /* save lockrelid and locktag for below */ heaprelid = rel->rd_lockInfo.lockRelId; SET_LOCKTAG_RELATION(heaplocktag, heaprelid.dbId, heaprelid.relId); - table_close(rel, NoLock); /* * For a concurrent build, it's important to make the catalog entries @@ -1606,6 +1617,56 @@ DefineIndex(Oid relationId, return address; } +/* Reindex invalid child indexes created earlier */ +static void +reindex_invalid_child_indexes(Oid indexRelationId) +{ + ListCell *lc; + int npart = 0; + + MemoryContext ind_context = AllocSetContextCreate(PortalContext, "CREATE INDEX", + ALLOCSET_DEFAULT_SIZES); + MemoryContext oldcontext; + List *childs = find_inheritance_children(indexRelationId, ShareLock); + List *partitions = NIL; + + PreventInTransactionBlock(true, "REINDEX INDEX"); + + foreach (lc, childs) + { + Oid partoid = lfirst_oid(lc); + + pgstat_progress_update_param(PROGRESS_CREATEIDX_PARTITIONS_DONE, + npart++); + + if (get_index_isvalid(partoid) || + !RELKIND_HAS_STORAGE(get_rel_relkind(partoid))) + continue; + + /* Save partition OID */ + oldcontext = MemoryContextSwitchTo(ind_context); + partitions = lappend_oid(partitions, partoid); + MemoryContextSwitchTo(oldcontext); + } + + /* + * Process each partition listed in a separate transaction. Note that + * this commits and then starts a new transaction immediately. + */ + ReindexMultipleInternal(partitions, REINDEXOPT_CONCURRENTLY); + + /* + * CIC needs to mark a partitioned index as VALID, which itself + * requires setting READY, which is unset for CIC (even though + * it's meaningless for an index without storage). + * This must be done only while holding a lock which precludes adding + * partitions. + * See also: validatePartitionedIndex(). + */ + index_set_state_flags(indexRelationId, INDEX_CREATE_SET_READY); + CommandCounterIncrement(); + index_set_state_flags(indexRelationId, INDEX_CREATE_SET_VALID); +} /* * CheckMutability diff --git a/src/test/regress/expected/indexing.out b/src/test/regress/expected/indexing.out index c93f4470c9..f04abc6897 100644 --- a/src/test/regress/expected/indexing.out +++ b/src/test/regress/expected/indexing.out @@ -50,11 +50,63 @@ select relname, relkind, relhassubclass, inhparent::regclass (8 rows) drop table idxpart; --- Some unsupported features +-- CIC on partitioned table create table idxpart (a int, b int, c text) partition by range (a); -create table idxpart1 partition of idxpart for values from (0) to (10); -create index concurrently on idxpart (a); -ERROR: cannot create index on partitioned table "idxpart" concurrently +create table idxpart1 partition of idxpart for values from (0) to (10) partition by range(a); +create table idxpart11 partition of idxpart1 for values from (0) to (10) partition by range(a); +create table idxpart111 partition of idxpart11 default partition by range(a); +create table idxpart1111 partition of idxpart111 default partition by range(a); +create table idxpart2 partition of idxpart for values from (10) to (20); +insert into idxpart2 values(10),(10); -- not unique +create index concurrently on idxpart (a); -- partitioned +create index concurrently on idxpart1 (a); -- partitioned and partition +create index concurrently on idxpart11 (a); -- partitioned and partition, with no leaves +create index concurrently on idxpart2 (a); -- leaf +create unique index concurrently on idxpart (a); -- partitioned, unique failure +ERROR: could not create unique index "idxpart2_a_idx2_ccnew" +DETAIL: Key (a)=(10) is duplicated. +\d idxpart + Partitioned table "public.idxpart" + Column | Type | Collation | Nullable | Default +--------+---------+-----------+----------+--------- + a | integer | | | + b | integer | | | + c | text | | | +Partition key: RANGE (a) +Indexes: + "idxpart_a_idx" btree (a) + "idxpart_a_idx1" UNIQUE, btree (a) INVALID +Number of partitions: 2 (Use \d+ to list them.) + +\d idxpart1 + Partitioned table "public.idxpart1" + Column | Type | Collation | Nullable | Default +--------+---------+-----------+----------+--------- + a | integer | | | + b | integer | | | + c | text | | | +Partition of: idxpart FOR VALUES FROM (0) TO (10) +Partition key: RANGE (a) +Indexes: + "idxpart1_a_idx" btree (a) INVALID + "idxpart1_a_idx1" btree (a) + "idxpart1_a_idx2" UNIQUE, btree (a) INVALID +Number of partitions: 1 (Use \d+ to list them.) + +\d idxpart2 + Table "public.idxpart2" + Column | Type | Collation | Nullable | Default +--------+---------+-----------+----------+--------- + a | integer | | | + b | integer | | | + c | text | | | +Partition of: idxpart FOR VALUES FROM (10) TO (20) +Indexes: + "idxpart2_a_idx" btree (a) + "idxpart2_a_idx1" btree (a) + "idxpart2_a_idx2" UNIQUE, btree (a) INVALID + "idxpart2_a_idx2_ccnew" UNIQUE, btree (a) INVALID + drop table idxpart; -- Verify bugfix with query on indexed partitioned table with no partitions -- https://postgr.es/m/20180124162006.pmapfiznhgngwtjf@alvherre.pgsql diff --git a/src/test/regress/sql/indexing.sql b/src/test/regress/sql/indexing.sql index 42f398b67c..3d4b6e9bc9 100644 --- a/src/test/regress/sql/indexing.sql +++ b/src/test/regress/sql/indexing.sql @@ -29,10 +29,22 @@ select relname, relkind, relhassubclass, inhparent::regclass where relname like 'idxpart%' order by relname; drop table idxpart; --- Some unsupported features +-- CIC on partitioned table create table idxpart (a int, b int, c text) partition by range (a); -create table idxpart1 partition of idxpart for values from (0) to (10); -create index concurrently on idxpart (a); +create table idxpart1 partition of idxpart for values from (0) to (10) partition by range(a); +create table idxpart11 partition of idxpart1 for values from (0) to (10) partition by range(a); +create table idxpart111 partition of idxpart11 default partition by range(a); +create table idxpart1111 partition of idxpart111 default partition by range(a); +create table idxpart2 partition of idxpart for values from (10) to (20); +insert into idxpart2 values(10),(10); -- not unique +create index concurrently on idxpart (a); -- partitioned +create index concurrently on idxpart1 (a); -- partitioned and partition +create index concurrently on idxpart11 (a); -- partitioned and partition, with no leaves +create index concurrently on idxpart2 (a); -- leaf +create unique index concurrently on idxpart (a); -- partitioned, unique failure +\d idxpart +\d idxpart1 +\d idxpart2 drop table idxpart; -- Verify bugfix with query on indexed partitioned table with no partitions -- 2.17.0
>From ba40cea72e3f47e5e4d187316766083db5963fc8 Mon Sep 17 00:00:00 2001 From: Justin Pryzby <pryz...@telsasoft.com> Date: Fri, 30 Oct 2020 16:23:02 -0500 Subject: [PATCH v11 2/9] Add SKIPVALID flag for more integration --- src/backend/commands/indexcmds.c | 54 +++++++++++--------------------- src/include/nodes/parsenodes.h | 7 +++-- 2 files changed, 22 insertions(+), 39 deletions(-) diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c index 76219381c1..ffc166206b 100644 --- a/src/backend/commands/indexcmds.c +++ b/src/backend/commands/indexcmds.c @@ -1617,53 +1617,30 @@ DefineIndex(Oid relationId, return address; } -/* Reindex invalid child indexes created earlier */ +/* + * Reindex invalid child indexes created earlier thereby validating + * the parent index. + */ static void reindex_invalid_child_indexes(Oid indexRelationId) { - ListCell *lc; - int npart = 0; - - MemoryContext ind_context = AllocSetContextCreate(PortalContext, "CREATE INDEX", - ALLOCSET_DEFAULT_SIZES); - MemoryContext oldcontext; - List *childs = find_inheritance_children(indexRelationId, ShareLock); - List *partitions = NIL; - - PreventInTransactionBlock(true, "REINDEX INDEX"); - - foreach (lc, childs) - { - Oid partoid = lfirst_oid(lc); - - pgstat_progress_update_param(PROGRESS_CREATEIDX_PARTITIONS_DONE, - npart++); - - if (get_index_isvalid(partoid) || - !RELKIND_HAS_STORAGE(get_rel_relkind(partoid))) - continue; - - /* Save partition OID */ - oldcontext = MemoryContextSwitchTo(ind_context); - partitions = lappend_oid(partitions, partoid); - MemoryContextSwitchTo(oldcontext); - } - - /* - * Process each partition listed in a separate transaction. Note that - * this commits and then starts a new transaction immediately. - */ - ReindexMultipleInternal(partitions, REINDEXOPT_CONCURRENTLY); - /* * CIC needs to mark a partitioned index as VALID, which itself * requires setting READY, which is unset for CIC (even though * it's meaningless for an index without storage). * This must be done only while holding a lock which precludes adding * partitions. - * See also: validatePartitionedIndex(). */ + CommandCounterIncrement(); index_set_state_flags(indexRelationId, INDEX_CREATE_SET_READY); + + /* + * Process each partition listed in a separate transaction. Note that + * this commits and then starts a new transaction immediately. + */ + ReindexPartitions(indexRelationId, + REINDEXOPT_CONCURRENTLY | REINDEXOPT_SKIPVALID, true); + CommandCounterIncrement(); index_set_state_flags(indexRelationId, INDEX_CREATE_SET_VALID); } @@ -2947,6 +2924,11 @@ ReindexPartitions(Oid relid, int options, bool isTopLevel) if (!RELKIND_HAS_STORAGE(partkind)) continue; + /* Skip invalid indexes, if requested */ + if ((options & REINDEXOPT_SKIPVALID) != 0 && + get_index_isvalid(partoid)) + continue; + Assert(partkind == RELKIND_INDEX || partkind == RELKIND_RELATION); diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h index d1f9ef29ca..1373b57ae7 100644 --- a/src/include/nodes/parsenodes.h +++ b/src/include/nodes/parsenodes.h @@ -3347,10 +3347,11 @@ typedef struct ConstraintsSetStmt */ /* Reindex options */ -#define REINDEXOPT_VERBOSE (1 << 0) /* print progress info */ +#define REINDEXOPT_VERBOSE (1 << 0) /* print progress info */ #define REINDEXOPT_REPORT_PROGRESS (1 << 1) /* report pgstat progress */ -#define REINDEXOPT_MISSING_OK (1 << 2) /* skip missing relations */ -#define REINDEXOPT_CONCURRENTLY (1 << 3) /* concurrent mode */ +#define REINDEXOPT_MISSING_OK (1 << 2) /* skip missing relations */ +#define REINDEXOPT_CONCURRENTLY (1 << 3) /* concurrent mode */ +#define REINDEXOPT_SKIPVALID (1 << 4) /* skip valid indexes */ typedef enum ReindexObjectType { -- 2.17.0
>From b1ff160a715a7e87c93bf4f7179086e406f5dad1 Mon Sep 17 00:00:00 2001 From: Justin Pryzby <pryz...@telsasoft.com> Date: Fri, 30 Oct 2020 23:52:31 -0500 Subject: [PATCH v11 3/9] ReindexPartitions() to set indisvalid.. Something like this should probably have been included in a6642b3ae060976b42830b7dc8f29ec190ab05e4 See also 71a05b223, which mentioned the absence of any way to validate an index. --- src/backend/commands/indexcmds.c | 23 ++++++++++++++++++----- 1 file changed, 18 insertions(+), 5 deletions(-) diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c index ffc166206b..c6a75e887f 100644 --- a/src/backend/commands/indexcmds.c +++ b/src/backend/commands/indexcmds.c @@ -1628,8 +1628,6 @@ reindex_invalid_child_indexes(Oid indexRelationId) * CIC needs to mark a partitioned index as VALID, which itself * requires setting READY, which is unset for CIC (even though * it's meaningless for an index without storage). - * This must be done only while holding a lock which precludes adding - * partitions. */ CommandCounterIncrement(); index_set_state_flags(indexRelationId, INDEX_CREATE_SET_READY); @@ -1640,9 +1638,6 @@ reindex_invalid_child_indexes(Oid indexRelationId) */ ReindexPartitions(indexRelationId, REINDEXOPT_CONCURRENTLY | REINDEXOPT_SKIPVALID, true); - - CommandCounterIncrement(); - index_set_state_flags(indexRelationId, INDEX_CREATE_SET_VALID); } /* @@ -2944,6 +2939,24 @@ ReindexPartitions(Oid relid, int options, bool isTopLevel) */ ReindexMultipleInternal(partitions, options); + /* + * If indexes exist on all of the partitioned table's children, and we + * just reindexed them, then we know they're valid, and so can mark the + * parent index as valid. + * This handles the case of CREATE INDEX CONCURRENTLY. + * See also: validatePartitionedIndex(). + */ + if (get_rel_relkind(relid) == RELKIND_PARTITIONED_INDEX + && !get_index_isvalid(relid)) + { + Oid tableoid = IndexGetRelation(relid, false); + List *child_tables = find_all_inheritors(tableoid, ShareLock, NULL); + + /* Both lists include their parent relation as well as any intermediate partitioned rels */ + if (list_length(inhoids) == list_length(child_tables)) + index_set_state_flags(relid, INDEX_CREATE_SET_VALID); + } + /* * Clean up working storage --- note we must do this after * StartTransactionCommand, else we might be trying to delete the active -- 2.17.0