On Sun, Aug 09, 2020 at 06:44:23PM -0500, Justin Pryzby wrote: > Thanks for looking.
The REINDEX patch is progressing its way, so I have looked a bit at the part for CIC. Visibly, the case of multiple partition layers is not handled correctly. Here is a sequence that gets broken: CREATE TABLE parent_tab (id int) PARTITION BY RANGE (id); CREATE TABLE child_0_10 PARTITION OF parent_tab FOR VALUES FROM (0) TO (10); CREATE TABLE child_10_20 PARTITION OF parent_tab FOR VALUES FROM (10) TO (20); CREATE TABLE child_20_30 PARTITION OF parent_tab FOR VALUES FROM (20) TO (30); INSERT INTO parent_tab VALUES (generate_series(0,29)); CREATE TABLE child_30_40 PARTITION OF parent_tab FOR VALUES FROM (30) TO (40) PARTITION BY RANGE(id); CREATE TABLE child_30_35 PARTITION OF child_30_40 FOR VALUES FROM (30) TO (35); CREATE TABLE child_35_40 PARTITION OF child_30_40 FOR VALUES FROM (35) TO (40); INSERT INTO parent_tab VALUES (generate_series(30,39)); CREATE INDEX CONCURRENTLY parent_index ON parent_tab (id); This fails as follows: ERROR: XX000: unrecognized node type: 2139062143 LOCATION: copyObjectImpl, copyfuncs.c:5718 And the problem visibly comes from some handling with the second level of partitions, child_30_40 in my example above. Even with that, this outlines a rather severe problem in the patch: index_set_state_flags() flips indisvalid on/off using a non-transactional update (see the use of heap_inplace_update), meaning that if we fail in the middle of the operation we may finish with a partition index tree where some of the indexes are valid and some of them are invalid. In order to make this logic consistent, I am afraid that we will need to do two things: - Change index_set_state_flags() so as it uses a transactional update. That's something I would like to change for other reasons, like making sure that the REPLICA IDENTITY of a parent relation is correctly reset when dropping a replica index. - Make all the indexes of the partition tree valid in the *same* sub-transaction. You can note that this case is different than a concurrent REINDEX, because in this case we just do an in-place change between the old and new index, meaning that even if there is a failure happening while processing, we may have some invalid indexes, but there are still valid indexes attached to the partition tree, at any time. + MemoryContext oldcontext = MemoryContextSwitchTo(ind_context); PartitionDesc partdesc = RelationGetPartitionDesc(rel); int nparts = partdesc->nparts; + char *relname = pstrdup(RelationGetRelationName(rel)); Er, no. We should not play with the relation cache calls in a private memory context. I think that this needs much more thoughts. -- Michael
signature.asc
Description: PGP signature