Hi Amit, On 2017/09/11 16:16, Amit Khandekar wrote: > Thanks Amit for the patch. I am still reviewing it, but meanwhile > below are a few comments so far ...
Thanks for the review. > + next_parted_idx += (list_length(*pds) - next_parted_idx - 1); > > I think this can be replaced just by : > + next_parted_idx = list_length(*pds) - 1; > Or, how about removing this variable next_parted_idx altogether ? > Instead, we can just do this : > pd->indexes[i] = -(1 + list_length(*pds)); That seems like the simplest possible way to do it. > + next_leaf_idx += (list_length(*leaf_part_oids) - next_leaf_idx); > > Didn't understand why next_leaf_idx needs to be updated in case when > the current partrelid is partitioned. I think it should be incremented > only for leaf partitions, no ? Or for that matter, again, how about > removing the variable 'next_leaf_idx' and doing this : > *leaf_part_oids = lappend_oid(*leaf_part_oids, partrelid); > pd->indexes[i] = list_length(*leaf_part_oids) - 1; Yep. Attached updated patch does it that way for both partitioned table indexes and leaf partition indexes. Thanks for pointing it out. > ----------- > > * For every partitioned table in the tree, starting with the root > * partitioned table, add its relcache entry to parted_rels, while also > * queuing its partitions (in the order in which they appear in the > * partition descriptor) to be looked at later in the same loop. This is > * a bit tricky but works because the foreach() macro doesn't fetch the > * next list element until the bottom of the loop. > > I think the above comment needs to be modified with something > explaining the relevant changed code. For e.g. there is no > parted_rels, and the "tricky" part was there earlier because of the > list being iterated and at the same time being appended. > > ------------ I think I forgot to update this comment. > I couldn't see the existing comments like "Indexes corresponding to > the internal partitions are multiplied by" anywhere in the patch. I > think those comments are still valid, and important. Again, I failed to keep this comment. Anyway, I reworded the comments a bit to describe what the code is doing more clearly. Hope you find it so too. Thanks, Amit
From 0e04cee14a5168e0652c2aa646c169789ae41e8e Mon Sep 17 00:00:00 2001 From: amit <amitlangot...@gmail.com> Date: Wed, 30 Aug 2017 10:02:05 +0900 Subject: [PATCH 1/2] Decouple RelationGetPartitionDispatchInfo() from executor Currently it and the structure it generates viz. PartitionDispatch objects are too coupled with the executor's tuple-routing code. In particular, it's pretty undesirable that it makes it the responsibility of the caller to release some resources, such as executor tuple table slots, tuple-conversion maps, etc. That makes it harder to use in places other than where it's currently being used. After this refactoring, ExecSetupPartitionTupleRouting() now needs to do some of the work that was previously done in RelationGetPartitionDispatchInfo(). --- src/backend/catalog/partition.c | 53 +++++++------------- src/backend/commands/copy.c | 32 +++++++------ src/backend/executor/execMain.c | 88 ++++++++++++++++++++++++++++++---- src/backend/executor/nodeModifyTable.c | 37 +++++++------- src/include/catalog/partition.h | 20 +++----- src/include/executor/executor.h | 4 +- src/include/nodes/execnodes.h | 40 +++++++++++++++- 7 files changed, 181 insertions(+), 93 deletions(-) diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c index 73eff17202..555b7c21c7 100644 --- a/src/backend/catalog/partition.c +++ b/src/backend/catalog/partition.c @@ -1292,7 +1292,6 @@ RelationGetPartitionDispatchInfo(Relation rel, Relation partrel = lfirst(lc1); Relation parent = lfirst(lc2); PartitionKey partkey = RelationGetPartitionKey(partrel); - TupleDesc tupdesc = RelationGetDescr(partrel); PartitionDesc partdesc = RelationGetPartitionDesc(partrel); int j, m; @@ -1300,29 +1299,12 @@ RelationGetPartitionDispatchInfo(Relation rel, pd[i] = (PartitionDispatch) palloc(sizeof(PartitionDispatchData)); pd[i]->reldesc = partrel; pd[i]->key = partkey; - pd[i]->keystate = NIL; pd[i]->partdesc = partdesc; if (parent != NULL) - { - /* - * For every partitioned table other than root, we must store a - * tuple table slot initialized with its tuple descriptor and a - * tuple conversion map to convert a tuple from its parent's - * rowtype to its own. That is to make sure that we are looking at - * the correct row using the correct tuple descriptor when - * computing its partition key for tuple routing. - */ - pd[i]->tupslot = MakeSingleTupleTableSlot(tupdesc); - pd[i]->tupmap = convert_tuples_by_name(RelationGetDescr(parent), - tupdesc, - gettext_noop("could not convert row type")); - } + pd[i]->parentoid = RelationGetRelid(parent); else - { - /* Not required for the root partitioned table */ - pd[i]->tupslot = NULL; - pd[i]->tupmap = NULL; - } + pd[i]->parentoid = InvalidOid; + pd[i]->indexes = (int *) palloc(partdesc->nparts * sizeof(int)); /* @@ -2233,7 +2215,7 @@ generate_partition_qual(Relation rel) * Construct values[] and isnull[] arrays for the partition key * of a tuple. * - * pd Partition dispatch object of the partitioned table + * ptrinfo PartitionTupleRoutingInfo object of the table * slot Heap tuple from which to extract partition key * estate executor state for evaluating any partition key * expressions (must be non-NULL) @@ -2245,26 +2227,27 @@ generate_partition_qual(Relation rel) * ---------------- */ void -FormPartitionKeyDatum(PartitionDispatch pd, +FormPartitionKeyDatum(PartitionTupleRoutingInfo *ptrinfo, TupleTableSlot *slot, EState *estate, Datum *values, bool *isnull) { + PartitionDispatch pd = ptrinfo->pd; ListCell *partexpr_item; int i; - if (pd->key->partexprs != NIL && pd->keystate == NIL) + if (pd->key->partexprs != NIL && ptrinfo->keystate == NIL) { /* Check caller has set up context correctly */ Assert(estate != NULL && GetPerTupleExprContext(estate)->ecxt_scantuple == slot); /* First time through, set up expression evaluation state */ - pd->keystate = ExecPrepareExprList(pd->key->partexprs, estate); + ptrinfo->keystate = ExecPrepareExprList(pd->key->partexprs, estate); } - partexpr_item = list_head(pd->keystate); + partexpr_item = list_head(ptrinfo->keystate); for (i = 0; i < pd->key->partnatts; i++) { AttrNumber keycol = pd->key->partattrs[i]; @@ -2304,13 +2287,13 @@ FormPartitionKeyDatum(PartitionDispatch pd, * the latter case. */ int -get_partition_for_tuple(PartitionDispatch *pd, +get_partition_for_tuple(PartitionTupleRoutingInfo **ptrinfos, TupleTableSlot *slot, EState *estate, - PartitionDispatchData **failed_at, + PartitionTupleRoutingInfo **failed_at, TupleTableSlot **failed_slot) { - PartitionDispatch parent; + PartitionTupleRoutingInfo *parent; Datum values[PARTITION_MAX_KEYS]; bool isnull[PARTITION_MAX_KEYS]; int result; @@ -2318,11 +2301,11 @@ get_partition_for_tuple(PartitionDispatch *pd, TupleTableSlot *ecxt_scantuple_old = ecxt->ecxt_scantuple; /* start with the root partitioned table */ - parent = pd[0]; + parent = ptrinfos[0]; while (true) { - PartitionKey key = parent->key; - PartitionDesc partdesc = parent->partdesc; + PartitionKey key = parent->pd->key; + PartitionDesc partdesc = parent->pd->partdesc; TupleTableSlot *myslot = parent->tupslot; TupleConversionMap *map = parent->tupmap; int cur_index = -1; @@ -2458,13 +2441,13 @@ get_partition_for_tuple(PartitionDispatch *pd, *failed_slot = slot; break; } - else if (parent->indexes[cur_index] >= 0) + else if (parent->pd->indexes[cur_index] >= 0) { - result = parent->indexes[cur_index]; + result = parent->pd->indexes[cur_index]; break; } else - parent = pd[-parent->indexes[cur_index]]; + parent = ptrinfos[-parent->pd->indexes[cur_index]]; } error_exit: diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c index cfa3f059c2..288d6a1ab2 100644 --- a/src/backend/commands/copy.c +++ b/src/backend/commands/copy.c @@ -165,8 +165,8 @@ typedef struct CopyStateData bool volatile_defexprs; /* is any of defexprs volatile? */ List *range_table; - PartitionDispatch *partition_dispatch_info; - int num_dispatch; /* Number of entries in the above array */ + PartitionTupleRoutingInfo **ptrinfos; + int num_parted; /* Number of entries in the above array */ int num_partitions; /* Number of members in the following arrays */ ResultRelInfo *partitions; /* Per partition result relation */ TupleConversionMap **partition_tupconv_maps; @@ -2445,7 +2445,7 @@ CopyFrom(CopyState cstate) */ if (cstate->rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE) { - PartitionDispatch *partition_dispatch_info; + PartitionTupleRoutingInfo **ptrinfos; ResultRelInfo *partitions; TupleConversionMap **partition_tupconv_maps; TupleTableSlot *partition_tuple_slot; @@ -2455,13 +2455,13 @@ CopyFrom(CopyState cstate) ExecSetupPartitionTupleRouting(cstate->rel, 1, estate, - &partition_dispatch_info, + &ptrinfos, &partitions, &partition_tupconv_maps, &partition_tuple_slot, &num_parted, &num_partitions); - cstate->partition_dispatch_info = partition_dispatch_info; - cstate->num_dispatch = num_parted; + cstate->ptrinfos = ptrinfos; + cstate->num_parted = num_parted; cstate->partitions = partitions; cstate->num_partitions = num_partitions; cstate->partition_tupconv_maps = partition_tupconv_maps; @@ -2502,7 +2502,7 @@ CopyFrom(CopyState cstate) if ((resultRelInfo->ri_TrigDesc != NULL && (resultRelInfo->ri_TrigDesc->trig_insert_before_row || resultRelInfo->ri_TrigDesc->trig_insert_instead_row)) || - cstate->partition_dispatch_info != NULL || + cstate->ptrinfos != NULL || cstate->volatile_defexprs) { useHeapMultiInsert = false; @@ -2580,7 +2580,7 @@ CopyFrom(CopyState cstate) ExecStoreTuple(tuple, slot, InvalidBuffer, false); /* Determine the partition to heap_insert the tuple into */ - if (cstate->partition_dispatch_info) + if (cstate->ptrinfos) { int leaf_part_index; TupleConversionMap *map; @@ -2594,7 +2594,7 @@ CopyFrom(CopyState cstate) * partition, respectively. */ leaf_part_index = ExecFindPartition(resultRelInfo, - cstate->partition_dispatch_info, + cstate->ptrinfos, slot, estate); Assert(leaf_part_index >= 0 && @@ -2826,8 +2826,8 @@ CopyFrom(CopyState cstate) ExecCloseIndices(resultRelInfo); - /* Close all the partitioned tables, leaf partitions, and their indices */ - if (cstate->partition_dispatch_info) + /* Release some resources that we acquired for tuple-routing. */ + if (cstate->ptrinfos) { int i; @@ -2837,13 +2837,15 @@ CopyFrom(CopyState cstate) * the main target table of COPY that will be closed eventually by * DoCopy(). Also, tupslot is NULL for the root partitioned table. */ - for (i = 1; i < cstate->num_dispatch; i++) + for (i = 1; i < cstate->num_parted; i++) { - PartitionDispatch pd = cstate->partition_dispatch_info[i]; + PartitionTupleRoutingInfo *ptrinfo = cstate->ptrinfos[i]; - heap_close(pd->reldesc, NoLock); - ExecDropSingleTupleTableSlot(pd->tupslot); + heap_close(ptrinfo->pd->reldesc, NoLock); + ExecDropSingleTupleTableSlot(ptrinfo->tupslot); } + + /* Close all the leaf partitions and their indices */ for (i = 0; i < cstate->num_partitions; i++) { ResultRelInfo *resultRelInfo = cstate->partitions + i; diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c index 4b594d489c..fe186abe69 100644 --- a/src/backend/executor/execMain.c +++ b/src/backend/executor/execMain.c @@ -3243,8 +3243,8 @@ EvalPlanQualEnd(EPQState *epqstate) * tuple routing for partitioned tables * * Output arguments: - * 'pd' receives an array of PartitionDispatch objects with one entry for - * every partitioned table in the partition tree + * 'ptrinfos' receives an array of PartitionTupleRoutingInfo objects with one + * entry for each partitioned table in the partition tree * 'partitions' receives an array of ResultRelInfo objects with one entry for * every leaf partition in the partition tree * 'tup_conv_maps' receives an array of TupleConversionMap objects with one @@ -3267,7 +3267,7 @@ void ExecSetupPartitionTupleRouting(Relation rel, Index resultRTindex, EState *estate, - PartitionDispatch **pd, + PartitionTupleRoutingInfo ***ptrinfos, ResultRelInfo **partitions, TupleConversionMap ***tup_conv_maps, TupleTableSlot **partition_tuple_slot, @@ -3275,16 +3275,84 @@ ExecSetupPartitionTupleRouting(Relation rel, { TupleDesc tupDesc = RelationGetDescr(rel); List *leaf_parts; + PartitionDispatch *pds; ListCell *cell; int i; ResultRelInfo *leaf_part_rri; + Relation parent; /* * Get the information about the partition tree after locking all the * partitions. */ (void) find_all_inheritors(RelationGetRelid(rel), RowExclusiveLock, NULL); - *pd = RelationGetPartitionDispatchInfo(rel, num_parted, &leaf_parts); + pds = RelationGetPartitionDispatchInfo(rel, num_parted, &leaf_parts); + + /* + * Construct PartitionTupleRoutingInfo objects, one for each partitioned + * table in the tree, using its PartitionDispatch in the pds array. + */ + *ptrinfos = (PartitionTupleRoutingInfo **) palloc0(*num_parted * + sizeof(PartitionTupleRoutingInfo *)); + parent = NULL; + for (i = 0; i < *num_parted; i++) + { + PartitionTupleRoutingInfo *ptrinfo; + + ptrinfo = (PartitionTupleRoutingInfo *) + palloc0(sizeof(PartitionTupleRoutingInfo)); + /* Stash a reference to this PartitionDispatch. */ + ptrinfo->pd = pds[i]; + + /* State for extracting partition key from tuples will go here. */ + ptrinfo->keystate = NIL; + + /* + * For every partitioned table other than root, we must store a tuple + * table slot initialized with its tuple descriptor and a tuple + * conversion map to convert a tuple from its parent's rowtype to its + * own. That is to make sure that we are looking at the correct row + * using the correct tuple descriptor when computing its partition key + * for tuple routing. + */ + if (pds[i]->parentoid != InvalidOid) + { + TupleDesc tupdesc = RelationGetDescr(pds[i]->reldesc); + + /* Open the parent relation descriptor if not already done. */ + if (pds[i]->parentoid == RelationGetRelid(rel)) + parent = rel; + else if (parent == NULL) + /* Locked by RelationGetPartitionDispatchInfo(). */ + parent = heap_open(pds[i]->parentoid, NoLock); + + ptrinfo->tupslot = MakeSingleTupleTableSlot(tupdesc); + ptrinfo->tupmap = convert_tuples_by_name(RelationGetDescr(parent), + tupdesc, + gettext_noop("could not convert row type")); + /* + * Close the parent descriptor, if the next partitioned table in + * the list is not a sibling, because it will have a different + * parent if so. + */ + if (parent != NULL && parent != rel && i + 1 < *num_parted && + pds[i + 1]->parentoid != pds[i]->parentoid) + { + heap_close(parent, NoLock); + parent = NULL; + } + } + else + { + /* Not required for the root partitioned table */ + ptrinfo->tupslot = NULL; + ptrinfo->tupmap = NULL; + } + + (*ptrinfos)[i] = ptrinfo; + } + + /* For leaf partitions, we build ResultRelInfos and TupleConversionMaps. */ *num_partitions = list_length(leaf_parts); *partitions = (ResultRelInfo *) palloc(*num_partitions * sizeof(ResultRelInfo)); @@ -3361,11 +3429,13 @@ ExecSetupPartitionTupleRouting(Relation rel, * by get_partition_for_tuple() unchanged. */ int -ExecFindPartition(ResultRelInfo *resultRelInfo, PartitionDispatch *pd, - TupleTableSlot *slot, EState *estate) +ExecFindPartition(ResultRelInfo *resultRelInfo, + PartitionTupleRoutingInfo **ptrinfos, + TupleTableSlot *slot, + EState *estate) { int result; - PartitionDispatchData *failed_at; + PartitionTupleRoutingInfo *failed_at; TupleTableSlot *failed_slot; /* @@ -3375,7 +3445,7 @@ ExecFindPartition(ResultRelInfo *resultRelInfo, PartitionDispatch *pd, if (resultRelInfo->ri_PartitionCheck) ExecPartitionCheck(resultRelInfo, slot, estate); - result = get_partition_for_tuple(pd, slot, estate, + result = get_partition_for_tuple(ptrinfos, slot, estate, &failed_at, &failed_slot); if (result < 0) { @@ -3385,7 +3455,7 @@ ExecFindPartition(ResultRelInfo *resultRelInfo, PartitionDispatch *pd, char *val_desc; ExprContext *ecxt = GetPerTupleExprContext(estate); - failed_rel = failed_at->reldesc; + failed_rel = failed_at->pd->reldesc; ecxt->ecxt_scantuple = failed_slot; FormPartitionKeyDatum(failed_at, failed_slot, estate, key_values, key_isnull); diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c index 49586a3c03..61ea9afa01 100644 --- a/src/backend/executor/nodeModifyTable.c +++ b/src/backend/executor/nodeModifyTable.c @@ -278,7 +278,7 @@ ExecInsert(ModifyTableState *mtstate, resultRelInfo = estate->es_result_relation_info; /* Determine the partition to heap_insert the tuple into */ - if (mtstate->mt_partition_dispatch_info) + if (mtstate->mt_ptrinfos) { int leaf_part_index; TupleConversionMap *map; @@ -292,7 +292,7 @@ ExecInsert(ModifyTableState *mtstate, * respectively. */ leaf_part_index = ExecFindPartition(resultRelInfo, - mtstate->mt_partition_dispatch_info, + mtstate->mt_ptrinfos, slot, estate); Assert(leaf_part_index >= 0 && @@ -1487,7 +1487,7 @@ ExecSetupTransitionCaptureState(ModifyTableState *mtstate, EState *estate) int numResultRelInfos; /* Find the set of partitions so that we can find their TupleDescs. */ - if (mtstate->mt_partition_dispatch_info != NULL) + if (mtstate->mt_ptrinfos != NULL) { /* * For INSERT via partitioned table, so we need TupleDescs based @@ -1911,7 +1911,7 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags) if (operation == CMD_INSERT && rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE) { - PartitionDispatch *partition_dispatch_info; + PartitionTupleRoutingInfo **ptrinfos; ResultRelInfo *partitions; TupleConversionMap **partition_tupconv_maps; TupleTableSlot *partition_tuple_slot; @@ -1921,13 +1921,13 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags) ExecSetupPartitionTupleRouting(rel, node->nominalRelation, estate, - &partition_dispatch_info, + &ptrinfos, &partitions, &partition_tupconv_maps, &partition_tuple_slot, &num_parted, &num_partitions); - mtstate->mt_partition_dispatch_info = partition_dispatch_info; - mtstate->mt_num_dispatch = num_parted; + mtstate->mt_ptrinfos = ptrinfos; + mtstate->mt_num_parted = num_parted; mtstate->mt_partitions = partitions; mtstate->mt_num_partitions = num_partitions; mtstate->mt_partition_tupconv_maps = partition_tupconv_maps; @@ -2342,21 +2342,24 @@ ExecEndModifyTable(ModifyTableState *node) resultRelInfo); } + /* Release some resources that we acquired for tuple-routing. */ + /* - * Close all the partitioned tables, leaf partitions, and their indices - * - * Remember node->mt_partition_dispatch_info[0] corresponds to the root - * partitioned table, which we must not try to close, because it is the - * main target table of the query that will be closed by ExecEndPlan(). - * Also, tupslot is NULL for the root partitioned table. + * node->mt_ptrinfos[0] corresponds to the root partitioned table, for + * which we didn't create tupslot. Also, its relation descriptor will + * be closed in ExecEndPlan(). */ - for (i = 1; i < node->mt_num_dispatch; i++) + for (i = 1; i < node->mt_num_parted; i++) { - PartitionDispatch pd = node->mt_partition_dispatch_info[i]; + PartitionTupleRoutingInfo *ptrinfo = node->mt_ptrinfos[i]; - heap_close(pd->reldesc, NoLock); - ExecDropSingleTupleTableSlot(pd->tupslot); + heap_close(ptrinfo->pd->reldesc, NoLock); + ExecDropSingleTupleTableSlot(ptrinfo->tupslot); } + + /* + * Close all the leaf partitions and their indices. + */ for (i = 0; i < node->mt_num_partitions; i++) { ResultRelInfo *resultRelInfo = node->mt_partitions + i; diff --git a/src/include/catalog/partition.h b/src/include/catalog/partition.h index 454a940a23..ebf82b55cf 100644 --- a/src/include/catalog/partition.h +++ b/src/include/catalog/partition.h @@ -45,13 +45,8 @@ typedef struct PartitionDescData *PartitionDesc; * * reldesc Relation descriptor of the table * key Partition key information of the table - * keystate Execution state required for expressions in the partition key * partdesc Partition descriptor of the table - * tupslot A standalone TupleTableSlot initialized with this table's tuple - * descriptor - * tupmap TupleConversionMap to convert from the parent's rowtype to - * this table's rowtype (when extracting the partition key of a - * tuple just before routing it through this table) + * parentoid OID of the parent table (InvalidOid if root partitioned table) * indexes Array with partdesc->nparts members (for details on what * individual members represent, see how they are set in * RelationGetPartitionDispatchInfo()) @@ -61,10 +56,8 @@ typedef struct PartitionDispatchData { Relation reldesc; PartitionKey key; - List *keystate; /* list of ExprState */ PartitionDesc partdesc; - TupleTableSlot *tupslot; - TupleConversionMap *tupmap; + Oid parentoid; int *indexes; } PartitionDispatchData; @@ -86,18 +79,19 @@ extern List *map_partition_varattnos(List *expr, int target_varno, extern List *RelationGetPartitionQual(Relation rel); extern Expr *get_partition_qual_relid(Oid relid); -/* For tuple routing */ extern PartitionDispatch *RelationGetPartitionDispatchInfo(Relation rel, int *num_parted, List **leaf_part_oids); -extern void FormPartitionKeyDatum(PartitionDispatch pd, + +/* For tuple routing */ +extern void FormPartitionKeyDatum(PartitionTupleRoutingInfo *ptrinfo, TupleTableSlot *slot, EState *estate, Datum *values, bool *isnull); -extern int get_partition_for_tuple(PartitionDispatch *pd, +extern int get_partition_for_tuple(PartitionTupleRoutingInfo **ptrinfos, TupleTableSlot *slot, EState *estate, - PartitionDispatchData **failed_at, + PartitionTupleRoutingInfo **failed_at, TupleTableSlot **failed_slot); extern Oid get_default_oid_from_partdesc(PartitionDesc partdesc); extern Oid get_default_partition_oid(Oid parentId); diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h index 770881849c..aee7a41b31 100644 --- a/src/include/executor/executor.h +++ b/src/include/executor/executor.h @@ -209,13 +209,13 @@ extern HeapTuple EvalPlanQualGetTuple(EPQState *epqstate, Index rti); extern void ExecSetupPartitionTupleRouting(Relation rel, Index resultRTindex, EState *estate, - PartitionDispatch **pd, + PartitionTupleRoutingInfo ***ptrinfos, ResultRelInfo **partitions, TupleConversionMap ***tup_conv_maps, TupleTableSlot **partition_tuple_slot, int *num_parted, int *num_partitions); extern int ExecFindPartition(ResultRelInfo *resultRelInfo, - PartitionDispatch *pd, + PartitionTupleRoutingInfo **ptrinfos, TupleTableSlot *slot, EState *estate); diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h index 90a60abc4d..c554a1b311 100644 --- a/src/include/nodes/execnodes.h +++ b/src/include/nodes/execnodes.h @@ -414,6 +414,42 @@ typedef struct ResultRelInfo Relation ri_PartitionRoot; } ResultRelInfo; +/* Forward declarations, to avoid including other headers */ +typedef struct PartitionDispatchData *PartitionDispatch; + +/* + * PartitionTupleRoutingInfo - information required for tuple-routing + * through one partitioned table in a partition + * tree + */ +typedef struct PartitionTupleRoutingInfo +{ + + /* Information about the table's partitions */ + PartitionDispatch pd; + + /* + * The execution state required for expressions contained in the partition + * key. It is NIL until initialized by FormPartitionKeyDatum() if and when + * it is called; for example, the first time a tuple is routed through this + * table. + */ + List *keystate; + + /* + * A standalone TupleTableSlot initialized with this table's tuple + * descriptor + */ + TupleTableSlot *tupslot; + + /* + * TupleConversionMap to convert from the parent's rowtype to this table's + * rowtype (when extracting the partition key of a tuple just before + * routing it through this table) + */ + TupleConversionMap *tupmap; +} PartitionTupleRoutingInfo; + /* ---------------- * EState information * @@ -973,9 +1009,9 @@ typedef struct ModifyTableState TupleTableSlot *mt_existing; /* slot to store existing target tuple in */ List *mt_excludedtlist; /* the excluded pseudo relation's tlist */ TupleTableSlot *mt_conflproj; /* CONFLICT ... SET ... projection target */ - struct PartitionDispatchData **mt_partition_dispatch_info; /* Tuple-routing support info */ - int mt_num_dispatch; /* Number of entries in the above array */ + struct PartitionTupleRoutingInfo **mt_ptrinfos; + int mt_num_parted; /* Number of entries in the above array */ int mt_num_partitions; /* Number of members in the following * arrays */ ResultRelInfo *mt_partitions; /* Per partition result relation */ -- 2.11.0
From e980019fdc688321af809d7d3547d25a3d6ff15f Mon Sep 17 00:00:00 2001 From: amit <amitlangot...@gmail.com> Date: Fri, 8 Sep 2017 17:35:10 +0900 Subject: [PATCH 2/2] Make RelationGetPartitionDispatch expansion order depth-first This is so as it matches what the planner is doing with partitioning inheritance expansion. Matching with planner order helps because it helps ease matching the executor's per-partition objects with planner-created per-partition nodes. --- src/backend/catalog/partition.c | 211 ++++++++++++++++------------------------ 1 file changed, 83 insertions(+), 128 deletions(-) diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c index 555b7c21c7..84c63a9ffe 100644 --- a/src/backend/catalog/partition.c +++ b/src/backend/catalog/partition.c @@ -147,6 +147,8 @@ static int32 partition_bound_cmp(PartitionKey key, static int partition_bound_bsearch(PartitionKey key, PartitionBoundInfo boundinfo, void *probe, bool probe_is_bound, bool *is_equal); +static void get_partition_dispatch_recurse(Relation rel, Relation parent, + List **pds, List **leaf_part_oids); /* * RelationBuildPartitionDesc @@ -1192,21 +1194,6 @@ get_partition_qual_relid(Oid relid) } /* - * Append OIDs of rel's partitions to the list 'partoids' and for each OID, - * append pointer rel to the list 'parents'. - */ -#define APPEND_REL_PARTITION_OIDS(rel, partoids, parents) \ - do\ - {\ - int i;\ - for (i = 0; i < (rel)->rd_partdesc->nparts; i++)\ - {\ - (partoids) = lappend_oid((partoids), (rel)->rd_partdesc->oids[i]);\ - (parents) = lappend((parents), (rel));\ - }\ - } while(0) - -/* * RelationGetPartitionDispatchInfo * Returns information necessary to route tuples down a partition tree * @@ -1222,133 +1209,101 @@ PartitionDispatch * RelationGetPartitionDispatchInfo(Relation rel, int *num_parted, List **leaf_part_oids) { + List *pdlist; PartitionDispatchData **pd; - List *all_parts = NIL, - *all_parents = NIL, - *parted_rels, - *parted_rel_parents; - ListCell *lc1, - *lc2; - int i, - k, - offset; + ListCell *lc; + int i; - /* - * We rely on the relcache to traverse the partition tree to build both - * the leaf partition OIDs list and the array of PartitionDispatch objects - * for the partitioned tables in the tree. That means every partitioned - * table in the tree must be locked, which is fine since we require the - * caller to lock all the partitions anyway. - * - * For every partitioned table in the tree, starting with the root - * partitioned table, add its relcache entry to parted_rels, while also - * queuing its partitions (in the order in which they appear in the - * partition descriptor) to be looked at later in the same loop. This is - * a bit tricky but works because the foreach() macro doesn't fetch the - * next list element until the bottom of the loop. - */ - *num_parted = 1; - parted_rels = list_make1(rel); - /* Root partitioned table has no parent, so NULL for parent */ - parted_rel_parents = list_make1(NULL); - APPEND_REL_PARTITION_OIDS(rel, all_parts, all_parents); - forboth(lc1, all_parts, lc2, all_parents) - { - Oid partrelid = lfirst_oid(lc1); - Relation parent = lfirst(lc2); + Assert(rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE); - if (get_rel_relkind(partrelid) == RELKIND_PARTITIONED_TABLE) - { - /* - * Already locked by the caller. Note that it is the - * responsibility of the caller to close the below relcache entry, - * once done using the information being collected here (for - * example, in ExecEndModifyTable). - */ - Relation partrel = heap_open(partrelid, NoLock); + *num_parted = 0; + *leaf_part_oids = NIL; - (*num_parted)++; - parted_rels = lappend(parted_rels, partrel); - parted_rel_parents = lappend(parted_rel_parents, parent); - APPEND_REL_PARTITION_OIDS(partrel, all_parts, all_parents); - } + get_partition_dispatch_recurse(rel, NULL, &pdlist, leaf_part_oids); + *num_parted = list_length(pdlist); + pd = (PartitionDispatchData **) palloc(*num_parted * + sizeof(PartitionDispatchData *)); + i = 0; + foreach (lc, pdlist) + { + pd[i++] = lfirst(lc); } + return pd; +} + +/* + * get_partition_dispatch_recurse + * Recursively expand partition tree rooted at rel + * + * As the partition tree is expanded in a depth-first manner, we mantain two + * global lists: of PartitionDispatch objects corresponding to partitioned + * tables in *pds and of the leaf partition OIDs in *leaf_part_oids. + */ +static void +get_partition_dispatch_recurse(Relation rel, Relation parent, + List **pds, List **leaf_part_oids) +{ + PartitionDesc partdesc = RelationGetPartitionDesc(rel); + PartitionKey partkey = RelationGetPartitionKey(rel); + PartitionDispatch pd; + int i; + + /* Build a PartitionDispatch for this table and add it to *pds. */ + pd = (PartitionDispatch) palloc(sizeof(PartitionDispatchData)); + *pds = lappend(*pds, pd); + pd->reldesc = rel; + pd->key = partkey; + pd->partdesc = partdesc; + if (parent != NULL) + pd->parentoid = RelationGetRelid(parent); + else + pd->parentoid = InvalidOid; + /* - * We want to create two arrays - one for leaf partitions and another for - * partitioned tables (including the root table and internal partitions). - * While we only create the latter here, leaf partition array of suitable - * objects (such as, ResultRelInfo) is created by the caller using the - * list of OIDs we return. Indexes into these arrays get assigned in a - * breadth-first manner, whereby partitions of any given level are placed - * consecutively in the respective arrays. + * Go look at each partition of this table. If it's a leaf partition, + * simply add its OID to *leaf_part_oids. If it's a partitioned table, + * recursively call get_partition_dispatch_recurse(), so that its + * partitions are processed as well and a corresponding PartitionDispatch + * object gets added to *pds. + * + * About the values in pd->indexes: for a leaf partition, it contains the + * leaf partition's position in the global list *leaf_part_oids minus 1, + * whereas for a partitioned table partition, it contains the partition's + * position in the global list *pds multiplied by -1. The latter is + * multiplied by -1 to distinguish partitioned tables from leaf partitions + * when going through the values in pd->indexes. So, for example, when + * using it during tuple-routing, encountering a value >= 0 means we found + * a leaf partition. It is immediately returned as the index in the array + * of ResultRelInfos of all the leaf partitions, using which we insert the + * tuple into that leaf partition. A negative value means we found a + * partitioned table. The value multiplied back by -1 is returned as the + * index in the array of PartitionDispatch objects of all partitioned + * tables in the tree, using which, search is continued further down the + * partition tree. */ - pd = (PartitionDispatchData **) palloc(*num_parted * - sizeof(PartitionDispatchData *)); - *leaf_part_oids = NIL; - i = k = offset = 0; - forboth(lc1, parted_rels, lc2, parted_rel_parents) + pd->indexes = (int *) palloc(partdesc->nparts * sizeof(int)); + for (i = 0; i < partdesc->nparts; i++) { - Relation partrel = lfirst(lc1); - Relation parent = lfirst(lc2); - PartitionKey partkey = RelationGetPartitionKey(partrel); - PartitionDesc partdesc = RelationGetPartitionDesc(partrel); - int j, - m; - - pd[i] = (PartitionDispatch) palloc(sizeof(PartitionDispatchData)); - pd[i]->reldesc = partrel; - pd[i]->key = partkey; - pd[i]->partdesc = partdesc; - if (parent != NULL) - pd[i]->parentoid = RelationGetRelid(parent); - else - pd[i]->parentoid = InvalidOid; - - pd[i]->indexes = (int *) palloc(partdesc->nparts * sizeof(int)); + Oid partrelid = partdesc->oids[i]; - /* - * Indexes corresponding to the internal partitions are multiplied by - * -1 to distinguish them from those of leaf partitions. Encountering - * an index >= 0 means we found a leaf partition, which is immediately - * returned as the partition we are looking for. A negative index - * means we found a partitioned table, whose PartitionDispatch object - * is located at the above index multiplied back by -1. Using the - * PartitionDispatch object, search is continued further down the - * partition tree. - */ - m = 0; - for (j = 0; j < partdesc->nparts; j++) + if (get_rel_relkind(partrelid) != RELKIND_PARTITIONED_TABLE) { - Oid partrelid = partdesc->oids[j]; - - if (get_rel_relkind(partrelid) != RELKIND_PARTITIONED_TABLE) - { - *leaf_part_oids = lappend_oid(*leaf_part_oids, partrelid); - pd[i]->indexes[j] = k++; - } - else - { - /* - * offset denotes the number of partitioned tables of upper - * levels including those of the current level. Any partition - * of this table must belong to the next level and hence will - * be placed after the last partitioned table of this level. - */ - pd[i]->indexes[j] = -(1 + offset + m); - m++; - } + *leaf_part_oids = lappend_oid(*leaf_part_oids, partrelid); + pd->indexes[i] = list_length(*leaf_part_oids) - 1; } - i++; + else + { + /* + * We assume all tables in the partition tree were already + * locked by the caller. + */ + Relation partrel = heap_open(partrelid, NoLock); - /* - * This counts the number of partitioned tables at upper levels - * including those of the current level. - */ - offset += m; + pd->indexes[i] = -list_length(*pds); + get_partition_dispatch_recurse(partrel, rel, pds, leaf_part_oids); + } } - - return pd; } /* Module-local functions */ -- 2.11.0
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers