Fujita-san, Pavan, Thank you both for reviewing. Replying to both emails here.
On 2018/03/20 20:53, Etsuro Fujita wrote: > Here are comments on executor changes in (the latest version of) the patch: > > @@ -421,8 +424,18 @@ ExecInsert(ModifyTableState *mtstate, > ItemPointerData conflictTid; > bool specConflict; > List *arbiterIndexes; > + PartitionTupleRouting *proute = > + mtstate->mt_partition_tuple_routing; > > - arbiterIndexes = node->arbiterIndexes; > + /* Use the appropriate list of arbiter indexes. */ > + if (mtstate->mt_partition_tuple_routing != NULL) > + { > + Assert(partition_index >= 0 && proute != NULL); > + arbiterIndexes = > + proute->partition_arbiter_indexes[partition_index]; > + } > + else > + arbiterIndexes = node->arbiterIndexes; > > To handle both cases the same way, I wonder if it would be better to have > the arbiterindexes list in ResultRelInfo as well, as mentioned by Alvaro > upthread, or to re-add mt_arbiterindexes as before and set it to > proute->partition_arbiter_indexes[partition_index] before we get here, > maybe in ExecPrepareTupleRouting, in the case of tuple routing. It's a good idea. I somehow missed that Alvaro had already mentioned it. In HEAD, we now have ri_onConflictSetProj and ri_onConflictSetWhere. I propose we name the field ri_onConflictArbiterIndexes as done in the updated patch. > ExecOnConflictUpdate(ModifyTableState *mtstate, > ResultRelInfo *resultRelInfo, > + TupleDesc onConflictSetTupdesc, > ItemPointer conflictTid, > TupleTableSlot *planSlot, > TupleTableSlot *excludedSlot, > @@ -1419,6 +1459,7 @@ ExecOnConflictUpdate(ModifyTableState *mtstate, > ExecCheckHeapTupleVisible(estate, &tuple, buffer); > > /* Store target's existing tuple in the state's dedicated slot */ > + ExecSetSlotDescriptor(mtstate->mt_existing, RelationGetDescr(relation)); > ExecStoreTuple(&tuple, mtstate->mt_existing, buffer, false); > > /* > @@ -1462,6 +1503,7 @@ ExecOnConflictUpdate(ModifyTableState *mtstate, > } > > /* Project the new tuple version */ > + ExecSetSlotDescriptor(mtstate->mt_conflproj, onConflictSetTupdesc); > ExecProject(resultRelInfo->ri_onConflictSetProj); > > Can we do ExecSetSlotDescriptor for mtstate->mt_existing and > mtstate->mt_conflproj in ExecPrepareTupleRouting in the case of tuple > routing? That would make the API changes to ExecOnConflictUpdate > unnecessary. That's a good idea too, so done. > > @@ -2368,9 +2419,13 @@ ExecInitModifyTable(ModifyTable *node, EState > *estate, int eflags) > econtext = mtstate->ps.ps_ExprContext; > relationDesc = resultRelInfo->ri_RelationDesc->rd_att; > > - /* initialize slot for the existing tuple */ > - mtstate->mt_existing = > - ExecInitExtraTupleSlot(mtstate->ps.state, relationDesc); > + /* > + * Initialize slot for the existing tuple. We determine which > + * tupleDesc to use for this after we have determined which relation > + * the insert/update will be applied to, possibly after performing > + * tuple routing. > + */ > + mtstate->mt_existing = ExecInitExtraTupleSlot(mtstate->ps.state, > NULL); > > /* carried forward solely for the benefit of explain */ > mtstate->mt_excludedtlist = node->exclRelTlist; > @@ -2378,8 +2433,16 @@ ExecInitModifyTable(ModifyTable *node, EState > *estate, int eflags) > /* create target slot for UPDATE SET projection */ > tupDesc = ExecTypeFromTL((List *) node->onConflictSet, > relationDesc->tdhasoid); > + PinTupleDesc(tupDesc); > + mtstate->mt_conflproj_tupdesc = tupDesc; > + > + /* > + * Just like the "existing tuple" slot, we'll defer deciding which > + * tupleDesc to use for this slot to a point where tuple routing has > + * been performed. > + */ > mtstate->mt_conflproj = > - ExecInitExtraTupleSlot(mtstate->ps.state, tupDesc); > + ExecInitExtraTupleSlot(mtstate->ps.state, NULL); > > If we do ExecInitExtraTupleSlot for mtstate->mt_existing and > mtstate->mt_conflproj in ExecPrepareTupleRouting in the case of tuple > routing, as said above, we wouldn't need this changes. I think doing that > only in the case of tuple routing and keeping this as-is would be better > because that would save cycles in the normal case. Hmm, I think we shouldn't be doing ExecInitExtraTupleSlot in ExecPrepareTupleRouting, because we shouldn't have more than one instance of mtstate->mt_existing and mtstate->mt_conflproj slots. As you also said above, I think you meant to say here that we do ExecInitExtraTupleSlot only once for both mtstate->mt_existing and mtstate->mt_conflproj in ExecInitModifyTable and only do ExecSetSlotDescriptor in ExecPrepareTupleRouting. I have changed it so that ExecInitModifyTable now both creates the slot and sets the descriptor for non-tuple-routing cases and only creates but doesn't set the descriptor in the tuple-routing case. For ExecPrepareTupleRouting to be able to access the tupDesc of the onConflictSet target list, I've added ri_onConflictSetProjTupDesc which is set by ExecInitPartitionInfo on first call for a give partition. This is also suggested by Pavan in his review. Considering all of that, both mt_conflproj_tupdesc and partition_conflproj_tdescs (the array in PartitionTupleRouting) no longer exist in the patch. And since arbiterIndexes has been moved into ResultRelInfo too, partition_arbiter_indexes (the array in PartitionTupleRouting) is gone too. On 2018/03/22 13:34, Pavan Deolasee wrote: > Thanks for writing a new version. A few comments: > > > <listitem> > <para> > - Using the <literal>ON CONFLICT</literal> clause with partitioned > tables > - will cause an error if the conflict target is specified (see > - <xref linkend="sql-on-conflict" /> for more details on how the > clause > - works). Therefore, it is not possible to specify > - <literal>DO UPDATE</literal> as the alternative action, because > - specifying the conflict target is mandatory in that case. On the > other > - hand, specifying <literal>DO NOTHING</literal> as the alternative > action > - works fine provided the conflict target is not specified. In that > case, > - unique constraints (or exclusion constraints) of the individual leaf > - partitions are considered. > - </para> > - </listitem> > > We should document it somewhere that partition key update is not supported > by > ON CONFLICT DO UPDATE Agreed. I have added a line on INSERT reference page to mention this limitation. > /* > * get_partition_parent > + * Obtain direct parent or topmost ancestor of given relation > * > - * Returns inheritance parent of a partition by scanning pg_inherits > + * Returns direct inheritance parent of a partition by scanning > pg_inherits; > + * or, if 'getroot' is true, the topmost parent in the inheritance > hierarchy. > * > * Note: Because this function assumes that the relation whose OID is > passed > * as an argument will have precisely one parent, it should only be called > * when it is known that the relation is a partition. > */ > > Given that most callers only look for immediate parent, I wonder if it makes > sense to have a new function, get_partition_root(), instead of changing > signature of the current function. That will reduce foot-print of this patch > quite a lot. It seems like a good idea, so done that way. > @@ -36,6 +38,7 @@ static char > *ExecBuildSlotPartitionKeyDescription(Relation rel, > Datum *values, > bool *isnull, > int maxfieldlen); > +static List *adjust_onconflictset_tlist(List *tlist, TupleConversionMap > *map); > > We should name this function in a more generic way, given that it's going > to be > used for other things too. What about adjust_partition_tlist? I think that makes sense. We were trying to use adjust_inherited_tlist in the earlier versions of this patch, so adjust_partition_tlist sounds like a good name for this piece of code. > + > + /* > + * If partition's tuple descriptor differs from the root parent, > + * we need to adjust the onConflictSet target list to account for > + * differences in attribute numbers. > + */ > + if (map != NULL) > + { > + /* > + * First convert Vars to contain partition's atttribute > + * numbers. > + */ > + > + /* Convert Vars referencing EXCLUDED pseudo-relation. */ > + onconflset = map_partition_varattnos(node->onConflictSet, > + INNER_VAR, > + partrel, > + firstResultRel, NULL); > > Are we not modifying node->onConflictSet in place? Or does > map_partition_varattnos() create a fresh copy before scribbling on the > input? > If it's former then I guess that's a problem. If it's latter then we ought > to > have comments explaining that. A copy is made before scribbling. Clarified that in the nearby comment. > + tupDesc = ExecTypeFromTL(onconflset, partrelDesc->tdhasoid); > + PinTupleDesc(tupDesc); > > Why do we need to pin the descriptor? If we do need, why don't we need > corresponding ReleaseTupDesc() call? PinTupleDesc was added in the patch as Alvaro had submitted it upthread, which it wasn't clear to me either why it was needed. Looking at it closely, it seems we can get rid of it if for the lack of corresponding ReleaseTupleDesc(). ExecSetSlotDescriptor() seems to take care of pinning and releasing tuple descriptors that are passed to it. If some partition's tupDesc remains pinned because it was the last one that was passed to it, the final ExecResetTupleTable will take care of releasing it. I have removed the instances of PinTupleDesc in the updated patch, but I'm not sure why the loose PinTupleDesc() in the previous version of the patch didn't cause reference leak warnings or some such. > I am still confused if the partition_conflproj_tdescs really belongs to > PartitionTupleRouting or should it be a part of the ResultRelInfo. FWIW for > the > MERGE patch, I moved everything to a new struct and made it part of the > ResultRelInfo. If no re-mapping is necessary, we can just point to the > struct > in the root relation's ResultRelInfo. Otherwise create and populate a new > one > in the partition relation's ResultRelInfo. > > + leaf_part_rri->ri_onConflictSetWhere = > + ExecInitQual(onconflwhere, &mtstate->ps); > + } > > So ri_onConflictSetWhere and ri_onConflictSetProj are part of the > ResultRelInfo. Why not move mt_conflproj_tupdesc, > partition_conflproj_tdescs, > partition_arbiter_indexes etc to the ResultRelInfo as well? I have moved both the projection tupdesc and the arbiter indexes list into ResultRelInfo as I wrote above. > + > +/* > + * Adjust the targetlist entries of an inherited ON CONFLICT DO UPDATE > + * operation for a given partition > + * > > As I said above, we should disassociate this function from ON CONFLICT DO > UPDATE and rather have it as a general purpose facility. OK, have fixed the comment and the name as mentioned above. > + * The expressions have already been fixed, but we have to make sure that > the > + * target resnos match the partition. In some cases, this can force us to > + * re-order the tlist to preserve resno ordering. > + * > > Can we have some explanation regarding how the targetlist is reordered? I > know > the function does that by updating the resno in place, but some explanation > would help. Also, should we add an assertion-build check to confirm that the > resultant list is actually ordered? OK, added a comment and also the assertion-build check on the order of entries. > > @@ -66,7 +67,8 @@ static TupleTableSlot > *ExecPrepareTupleRouting(ModifyTableState *mtstate, > EState *estate, > PartitionTupleRouting *proute, > ResultRelInfo *targetRelInfo, > - TupleTableSlot *slot); > + TupleTableSlot *slot, > + int *partition_index); > static ResultRelInfo *getTargetResultRelInfo(ModifyTableState *node); > static void ExecSetupChildParentMapForTcs(ModifyTableState *mtstate); > static void ExecSetupChildParentMapForSubplan(ModifyTableState *mtstate); > @@ -264,6 +266,7 @@ ExecInsert(ModifyTableState *mtstate, > TupleTableSlot *slot, > TupleTableSlot *planSlot, > EState *estate, > + int partition_index, > bool canSetTag) > { > HeapTuple tuple; > > If we move the list of arbiter indexes and the tuple desc to ResultRelInfo, > as > suggested above, I think we can avoid making any API changes to > ExecPrepareTupleRouting and ExecInsert. Those API changes are no longer part of the patch. Attached please find an updated version. Thanks, Amit
From edf4d0d6081a00f9eef5d6e8fae1db56ecc0fdeb Mon Sep 17 00:00:00 2001 From: amit <amitlangot...@gmail.com> Date: Tue, 20 Mar 2018 10:09:38 +0900 Subject: [PATCH v8] Fix ON CONFLICT to work with partitioned tables Author: Amit Langote, Alvaro Herrera, Etsuro Fujita --- doc/src/sgml/ddl.sgml | 15 -- doc/src/sgml/ref/insert.sgml | 8 + src/backend/catalog/partition.c | 83 +++++++-- src/backend/executor/execMain.c | 5 + src/backend/executor/execPartition.c | 253 ++++++++++++++++++++++++-- src/backend/executor/nodeModifyTable.c | 65 ++++++- src/backend/parser/analyze.c | 7 - src/include/catalog/partition.h | 1 + src/include/nodes/execnodes.h | 6 + src/test/regress/expected/insert_conflict.out | 73 ++++++-- src/test/regress/expected/triggers.out | 33 ++++ src/test/regress/sql/insert_conflict.sql | 64 ++++++- src/test/regress/sql/triggers.sql | 33 ++++ 13 files changed, 569 insertions(+), 77 deletions(-) diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml index 3a54ba9d5a..8805b88d82 100644 --- a/doc/src/sgml/ddl.sgml +++ b/doc/src/sgml/ddl.sgml @@ -3324,21 +3324,6 @@ ALTER TABLE measurement ATTACH PARTITION measurement_y2008m02 <listitem> <para> - Using the <literal>ON CONFLICT</literal> clause with partitioned tables - will cause an error if the conflict target is specified (see - <xref linkend="sql-on-conflict" /> for more details on how the clause - works). Therefore, it is not possible to specify - <literal>DO UPDATE</literal> as the alternative action, because - specifying the conflict target is mandatory in that case. On the other - hand, specifying <literal>DO NOTHING</literal> as the alternative action - works fine provided the conflict target is not specified. In that case, - unique constraints (or exclusion constraints) of the individual leaf - partitions are considered. - </para> - </listitem> - - <listitem> - <para> When an <command>UPDATE</command> causes a row to move from one partition to another, there is a chance that another concurrent <command>UPDATE</command> or <command>DELETE</command> misses this row. diff --git a/doc/src/sgml/ref/insert.sgml b/doc/src/sgml/ref/insert.sgml index 134092fa9c..62e142fd8e 100644 --- a/doc/src/sgml/ref/insert.sgml +++ b/doc/src/sgml/ref/insert.sgml @@ -518,6 +518,14 @@ INSERT INTO <replaceable class="parameter">table_name</replaceable> [ AS <replac not duplicate each other in terms of attributes constrained by an arbiter index or constraint. </para> + + <para> + Note that it is currently not supported for the + <literal>ON CONFLICT DO UPDATE</literal> clause of an + <command>INSERT</command> applied to a partitioned table to update the + partition key of a conflicting row such that it requires the row be moved + to a new partition. + </para> <tip> <para> It is often preferable to use unique index inference rather than diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c index 53855f5088..bfe559490e 100644 --- a/src/backend/catalog/partition.c +++ b/src/backend/catalog/partition.c @@ -192,6 +192,7 @@ static int get_partition_bound_num_indexes(PartitionBoundInfo b); static int get_greatest_modulus(PartitionBoundInfo b); static uint64 compute_hash_value(int partnatts, FmgrInfo *partsupfunc, Datum *values, bool *isnull); +static Oid get_partition_parent_recurse(Relation inhRel, Oid relid, bool getroot); /* * RelationBuildPartitionDesc @@ -1377,6 +1378,7 @@ check_default_allows_bound(Relation parent, Relation default_rel, /* * get_partition_parent + * Obtain direct parent of given relation * * Returns inheritance parent of a partition by scanning pg_inherits * @@ -1387,14 +1389,59 @@ check_default_allows_bound(Relation parent, Relation default_rel, Oid get_partition_parent(Oid relid) { - Form_pg_inherits form; - Relation catalogRelation; + Relation inhRel; + Oid parentOid; + + inhRel = heap_open(InheritsRelationId, AccessShareLock); + + parentOid = get_partition_parent_recurse(inhRel, relid, false); + if (parentOid == InvalidOid) + elog(ERROR, "could not find parent of relation %u", relid); + + heap_close(inhRel, AccessShareLock); + + return parentOid; +} + +/* + * get_partition_parent + * Obtain topmost ancestor of given relation + * + * Returns the topmost parent inheritance parent of a partition by scanning + * pg_inherits + * + * Note: Because this function assumes that the relation whose OID is passed + * as an argument will have precisely one parent, it should only be called + * when it is known that the relation is a partition. + */ +Oid +get_partition_root_parent(Oid relid) +{ + Relation inhRel; + Oid parentOid; + + inhRel = heap_open(InheritsRelationId, AccessShareLock); + + parentOid = get_partition_parent_recurse(inhRel, relid, true); + if (parentOid == InvalidOid) + elog(ERROR, "could not find root parent of relation %u", relid); + + heap_close(inhRel, AccessShareLock); + + return parentOid; +} + +/* + * get_partition_parent_recurse + * Recursive part of get_partition_parent + */ +static Oid +get_partition_parent_recurse(Relation inhRel, Oid relid, bool getroot) +{ SysScanDesc scan; ScanKeyData key[2]; HeapTuple tuple; - Oid result; - - catalogRelation = heap_open(InheritsRelationId, AccessShareLock); + Oid result = InvalidOid; ScanKeyInit(&key[0], Anum_pg_inherits_inhrelid, @@ -1405,18 +1452,26 @@ get_partition_parent(Oid relid) BTEqualStrategyNumber, F_INT4EQ, Int32GetDatum(1)); - scan = systable_beginscan(catalogRelation, InheritsRelidSeqnoIndexId, true, + /* Obtain the direct parent, and release resources before recursing */ + scan = systable_beginscan(inhRel, InheritsRelidSeqnoIndexId, true, NULL, 2, key); - tuple = systable_getnext(scan); - if (!HeapTupleIsValid(tuple)) - elog(ERROR, "could not find tuple for parent of relation %u", relid); - - form = (Form_pg_inherits) GETSTRUCT(tuple); - result = form->inhparent; - + if (HeapTupleIsValid(tuple)) + result = ((Form_pg_inherits) GETSTRUCT(tuple))->inhparent; systable_endscan(scan); - heap_close(catalogRelation, AccessShareLock); + + /* + * If we were asked to recurse, do so now. Except that if we didn't get a + * valid parent, then the 'relid' argument was already the topmost parent, + * so return that. + */ + if (getroot) + { + if (OidIsValid(result)) + return get_partition_parent_recurse(inhRel, result, getroot); + else + return relid; + } return result; } diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c index 91ba939bdc..5439c44770 100644 --- a/src/backend/executor/execMain.c +++ b/src/backend/executor/execMain.c @@ -1341,11 +1341,16 @@ InitResultRelInfo(ResultRelInfo *resultRelInfo, resultRelInfo->ri_FdwRoutine = GetFdwRoutineForRelation(resultRelationDesc, true); else resultRelInfo->ri_FdwRoutine = NULL; + + /* The following fields are set later if needed */ resultRelInfo->ri_FdwState = NULL; resultRelInfo->ri_usesFdwDirectModify = false; resultRelInfo->ri_ConstraintExprs = NULL; resultRelInfo->ri_junkFilter = NULL; resultRelInfo->ri_projectReturning = NULL; + resultRelInfo->ri_onConflictArbiterIndexes = NIL; + resultRelInfo->ri_onConflictSetProj = NULL; + resultRelInfo->ri_onConflictSetWhere = NULL; /* * Partition constraint, which also includes the partition constraint of diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c index ce9a4e16cf..69efb13c4f 100644 --- a/src/backend/executor/execPartition.c +++ b/src/backend/executor/execPartition.c @@ -15,10 +15,12 @@ #include "postgres.h" #include "catalog/pg_inherits_fn.h" +#include "catalog/pg_type.h" #include "executor/execPartition.h" #include "executor/executor.h" #include "mb/pg_wchar.h" #include "miscadmin.h" +#include "nodes/makefuncs.h" #include "utils/lsyscache.h" #include "utils/rls.h" #include "utils/ruleutils.h" @@ -36,6 +38,7 @@ static char *ExecBuildSlotPartitionKeyDescription(Relation rel, Datum *values, bool *isnull, int maxfieldlen); +static List *adjust_partition_tlist(List *tlist, TupleConversionMap *map); /* * ExecSetupPartitionTupleRouting - sets up information needed during @@ -64,6 +67,8 @@ ExecSetupPartitionTupleRouting(ModifyTableState *mtstate, Relation rel) int num_update_rri = 0, update_rri_index = 0; PartitionTupleRouting *proute; + int nparts; + ModifyTable *node = mtstate ? (ModifyTable *) mtstate->ps.plan : NULL; /* * Get the information about the partition tree after locking all the @@ -74,20 +79,16 @@ ExecSetupPartitionTupleRouting(ModifyTableState *mtstate, Relation rel) proute->partition_dispatch_info = RelationGetPartitionDispatchInfo(rel, &proute->num_dispatch, &leaf_parts); - proute->num_partitions = list_length(leaf_parts); - proute->partitions = (ResultRelInfo **) palloc(proute->num_partitions * - sizeof(ResultRelInfo *)); + proute->num_partitions = nparts = list_length(leaf_parts); + proute->partitions = + (ResultRelInfo **) palloc(nparts * sizeof(ResultRelInfo *)); proute->parent_child_tupconv_maps = - (TupleConversionMap **) palloc0(proute->num_partitions * - sizeof(TupleConversionMap *)); - proute->partition_oids = (Oid *) palloc(proute->num_partitions * - sizeof(Oid)); + (TupleConversionMap **) palloc0(nparts * sizeof(TupleConversionMap *)); + proute->partition_oids = (Oid *) palloc(nparts * sizeof(Oid)); /* Set up details specific to the type of tuple routing we are doing. */ - if (mtstate && mtstate->operation == CMD_UPDATE) + if (node && node->operation == CMD_UPDATE) { - ModifyTable *node = (ModifyTable *) mtstate->ps.plan; - update_rri = mtstate->resultRelInfo; num_update_rri = list_length(node->plans); proute->subplan_partition_offsets = @@ -475,9 +476,6 @@ ExecInitPartitionInfo(ModifyTableState *mtstate, &mtstate->ps, RelationGetDescr(partrel)); } - Assert(proute->partitions[partidx] == NULL); - proute->partitions[partidx] = leaf_part_rri; - /* * Save a tuple conversion map to convert a tuple routed to this partition * from the parent's type to the partition's. @@ -487,6 +485,144 @@ ExecInitPartitionInfo(ModifyTableState *mtstate, RelationGetDescr(partrel), gettext_noop("could not convert row type")); + /* + * Initialize information about this partition that's needed to handle + * the ON CONFLICT clause. + */ + if (node && node->onConflictAction != ONCONFLICT_NONE) + { + TupleConversionMap *map = proute->parent_child_tupconv_maps[partidx]; + int firstVarno = mtstate->resultRelInfo[0].ri_RangeTableIndex; + Relation firstResultRel = mtstate->resultRelInfo[0].ri_RelationDesc; + TupleDesc partrelDesc = RelationGetDescr(partrel); + ExprContext *econtext = mtstate->ps.ps_ExprContext; + ListCell *lc; + List *arbiterIndexes = NIL; + + /* Generate a list of arbiter indexes for the partition. */ + foreach(lc, resultRelInfo->ri_onConflictArbiterIndexes) + { + Oid parentArbiterIndexOid = lfirst_oid(lc); + int i; + + /* + * Find parentArbiterIndexOid's child in this partition and add it + * to my_arbiterindexes. + */ + for (i = 0; i < leaf_part_rri->ri_NumIndices; i++) + { + Relation index = leaf_part_rri->ri_IndexRelationDescs[i]; + Oid indexOid = RelationGetRelid(index); + + if (parentArbiterIndexOid == + get_partition_root_parent(indexOid)) + arbiterIndexes = lappend_oid(arbiterIndexes, indexOid); + } + } + leaf_part_rri->ri_onConflictArbiterIndexes = arbiterIndexes; + + if (node->onConflictAction == ONCONFLICT_UPDATE) + { + List *onconflset; + TupleDesc tupDesc; + + Assert(node->onConflictSet != NIL); + + /* + * If partition's tuple descriptor differs from the root parent, + * we need to adjust the onConflictSet target list to account for + * differences in attribute numbers. + */ + if (map != NULL) + { + /* + * First convert Vars to contain partition's atttribute + * numbers. + */ + + /* + * Convert Vars referencing EXCLUDED pseudo-relation. + * + * Note that node->onConflictSet itself remains unmodified, + * because a copy is made before changing any nodes. + */ + onconflset = map_partition_varattnos(node->onConflictSet, + INNER_VAR, + partrel, + firstResultRel, NULL); + /* Convert Vars referencing main target relation. */ + onconflset = map_partition_varattnos(onconflset, + firstVarno, + partrel, + firstResultRel, NULL); + + /* + * The original list wouldn't contain entries for the + * partition's dropped attributes, which we must be accounted + * for because targetlist must have all the attributes of the + * underlying table including the dropped ones. Fix that and + * reorder target list entries if their resnos change as a + * result of the adjustment. + */ + onconflset = adjust_partition_tlist(onconflset, map); + } + else + /* Just reuse the original one. */ + onconflset = node->onConflictSet; + + /* + * We must set mtstate->mt_conflproj's tuple descriptor to this + * before trying to use it for projection. + */ + tupDesc = ExecTypeFromTL(onconflset, partrelDesc->tdhasoid); + leaf_part_rri->ri_onConflictSetProj = + ExecBuildProjectionInfo(onconflset, econtext, + mtstate->mt_conflproj, + &mtstate->ps, partrelDesc); + leaf_part_rri->ri_onConflictSetProjTupDesc = tupDesc; + + if (node->onConflictWhere) + { + if (map != NULL) + { + /* + * Convert the Vars to contain partition's atttribute + * numbers + */ + List *onconflwhere; + + /* + * Convert Vars referencing EXCLUDED pseudo-relation. + * + * Note that node->onConflictWhere itself remains + * unmodified, because a copy is made before changing any + * nodes. + */ + onconflwhere = map_partition_varattnos((List *) + node->onConflictWhere, + INNER_VAR, + partrel, + firstResultRel, NULL); + /* Convert Vars referencing main target relation. */ + onconflwhere = map_partition_varattnos((List *) + onconflwhere, + firstVarno, + partrel, + firstResultRel, NULL); + leaf_part_rri->ri_onConflictSetWhere = + ExecInitQual(onconflwhere, &mtstate->ps); + } + else + /* Just reuse the original one. */ + leaf_part_rri->ri_onConflictSetWhere = + resultRelInfo->ri_onConflictSetWhere; + } + } + } + + Assert(proute->partitions[partidx] == NULL); + proute->partitions[partidx] = leaf_part_rri; + MemoryContextSwitchTo(oldContext); return leaf_part_rri; @@ -946,3 +1082,94 @@ ExecBuildSlotPartitionKeyDescription(Relation rel, return buf.data; } + +/* + * adjust_partition_tlist + * Adjust the targetlist entries for a given partition to account for + * attribute differences between parent and the partition + * + * The expressions have already been fixed, but we have to make sure that the + * target resnos match the partition's attribute numbers. This results in + * generating a copy of the original target list in which the entries appear + * in sorted order of resno, including both the existing entries (that may + * have their resno changed in-place) and the newly added entries. + * + * Scribbles on the input tlist, so callers must make sure to make a copy + * before passing it to us. + */ +static List * +adjust_partition_tlist(List *tlist, TupleConversionMap *map) +{ + List *new_tlist = NIL; + TupleDesc tupdesc = map->outdesc; + AttrNumber *attrMap = map->attrMap; + int numattrs = tupdesc->natts; + int attrno; + + for (attrno = 1; attrno <= numattrs; attrno++) + { + Form_pg_attribute att_tup = TupleDescAttr(tupdesc, attrno - 1); + TargetEntry *tle; + + if (attrMap[attrno - 1] != 0) + { + /* + * Use the existing entry from the parent, but make sure to + * update the resno to match the partition's attno. + */ + Assert(!att_tup->attisdropped); + + /* Get the corresponding tlist entry from the given tlist */ + tle = (TargetEntry *) list_nth(tlist, attrMap[attrno - 1] - 1); + + /* Get the resno right */ + if (tle->resno != attrno) + tle->resno = attrno; + } + else + { + /* + * This corresponds to a dropped attribute in the partition, for + * which we enerate a dummy entry with resno matching the + * partition's attno. + */ + Node *expr; + + Assert(att_tup->attisdropped); + + /* Insert NULL for dropped column */ + expr = (Node *) makeConst(INT4OID, + -1, + InvalidOid, + sizeof(int32), + (Datum) 0, + true, /* isnull */ + true /* byval */ ); + + tle = makeTargetEntry((Expr *) expr, + attrno, + pstrdup(NameStr(att_tup->attname)), + false); + } + + new_tlist = lappend(new_tlist, tle); + } + + /* Sanity check on the order of entries in the new tlist. */ +#ifdef USE_ASSERT_CHECKING + { + TargetEntry *prev = NULL; + ListCell *lc; + + foreach(lc, new_tlist) + { + TargetEntry *cur = lfirst(lc); + + Assert(prev == NULL || cur->resno > prev->resno); + prev = cur; + } + } +#endif + + return new_tlist; +} diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c index 4fa2d7265f..d205cebe0f 100644 --- a/src/backend/executor/nodeModifyTable.c +++ b/src/backend/executor/nodeModifyTable.c @@ -422,7 +422,7 @@ ExecInsert(ModifyTableState *mtstate, bool specConflict; List *arbiterIndexes; - arbiterIndexes = node->arbiterIndexes; + arbiterIndexes = resultRelInfo->ri_onConflictArbiterIndexes; /* * Do a non-conclusive check for conflicts first. @@ -1056,6 +1056,18 @@ lreplace:; TupleConversionMap *tupconv_map; /* + * Disallow an INSERT ON CONFLICT DO UPDATE that causes the + * original row to migrate to a different partition. Maybe this + * can be implemented some day, but it seems a fringe feature with + * little redeeming value. + */ + if (((ModifyTable *) mtstate->ps.plan)->onConflictAction == ONCONFLICT_UPDATE) + ereport(ERROR, + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), + errmsg("invalid ON UPDATE specification"), + errdetail("The result tuple would appear in a different partition than the original tuple."))); + + /* * When an UPDATE is run on a leaf partition, we will not have * partition tuple routing set up. In that case, fail with * partition constraint violation error. @@ -1639,6 +1651,7 @@ ExecPrepareTupleRouting(ModifyTableState *mtstate, ResultRelInfo *targetRelInfo, TupleTableSlot *slot) { + ModifyTable *node; int partidx; ResultRelInfo *partrel; HeapTuple tuple; @@ -1720,6 +1733,19 @@ ExecPrepareTupleRouting(ModifyTableState *mtstate, proute->partition_tuple_slot, &slot); + /* Initialize information needed to handle ON CONFLICT DO UPDATE. */ + Assert(mtstate != NULL); + node = (ModifyTable *) mtstate->ps.plan; + if (node->onConflictAction == ONCONFLICT_UPDATE) + { + Assert(mtstate->mt_existing != NULL); + ExecSetSlotDescriptor(mtstate->mt_existing, + RelationGetDescr(partrel->ri_RelationDesc)); + Assert(mtstate->mt_conflproj != NULL); + ExecSetSlotDescriptor(mtstate->mt_conflproj, + partrel->ri_onConflictSetProjTupDesc); + } + return slot; } @@ -2347,11 +2373,15 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags) mtstate->ps.ps_ExprContext = NULL; } + /* Set the list of arbiter indexes if needed for ON CONFLICT */ + resultRelInfo = mtstate->resultRelInfo; + if (node->onConflictAction != ONCONFLICT_NONE) + resultRelInfo->ri_onConflictArbiterIndexes = node->arbiterIndexes; + /* * If needed, Initialize target list, projection and qual for ON CONFLICT * DO UPDATE. */ - resultRelInfo = mtstate->resultRelInfo; if (node->onConflictAction == ONCONFLICT_UPDATE) { ExprContext *econtext; @@ -2368,9 +2398,18 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags) econtext = mtstate->ps.ps_ExprContext; relationDesc = resultRelInfo->ri_RelationDesc->rd_att; - /* initialize slot for the existing tuple */ - mtstate->mt_existing = - ExecInitExtraTupleSlot(mtstate->ps.state, relationDesc); + /* + * Initialize slot for the existing tuple. If we'll be performing + * tuple routing, the tuple descriptor to use for this will be + * determined based on which relation the update is actually applied + * to, so we don't set its tuple descriptor here. + */ + if (mtstate->mt_partition_tuple_routing == NULL) + mtstate->mt_existing = + ExecInitExtraTupleSlot(mtstate->ps.state, relationDesc); + else + mtstate->mt_existing = + ExecInitExtraTupleSlot(mtstate->ps.state, NULL); /* carried forward solely for the benefit of explain */ mtstate->mt_excludedtlist = node->exclRelTlist; @@ -2378,8 +2417,20 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags) /* create target slot for UPDATE SET projection */ tupDesc = ExecTypeFromTL((List *) node->onConflictSet, relationDesc->tdhasoid); - mtstate->mt_conflproj = - ExecInitExtraTupleSlot(mtstate->ps.state, tupDesc); + + /* + * Just like the "existing tuple" slot, we leave this slot's + * tuple descriptor set to NULL in the tuple routing case. + */ + if (mtstate->mt_partition_tuple_routing == NULL) + { + mtstate->mt_conflproj = + ExecInitExtraTupleSlot(mtstate->ps.state, tupDesc); + resultRelInfo->ri_onConflictSetProjTupDesc = tupDesc; + } + else + mtstate->mt_conflproj = + ExecInitExtraTupleSlot(mtstate->ps.state, NULL); /* build UPDATE SET projection state */ resultRelInfo->ri_onConflictSetProj = diff --git a/src/backend/parser/analyze.c b/src/backend/parser/analyze.c index cf1a34e41a..a4b5aaef44 100644 --- a/src/backend/parser/analyze.c +++ b/src/backend/parser/analyze.c @@ -1026,13 +1026,6 @@ transformOnConflictClause(ParseState *pstate, TargetEntry *te; int attno; - if (targetrel->rd_partdesc) - ereport(ERROR, - (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), - errmsg("%s cannot be applied to partitioned table \"%s\"", - "ON CONFLICT DO UPDATE", - RelationGetRelationName(targetrel)))); - /* * All INSERT expressions have been parsed, get ready for potentially * existing SET statements that need to be processed like an UPDATE. diff --git a/src/include/catalog/partition.h b/src/include/catalog/partition.h index 2faf0ca26e..287642b01b 100644 --- a/src/include/catalog/partition.h +++ b/src/include/catalog/partition.h @@ -52,6 +52,7 @@ extern PartitionBoundInfo partition_bounds_copy(PartitionBoundInfo src, extern void check_new_partition_bound(char *relname, Relation parent, PartitionBoundSpec *spec); extern Oid get_partition_parent(Oid relid); +extern Oid get_partition_root_parent(Oid relid); extern List *get_qual_from_partbound(Relation rel, Relation parent, PartitionBoundSpec *spec); extern List *map_partition_varattnos(List *expr, int fromrel_varno, diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h index d9e591802f..4836a1e0cf 100644 --- a/src/include/nodes/execnodes.h +++ b/src/include/nodes/execnodes.h @@ -412,9 +412,15 @@ typedef struct ResultRelInfo /* for computing a RETURNING list */ ProjectionInfo *ri_projectReturning; + /* list of arbiter indexes to use to check conflicts */ + List *ri_onConflictArbiterIndexes; + /* for computing ON CONFLICT DO UPDATE SET */ ProjectionInfo *ri_onConflictSetProj; + /* TupleDesc describing the result of the above */ + TupleDesc ri_onConflictSetProjTupDesc; + /* list of ON CONFLICT DO UPDATE exprs (qual) */ ExprState *ri_onConflictSetWhere; diff --git a/src/test/regress/expected/insert_conflict.out b/src/test/regress/expected/insert_conflict.out index 2650faedee..a9677f06e6 100644 --- a/src/test/regress/expected/insert_conflict.out +++ b/src/test/regress/expected/insert_conflict.out @@ -786,16 +786,67 @@ select * from selfconflict; (3 rows) drop table selfconflict; --- check that the following works: --- insert into partitioned_table on conflict do nothing -create table parted_conflict_test (a int, b char) partition by list (a); -create table parted_conflict_test_1 partition of parted_conflict_test (b unique) for values in (1); +-- check ON CONFLICT handling with partitioned tables +create table parted_conflict_test (a int unique, b char) partition by list (a); +create table parted_conflict_test_1 partition of parted_conflict_test (b unique) for values in (1, 2); +-- no indexes required here insert into parted_conflict_test values (1, 'a') on conflict do nothing; -insert into parted_conflict_test values (1, 'a') on conflict do nothing; --- however, on conflict do update is not supported yet -insert into parted_conflict_test values (1) on conflict (b) do update set a = excluded.a; -ERROR: ON CONFLICT DO UPDATE cannot be applied to partitioned table "parted_conflict_test" --- but it works OK if we target the partition directly -insert into parted_conflict_test_1 values (1) on conflict (b) do -update set a = excluded.a; +-- index on a required, which does exist in parent +insert into parted_conflict_test values (1, 'a') on conflict (a) do nothing; +insert into parted_conflict_test values (1, 'a') on conflict (a) do update set b = excluded.b; +-- targeting partition directly will work +insert into parted_conflict_test_1 values (1, 'a') on conflict (a) do nothing; +insert into parted_conflict_test_1 values (1, 'b') on conflict (a) do update set b = excluded.b; +-- index on b required, which doesn't exist in parent +insert into parted_conflict_test values (2, 'b') on conflict (b) do update set a = excluded.a; +ERROR: there is no unique or exclusion constraint matching the ON CONFLICT specification +-- targeting partition directly will work +insert into parted_conflict_test_1 values (2, 'b') on conflict (b) do update set a = excluded.a; +-- should see (2, 'b') +select * from parted_conflict_test order by a; + a | b +---+--- + 2 | b +(1 row) + +-- now check that DO UPDATE works correctly for target partition with +-- different attribute numbers +create table parted_conflict_test_2 (b char, a int unique); +alter table parted_conflict_test attach partition parted_conflict_test_2 for values in (3); +truncate parted_conflict_test; +insert into parted_conflict_test values (3, 'a') on conflict (a) do update set b = excluded.b; +insert into parted_conflict_test values (3, 'b') on conflict (a) do update set b = excluded.b; +-- should see (3, 'b') +select * from parted_conflict_test order by a; + a | b +---+--- + 3 | b +(1 row) + +-- case where parent will have a dropped column, but the partition won't +alter table parted_conflict_test drop b, add b char; +create table parted_conflict_test_3 partition of parted_conflict_test for values in (4); +truncate parted_conflict_test; +insert into parted_conflict_test (a, b) values (4, 'a') on conflict (a) do update set b = excluded.b; +insert into parted_conflict_test (a, b) values (4, 'b') on conflict (a) do update set b = excluded.b where parted_conflict_test.b = 'a'; +-- should see (4, 'b') +select * from parted_conflict_test order by a; + a | b +---+--- + 4 | b +(1 row) + +-- case with multi-level partitioning +create table parted_conflict_test_4 partition of parted_conflict_test for values in (5) partition by list (a); +create table parted_conflict_test_4_1 partition of parted_conflict_test_4 for values in (5); +truncate parted_conflict_test; +insert into parted_conflict_test (a, b) values (5, 'a') on conflict (a) do update set b = excluded.b; +insert into parted_conflict_test (a, b) values (5, 'b') on conflict (a) do update set b = excluded.b where parted_conflict_test.b = 'a'; +-- should see (5, 'b') +select * from parted_conflict_test order by a; + a | b +---+--- + 5 | b +(1 row) + drop table parted_conflict_test; diff --git a/src/test/regress/expected/triggers.out b/src/test/regress/expected/triggers.out index 99be9ac6e9..f53ac6bdf1 100644 --- a/src/test/regress/expected/triggers.out +++ b/src/test/regress/expected/triggers.out @@ -2328,6 +2328,39 @@ insert into my_table values (3, 'CCC'), (4, 'DDD') NOTICE: trigger = my_table_update_trig, old table = (3,CCC), (4,DDD), new table = (3,CCC:CCC), (4,DDD:DDD) NOTICE: trigger = my_table_insert_trig, new table = <NULL> -- +-- now using a partitioned table +-- +create table iocdu_tt_parted (a int primary key, b text) partition by list (a); +create table iocdu_tt_parted1 partition of iocdu_tt_parted for values in (1); +create table iocdu_tt_parted2 partition of iocdu_tt_parted for values in (2); +create table iocdu_tt_parted3 partition of iocdu_tt_parted for values in (3); +create table iocdu_tt_parted4 partition of iocdu_tt_parted for values in (4); +create trigger iocdu_tt_parted_insert_trig + after insert on iocdu_tt_parted referencing new table as new_table + for each statement execute procedure dump_insert(); +create trigger iocdu_tt_parted_update_trig + after update on iocdu_tt_parted referencing old table as old_table new table as new_table + for each statement execute procedure dump_update(); +-- inserts only +insert into iocdu_tt_parted values (1, 'AAA'), (2, 'BBB') + on conflict (a) do + update set b = iocdu_tt_parted.b || ':' || excluded.b; +NOTICE: trigger = iocdu_tt_parted_update_trig, old table = <NULL>, new table = <NULL> +NOTICE: trigger = iocdu_tt_parted_insert_trig, new table = (1,AAA), (2,BBB) +-- mixture of inserts and updates +insert into iocdu_tt_parted values (1, 'AAA'), (2, 'BBB'), (3, 'CCC'), (4, 'DDD') + on conflict (a) do + update set b = iocdu_tt_parted.b || ':' || excluded.b; +NOTICE: trigger = iocdu_tt_parted_update_trig, old table = (1,AAA), (2,BBB), new table = (1,AAA:AAA), (2,BBB:BBB) +NOTICE: trigger = iocdu_tt_parted_insert_trig, new table = (3,CCC), (4,DDD) +-- updates only +insert into iocdu_tt_parted values (3, 'CCC'), (4, 'DDD') + on conflict (a) do + update set b = iocdu_tt_parted.b || ':' || excluded.b; +NOTICE: trigger = iocdu_tt_parted_update_trig, old table = (3,CCC), (4,DDD), new table = (3,CCC:CCC), (4,DDD:DDD) +NOTICE: trigger = iocdu_tt_parted_insert_trig, new table = <NULL> +drop table iocdu_tt_parted; +-- -- Verify that you can't create a trigger with transition tables for -- more than one event. -- diff --git a/src/test/regress/sql/insert_conflict.sql b/src/test/regress/sql/insert_conflict.sql index 32c647e3f8..73122479a3 100644 --- a/src/test/regress/sql/insert_conflict.sql +++ b/src/test/regress/sql/insert_conflict.sql @@ -472,15 +472,59 @@ select * from selfconflict; drop table selfconflict; --- check that the following works: --- insert into partitioned_table on conflict do nothing -create table parted_conflict_test (a int, b char) partition by list (a); -create table parted_conflict_test_1 partition of parted_conflict_test (b unique) for values in (1); +-- check ON CONFLICT handling with partitioned tables +create table parted_conflict_test (a int unique, b char) partition by list (a); +create table parted_conflict_test_1 partition of parted_conflict_test (b unique) for values in (1, 2); + +-- no indexes required here insert into parted_conflict_test values (1, 'a') on conflict do nothing; -insert into parted_conflict_test values (1, 'a') on conflict do nothing; --- however, on conflict do update is not supported yet -insert into parted_conflict_test values (1) on conflict (b) do update set a = excluded.a; --- but it works OK if we target the partition directly -insert into parted_conflict_test_1 values (1) on conflict (b) do -update set a = excluded.a; + +-- index on a required, which does exist in parent +insert into parted_conflict_test values (1, 'a') on conflict (a) do nothing; +insert into parted_conflict_test values (1, 'a') on conflict (a) do update set b = excluded.b; + +-- targeting partition directly will work +insert into parted_conflict_test_1 values (1, 'a') on conflict (a) do nothing; +insert into parted_conflict_test_1 values (1, 'b') on conflict (a) do update set b = excluded.b; + +-- index on b required, which doesn't exist in parent +insert into parted_conflict_test values (2, 'b') on conflict (b) do update set a = excluded.a; + +-- targeting partition directly will work +insert into parted_conflict_test_1 values (2, 'b') on conflict (b) do update set a = excluded.a; + +-- should see (2, 'b') +select * from parted_conflict_test order by a; + +-- now check that DO UPDATE works correctly for target partition with +-- different attribute numbers +create table parted_conflict_test_2 (b char, a int unique); +alter table parted_conflict_test attach partition parted_conflict_test_2 for values in (3); +truncate parted_conflict_test; +insert into parted_conflict_test values (3, 'a') on conflict (a) do update set b = excluded.b; +insert into parted_conflict_test values (3, 'b') on conflict (a) do update set b = excluded.b; + +-- should see (3, 'b') +select * from parted_conflict_test order by a; + +-- case where parent will have a dropped column, but the partition won't +alter table parted_conflict_test drop b, add b char; +create table parted_conflict_test_3 partition of parted_conflict_test for values in (4); +truncate parted_conflict_test; +insert into parted_conflict_test (a, b) values (4, 'a') on conflict (a) do update set b = excluded.b; +insert into parted_conflict_test (a, b) values (4, 'b') on conflict (a) do update set b = excluded.b where parted_conflict_test.b = 'a'; + +-- should see (4, 'b') +select * from parted_conflict_test order by a; + +-- case with multi-level partitioning +create table parted_conflict_test_4 partition of parted_conflict_test for values in (5) partition by list (a); +create table parted_conflict_test_4_1 partition of parted_conflict_test_4 for values in (5); +truncate parted_conflict_test; +insert into parted_conflict_test (a, b) values (5, 'a') on conflict (a) do update set b = excluded.b; +insert into parted_conflict_test (a, b) values (5, 'b') on conflict (a) do update set b = excluded.b where parted_conflict_test.b = 'a'; + +-- should see (5, 'b') +select * from parted_conflict_test order by a; + drop table parted_conflict_test; diff --git a/src/test/regress/sql/triggers.sql b/src/test/regress/sql/triggers.sql index 3354f4899f..3773c6bc98 100644 --- a/src/test/regress/sql/triggers.sql +++ b/src/test/regress/sql/triggers.sql @@ -1773,6 +1773,39 @@ insert into my_table values (3, 'CCC'), (4, 'DDD') update set b = my_table.b || ':' || excluded.b; -- +-- now using a partitioned table +-- + +create table iocdu_tt_parted (a int primary key, b text) partition by list (a); +create table iocdu_tt_parted1 partition of iocdu_tt_parted for values in (1); +create table iocdu_tt_parted2 partition of iocdu_tt_parted for values in (2); +create table iocdu_tt_parted3 partition of iocdu_tt_parted for values in (3); +create table iocdu_tt_parted4 partition of iocdu_tt_parted for values in (4); +create trigger iocdu_tt_parted_insert_trig + after insert on iocdu_tt_parted referencing new table as new_table + for each statement execute procedure dump_insert(); +create trigger iocdu_tt_parted_update_trig + after update on iocdu_tt_parted referencing old table as old_table new table as new_table + for each statement execute procedure dump_update(); + +-- inserts only +insert into iocdu_tt_parted values (1, 'AAA'), (2, 'BBB') + on conflict (a) do + update set b = iocdu_tt_parted.b || ':' || excluded.b; + +-- mixture of inserts and updates +insert into iocdu_tt_parted values (1, 'AAA'), (2, 'BBB'), (3, 'CCC'), (4, 'DDD') + on conflict (a) do + update set b = iocdu_tt_parted.b || ':' || excluded.b; + +-- updates only +insert into iocdu_tt_parted values (3, 'CCC'), (4, 'DDD') + on conflict (a) do + update set b = iocdu_tt_parted.b || ':' || excluded.b; + +drop table iocdu_tt_parted; + +-- -- Verify that you can't create a trigger with transition tables for -- more than one event. -- -- 2.11.0