On Tue, Apr 3, 2018 at 7:48 AM, Andres Freund <and...@anarazel.de> wrote:
> > > @@ -3828,8 +3846,14 @@ struct AfterTriggersTableData > bool before_trig_done; /* did we already queue BS triggers? */ > bool after_trig_done; /* did we already queue AS triggers? */ > AfterTriggerEventList after_trig_events; /* if so, saved list > pointer */ > - Tuplestorestate *old_tuplestore; /* "old" transition table, if any > */ > - Tuplestorestate *new_tuplestore; /* "new" transition table, if any > */ > + /* "old" transition table for UPDATE, if any */ > + Tuplestorestate *old_upd_tuplestore; > + /* "new" transition table for UPDATE, if any */ > + Tuplestorestate *new_upd_tuplestore; > + /* "old" transition table for DELETE, if any */ > + Tuplestorestate *old_del_tuplestore; > + /* "new" transition table INSERT, if any */ > + Tuplestorestate *new_ins_tuplestore; > }; > > A comment somewhere why we have all of these (presumably because they > can now happen in the context of a single statement) would be good. > > Done. > > @@ -5744,12 +5796,28 @@ AfterTriggerSaveEvent(EState *estate, > ResultRelInfo *relinfo, > newtup == NULL)); > > if (oldtup != NULL && > - ((event == TRIGGER_EVENT_DELETE && delete_old_table) || > - (event == TRIGGER_EVENT_UPDATE && update_old_table))) > + (event == TRIGGER_EVENT_DELETE && delete_old_table)) > { > Tuplestorestate *old_tuplestore; > > - old_tuplestore = transition_capture->tcs_privat > e->old_tuplestore; > + old_tuplestore = transition_capture->tcs_privat > e->old_del_tuplestore; > + > + if (map != NULL) > + { > + HeapTuple converted = do_convert_tuple(oldtup, map); > + > + tuplestore_puttuple(old_tuplestore, converted); > + pfree(converted); > + } > + else > + tuplestore_puttuple(old_tuplestore, oldtup); > + } > > Very similar code is now repeated four times. Could you abstract this > into a separate function? > > Ok. I gave it a try, please check. It's definitely a lot lesser code. > > > --- a/src/backend/executor/README > +++ b/src/backend/executor/README > @@ -37,6 +37,16 @@ the plan tree returns the computed tuples to be > updated, plus a "junk" > one. For DELETE, the plan tree need only deliver a CTID column, and the > ModifyTable node visits each of those rows and marks the row deleted. > > +MERGE runs one generic plan that returns candidate target rows. Each row > +consists of a super-row that contains all the columns needed by any of the > +individual actions, plus a CTID and a TABLEOID junk columns. The CTID > column is > > "plus a CTID and a TABLEOID junk columns" sounds a bit awkward? > Changed. > > +required to know if a matching target row was found or not and the > TABLEOID > +column is needed to find the underlying target partition, in case when the > +target table is a partition table. If the CTID column is set we attempt to > +activate WHEN MATCHED actions, or if it is NULL then we will attempt to > > "if it is NULL then" sounds wrong. > Made some adjustments. > > +activate WHEN NOT MATCHED actions. Once we know which action is activated > we > +form the final result row and apply only those changes. > + > XXX a great deal more documentation needs to be written here... > > Are we using tableoids in a similar way in other places? > > AFAIK, no. > > > +/* > + * Given OID of the partition leaf, return the index of the leaf in the > + * partition hierarchy. > + */ > +int > +ExecFindPartitionByOid(PartitionTupleRouting *proute, Oid partoid) > +{ > + int i; > + > + for (i = 0; i < proute->num_partitions; i++) > + { > + if (proute->partition_oids[i] == partoid) > + break; > + } > + > + Assert(i < proute->num_partitions); > + return i; > +} > > Shouldn't we at least warn in a comment that this is O(N)? And document > that it does weird stuff if the OID isn't found? > Yeah, added a comment. Also added a ereport(ERROR) if we do not find the partition. There was already an Assert, but may be ERROR is better. > > Perhaps just introduce a PARTOID syscache? > > Probably as a separate patch. Anything more than a handful partitions is anyways known to be too slow and I doubt this code will add anything material impact to that. > > > diff --git a/src/backend/executor/nodeMerge.c > b/src/backend/executor/nodeMerge.c > new file mode 100644 > index 00000000000..0e0d0795d4d > --- /dev/null > +++ b/src/backend/executor/nodeMerge.c > @@ -0,0 +1,575 @@ > +/*--------------------------------------------------------- > ---------------- > + * > + * nodeMerge.c > + * routines to handle Merge nodes relating to the MERGE command > > Isn't this file misnamed and it should be execMerge.c? The rest of the > node*.c files are for nodes that are invoked via execProcnode / > ExecProcNode(). This isn't an actual executor node. > Makes sense. Done. (Now that the patch is committed, I don't know if there would be a rethink about changing file names. May be not, just raising that concern) > > Might also be worthwhile to move the merge related init stuff here? > Done. > > > What's the reasoning behind making this be an anomaluous type of > executor node? > > Didn't quite get that. I think naming of the file was bad (fixed now), but I think it's a good idea to move the new code in a new file, from maintainability as well as coverage perspective. If you've something very different in mind, can you explain in more details? > > > FWIW, I'd re-order this file so this routine is above > ExecMergeMatched(), ExecMergeNotMatched(), easier to understand. > Done. > > > + /* > + * If there are not WHEN MATCHED actions, we are done. > + */ > + if (mergeMatchedActionStates == NIL) > + return true; > > Maybe I'm confused, but why is mergeMatchedActionStates attached to > per-partition info? How can this differ in number between partitions, > requiring us to re-check it below fetching the partition info above? > > Because each partition may have a columns in different order, dropped attributes etc. So we need to give treatment to the quals/targetlists. See ON CONFLICT DO UPDATE for similar code. > > + /* > + * Check for any concurrent update/delete operation which may have > + * prevented our update/delete. We also check for situations where > we > + * might be trying to update/delete the same tuple twice. > + */ > + if ((action->commandType == CMD_UPDATE && !tuple_updated) || > + (action->commandType == CMD_DELETE && !tuple_deleted)) > + > + { > + switch (hufd.result) > + { > + case HeapTupleMayBeUpdated: > + break; > + case HeapTupleInvisible: > + > + /* > + * This state should never be reached since the > underlying > + * JOIN runs with a MVCC snapshot and should only > return > + * rows visible to us. > + */ > > Given EPQ, that reasoning isn't correct. I think this should still be > unreachable, just not for the reason described here. > > Agree. Updated the comment, but please check if it's satisfactory or you would like to say something more/different. > > > It seems fairly bad architecturally to me that we now have > EvalPlanQual() loops in both this routine *and* > ExecUpdate()/ExecDelete(). > > This was done after review by Peter and I think I like the new way too. Also keeps the regular UPDATE/DELETE code paths least changed and let Merge handle concurrency issues specific to it. > > + > +/* > + * Execute the first qualifying NOT MATCHED action. > + */ > +static void > +ExecMergeNotMatched(ModifyTableState *mtstate, EState *estate, > + TupleTableSlot *slot) > +{ > + PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing; > + ExprContext *econtext = mtstate->ps.ps_ExprContext; > + List *mergeNotMatchedActionStates = NIL; > + ResultRelInfo *resultRelInfo; > + ListCell *l; > + TupleTableSlot *myslot; > + > + /* > + * We are dealing with NOT MATCHED tuple. Since for MERGE, partition > tree > > *the partition tree > Fixed. > > > + * is not expanded for the result relation, we continue to work with > the > + * currently active result relation, which should be of the root of the > + * partition tree. > + */ > + resultRelInfo = mtstate->resultRelInfo; > > "should be"? "is", I hope? Given that it's referencing > mtstate->resultRelInfo which is only set in one place... > > Yeah, fixed. > > > +/* flags for mt_merge_subcommands */ > +#define MERGE_INSERT 0x01 > +#define MERGE_UPDATE 0x02 > +#define MERGE_DELETE 0x04 > > Hm, it seems a bit weird to define these file-local, given there's > another file implementing a good chunk of merge. > Ok. Moved them execMerge.h, which made sense after I moved the initialisation related code to execMerge.c > > + /* > + * Initialize hufdp. Since the caller is only interested in the failure > + * status, initialize with the state that is used to indicate > successful > + * operation. > + */ > + if (hufdp) > + hufdp->result = HeapTupleMayBeUpdated; > + > > This signals for me that the addition of the result field to HUFD wasn't > architecturally the right thing. HUFD is supposed to be supposed to be > returned by heap_update(), reusing and setting it from other places > seems like a layering violation to me. > > I am not sure I agree. Sure we can keep adding more parameters to ExecUpdate/ExecDelete and such routines, but I thought passing back all information via a single struct makes more sense. > > > diff --git a/src/backend/optimizer/plan/setrefs.c > b/src/backend/optimizer/plan/setrefs.c > index 69dd327f0c9..cd540a0df5b 100644 > --- a/src/backend/optimizer/plan/setrefs.c > +++ b/src/backend/optimizer/plan/setrefs.c > @@ -851,6 +851,60 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int > rtoffset) > fix_scan_list(root, splan->exclRelTlist, rtoffset); > } > > + /* > + * The MERGE produces the target rows by performing a right > > s/the MERGE/the MERGE statement/? > > Fixed. > > + * join between the target relation and the source relation > + * (which could be a plain relation or a subquery). The > INSERT > + * and UPDATE actions of the MERGE requires access to the > > same. > > Fixed. > > > +opt_and_condition: > > Renaming this to be a bit more merge specific seems like a good idea. > Renamed to opt_merge_when_and_condition > > + > + /* > + * Add a whole-row-Var entry to support references to "source.*". > + */ > + var = makeWholeRowVar(rt_rte, rt_rtindex, 0, false); > + te = makeTargetEntry((Expr *) var, list_length(*mergeSourceTargetList) > + 1, > + NULL, true); > > Why can't this properly dealt with by transformWholeRowRef() etc? > > I just followed ON CONFLICT style. May be there is a better way, but not clear how. > > > + if (selectStmt && (selectStmt->valuesLists == NIL || > + selectStmt->sortClause != NIL || > + selectStmt->limitOffset != NULL || > + selectStmt->limitCount != NULL || > + selectStmt->lockingClause != NIL || > + selectStmt->withClause != NULL)) > + ereport(ERROR, > + (errcode(ERRCODE_SYNTAX_ERROR), > + errmsg("SELECT not allowed in MERGE > INSERT statement"))); > > + if (selectStmt && list_length(selectStmt->valuesLists) > > 1) > + ereport(ERROR, > + (errcode(ERRCODE_SYNTAX_ERROR), > + errmsg("Multiple VALUES clauses not > allowed in MERGE INSERT statement"))); > > Shouldn't this include an error position? > > I will work on this as a separate follow-up patch. I tried adding location to MergeAction, but that alone is probably not sufficient. So it was turning out a slightly bigger patch than I anticipated. > > Why is this, and not building a proper executor node for merge that > knows how to get the tuples, the right approach? We did a rough > equivalent for matview updates, and I think it turned out to be a pretty > bad plan. > > I am still not sure why that would be any better. Can you explain in detail what exactly you've in mind and how's that significantly better than what we have today? > > > + /* > + * XXX MERGE is unsupported in various cases > + */ > + if (!(pstate->p_target_relation->rd_rel->relkind == RELKIND_RELATION > || > + pstate->p_target_relation->rd_rel->relkind == > RELKIND_PARTITIONED_TABLE)) > + ereport(ERROR, > + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), > + errmsg("MERGE is not supported for this relation type"))); > > Shouldn't this report what relation type the error talking about? It's > not going to necessarily be obvious to the user. Also, errposition etc > would be good. > > Hmm, ok. There is getRelationTypeDescription() but otherwise I am surprised that I couldn't find a ready way to get string representation of relation kind. Am I missing something? Of course, we can write for the purpose, but wanted to ensure we are not duplicating something already available. > > > + /* > + * Do basic expression transformation (same as a > ROW() > + * expr, but allow SetToDefault at top level) > + */ > + exprList = transformExpressionList(pstate, > + (List *) > linitial(valuesLists), > + > EXPR_KIND_VALUES_SINGLE, > + true); > + > + /* Prepare row for assignment to target table */ > + exprList = transformInsertRow(pstate, exprList, > + istmt->cols, > + icolumns, attrnos, > + false); > + } > > Can't we handle this with a littlebit less code duplication? > Hmm, yeah, that will be good. But given Tom's suggestions on the other thread, I would like to postpone any refactoring here. > > typedef struct HeapUpdateFailureData > { > + HTSU_Result result; > ItemPointerData ctid; > TransactionId xmax; > CommandId cmax; > + LockTupleMode lockmode; > } HeapUpdateFailureData; > > > These new fields seem not really relateto HUFD, but rather just fields > the merge code should maintain? > As I said, we can possibly track those separately. But they all arise as a result of update/delete/lock failure. So I would prefer to keep them along with other fields such as ctid and xmax/cmax. But if you or others insist, I can move them and pass in/out of various function calls where we need to maintain those. Thanks, Pavan -- Pavan Deolasee http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
0001-Take-care-of-Andres-s-comments.patch
Description: Binary data