Re: [HACKERS] MERGE SQL Statement for PG11

Pavan Deolasee Wed, 04 Apr 2018 23:03:07 -0700

On Tue, Apr 3, 2018 at 7:48 AM, Andres Freund <and...@anarazel.de> wrote:


>
>
> @@ -3828,8 +3846,14 @@ struct AfterTriggersTableData
>     bool        before_trig_done;   /* did we already queue BS triggers? */
>     bool        after_trig_done;    /* did we already queue AS triggers? */
>     AfterTriggerEventList after_trig_events;    /* if so, saved list
> pointer */
> -   Tuplestorestate *old_tuplestore;    /* "old" transition table, if any
> */
> -   Tuplestorestate *new_tuplestore;    /* "new" transition table, if any
> */
> +   /* "old" transition table for UPDATE, if any */
> +   Tuplestorestate *old_upd_tuplestore;
> +   /* "new" transition table for UPDATE, if any */
> +   Tuplestorestate *new_upd_tuplestore;
> +   /* "old" transition table for DELETE, if any */
> +   Tuplestorestate *old_del_tuplestore;
> +   /* "new" transition table INSERT, if any */
> +   Tuplestorestate *new_ins_tuplestore;
>  };
>
> A comment somewhere why we have all of these (presumably because they
> can now happen in the context of a single statement) would be good.
>
>
Done.


>
> @@ -5744,12 +5796,28 @@ AfterTriggerSaveEvent(EState *estate,
> ResultRelInfo *relinfo,
>                  newtup == NULL));
>
>         if (oldtup != NULL &&
> -           ((event == TRIGGER_EVENT_DELETE && delete_old_table) ||
> -            (event == TRIGGER_EVENT_UPDATE && update_old_table)))
> +           (event == TRIGGER_EVENT_DELETE && delete_old_table))
>         {
>             Tuplestorestate *old_tuplestore;
>
> -           old_tuplestore = transition_capture->tcs_privat
> e->old_tuplestore;
> +           old_tuplestore = transition_capture->tcs_privat
> e->old_del_tuplestore;
> +
> +           if (map != NULL)
> +           {
> +               HeapTuple   converted = do_convert_tuple(oldtup, map);
> +
> +               tuplestore_puttuple(old_tuplestore, converted);
> +               pfree(converted);
> +           }
> +           else
> +               tuplestore_puttuple(old_tuplestore, oldtup);
> +       }
>
> Very similar code is now repeated four times. Could you abstract this
> into a separate function?
>
>
Ok. I gave it a try, please check. It's definitely a lot lesser code.


>
>
> --- a/src/backend/executor/README
> +++ b/src/backend/executor/README
> @@ -37,6 +37,16 @@ the plan tree returns the computed tuples to be
> updated, plus a "junk"
>  one.  For DELETE, the plan tree need only deliver a CTID column, and the
>  ModifyTable node visits each of those rows and marks the row deleted.
>
> +MERGE runs one generic plan that returns candidate target rows. Each row
> +consists of a super-row that contains all the columns needed by any of the
> +individual actions, plus a CTID and a TABLEOID junk columns. The CTID
> column is
>
> "plus a CTID and a TABLEOID junk columns" sounds a bit awkward?
>

Changed.


>
> +required to know if a matching target row was found or not and the
> TABLEOID
> +column is needed to find the underlying target partition, in case when the
> +target table is a partition table. If the CTID column is set we attempt to
> +activate WHEN MATCHED actions, or if it is NULL then we will attempt to
>
> "if it is NULL then" sounds wrong.
>

Made some adjustments.


>
> +activate WHEN NOT MATCHED actions. Once we know which action is activated
> we
> +form the final result row and apply only those changes.
> +
>  XXX a great deal more documentation needs to be written here...
>
> Are we using tableoids in a similar way in other places?
>
>
AFAIK, no.


>
>
> +/*
> + * Given OID of the partition leaf, return the index of the leaf in the
> + * partition hierarchy.
> + */
> +int
> +ExecFindPartitionByOid(PartitionTupleRouting *proute, Oid partoid)
> +{
> +   int i;
> +
> +   for (i = 0; i < proute->num_partitions; i++)
> +   {
> +       if (proute->partition_oids[i] == partoid)
> +           break;
> +   }
> +
> +   Assert(i < proute->num_partitions);
> +   return i;
> +}
>
> Shouldn't we at least warn in a comment that this is O(N)? And document
> that it does weird stuff if the OID isn't found?
>

Yeah, added a comment. Also added a ereport(ERROR) if we do not find the
partition. There was already an Assert, but may be ERROR is better.


>
> Perhaps just introduce a PARTOID syscache?
>
>
Probably as a separate patch. Anything more than a handful partitions is
anyways known to be too slow and I doubt this code will add anything
material impact to that.


>
>
> diff --git a/src/backend/executor/nodeMerge.c
> b/src/backend/executor/nodeMerge.c
> new file mode 100644
> index 00000000000..0e0d0795d4d
> --- /dev/null
> +++ b/src/backend/executor/nodeMerge.c
> @@ -0,0 +1,575 @@
> +/*---------------------------------------------------------
> ----------------
> + *
> + * nodeMerge.c
> + *   routines to handle Merge nodes relating to the MERGE command
>
> Isn't this file misnamed and it should be execMerge.c?  The rest of the
> node*.c files are for nodes that are invoked via execProcnode /
> ExecProcNode().  This isn't an actual executor node.
>

Makes sense. Done. (Now that the patch is committed, I don't know if there
would be a rethink about changing file names. May be not, just raising that
concern)


>
> Might also be worthwhile to move the merge related init stuff here?
>


Done.


>
>
> What's the reasoning behind making this be an anomaluous type of
> executor node?
>
>
Didn't quite get that. I think naming of the file was bad (fixed now), but
I think it's a good idea to move the new code in a new file, from
maintainability as well as coverage perspective. If you've something very
different in mind, can you explain in more details?


>
>
> FWIW, I'd re-order this file so this routine is above
> ExecMergeMatched(), ExecMergeNotMatched(), easier to understand.
>

Done.


>
>
> +   /*
> +    * If there are not WHEN MATCHED actions, we are done.
> +    */
> +   if (mergeMatchedActionStates == NIL)
> +       return true;
>
> Maybe I'm confused, but why is mergeMatchedActionStates attached to
> per-partition info? How can this differ in number between partitions,
> requiring us to re-check it below fetching the partition info above?
>
>
Because each partition may have a columns in different order, dropped
attributes etc. So we need to give treatment to the quals/targetlists. See
ON CONFLICT DO UPDATE for similar code.


>
> +       /*
> +        * Check for any concurrent update/delete operation which may have
> +        * prevented our update/delete. We also check for situations where
> we
> +        * might be trying to update/delete the same tuple twice.
> +        */
> +       if ((action->commandType == CMD_UPDATE && !tuple_updated) ||
> +           (action->commandType == CMD_DELETE && !tuple_deleted))
> +
> +       {
> +           switch (hufd.result)
> +           {
> +               case HeapTupleMayBeUpdated:
> +                   break;
> +               case HeapTupleInvisible:
> +
> +                   /*
> +                    * This state should never be reached since the
> underlying
> +                    * JOIN runs with a MVCC snapshot and should only
> return
> +                    * rows visible to us.
> +                    */
>
> Given EPQ, that reasoning isn't correct. I think this should still be
> unreachable, just not for the reason described here.
>
>
Agree. Updated the comment, but please check if it's satisfactory or you
would like to say something more/different.


>
>
> It seems fairly bad architecturally to me that we now have
> EvalPlanQual() loops in both this routine *and*
> ExecUpdate()/ExecDelete().
>
>
This was done after review by Peter and I think I like the new way too.
Also keeps the regular UPDATE/DELETE code paths least changed and let Merge
handle concurrency issues specific to it.


>
> +
> +/*
> + * Execute the first qualifying NOT MATCHED action.
> + */
> +static void
> +ExecMergeNotMatched(ModifyTableState *mtstate, EState *estate,
> +                   TupleTableSlot *slot)
> +{
> +   PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
> +   ExprContext *econtext = mtstate->ps.ps_ExprContext;
> +   List       *mergeNotMatchedActionStates = NIL;
> +   ResultRelInfo *resultRelInfo;
> +   ListCell   *l;
> +   TupleTableSlot  *myslot;
> +
> +   /*
> +    * We are dealing with NOT MATCHED tuple. Since for MERGE, partition
> tree
>
> *the partition tree
>

Fixed.


>
>
> +    * is not expanded for the result relation, we continue to work with
> the
> +    * currently active result relation, which should be of the root of the
> +    * partition tree.
> +    */
> +   resultRelInfo = mtstate->resultRelInfo;
>
> "should be"? "is", I hope?  Given that it's referencing
> mtstate->resultRelInfo which is only set in one place...
>
>
Yeah, fixed.


>
>
> +/* flags for mt_merge_subcommands */
> +#define MERGE_INSERT   0x01
> +#define MERGE_UPDATE   0x02
> +#define MERGE_DELETE   0x04
>
> Hm, it seems a bit weird to define these file-local, given there's
> another file implementing a good chunk of merge.
>

Ok. Moved them execMerge.h, which made sense after I moved the
initialisation related code to execMerge.c



>
> +   /*
> +    * Initialize hufdp. Since the caller is only interested in the failure
> +    * status, initialize with the state that is used to indicate
> successful
> +    * operation.
> +    */
> +   if (hufdp)
> +       hufdp->result = HeapTupleMayBeUpdated;
> +
>
> This signals for me that the addition of the result field to HUFD wasn't
> architecturally the right thing. HUFD is supposed to be supposed to be
> returned by heap_update(), reusing and setting it from other places
> seems like a layering violation to me.
>
>
I am not sure I agree. Sure we can keep adding more parameters to
ExecUpdate/ExecDelete and such routines, but I thought passing back all
information via a single struct makes more sense.


>
>
> diff --git a/src/backend/optimizer/plan/setrefs.c
> b/src/backend/optimizer/plan/setrefs.c
> index 69dd327f0c9..cd540a0df5b 100644
> --- a/src/backend/optimizer/plan/setrefs.c
> +++ b/src/backend/optimizer/plan/setrefs.c
> @@ -851,6 +851,60 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int
> rtoffset)
>                         fix_scan_list(root, splan->exclRelTlist, rtoffset);
>                 }
>
> +               /*
> +                * The MERGE produces the target rows by performing a right
>
> s/the MERGE/the MERGE statement/?
>
>
Fixed.


>
> +                * join between the target relation and the source relation
> +                * (which could be a plain relation or a subquery). The
> INSERT
> +                * and UPDATE actions of the MERGE requires access to the
>
> same.
>
>
Fixed.


>
>
> +opt_and_condition:
>
> Renaming this to be a bit more merge specific seems like a good idea.
>

Renamed to opt_merge_when_and_condition


>
> +
> +   /*
> +    * Add a whole-row-Var entry to support references to "source.*".
> +    */
> +   var = makeWholeRowVar(rt_rte, rt_rtindex, 0, false);
> +   te = makeTargetEntry((Expr *) var, list_length(*mergeSourceTargetList)
> + 1,
> +                        NULL, true);
>
> Why can't this properly dealt with by transformWholeRowRef() etc?
>
>
I just followed ON CONFLICT style. May be there is a better way, but not
clear how.


>
>
> +                   if (selectStmt && (selectStmt->valuesLists == NIL ||
> +                                      selectStmt->sortClause != NIL ||
> +                                      selectStmt->limitOffset != NULL ||
> +                                      selectStmt->limitCount != NULL ||
> +                                      selectStmt->lockingClause != NIL ||
> +                                      selectStmt->withClause != NULL))
> +                       ereport(ERROR,
> +                               (errcode(ERRCODE_SYNTAX_ERROR),
> +                                errmsg("SELECT not allowed in MERGE
> INSERT statement")));
>
> +                   if (selectStmt && list_length(selectStmt->valuesLists)
> > 1)
> +                       ereport(ERROR,
> +                               (errcode(ERRCODE_SYNTAX_ERROR),
> +                                errmsg("Multiple VALUES clauses not
> allowed in MERGE INSERT statement")));
>
> Shouldn't this include an error position?
>
>
I will work on this as a separate follow-up patch. I tried adding location
to MergeAction, but that alone is probably not sufficient. So it was
turning out a slightly bigger patch than I anticipated.


>
> Why is this, and not building a proper executor node for merge that
> knows how to get the tuples, the right approach?  We did a rough
> equivalent for matview updates, and I think it turned out to be a pretty
> bad plan.
>
>
I am still not sure why that would be any better. Can you explain in detail
what exactly you've in mind and how's that significantly better than what
we have today?


>
>
> +   /*
> +    * XXX MERGE is unsupported in various cases
> +    */
> +   if (!(pstate->p_target_relation->rd_rel->relkind == RELKIND_RELATION
> ||
> +         pstate->p_target_relation->rd_rel->relkind ==
> RELKIND_PARTITIONED_TABLE))
> +       ereport(ERROR,
> +               (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> +                errmsg("MERGE is not supported for this relation type")));
>
> Shouldn't this report what relation type the error talking about? It's
> not going to necessarily be obvious to the user.  Also, errposition etc
> would be good.
>
>
Hmm, ok. There is getRelationTypeDescription() but otherwise I am surprised
that I couldn't find a ready way to get string representation of relation
kind. Am I missing something? Of course, we can write for the purpose, but
wanted to ensure we are not duplicating something already available.


>
>
> +                       /*
> +                        * Do basic expression transformation (same as a
> ROW()
> +                        * expr, but allow SetToDefault at top level)
> +                        */
> +                       exprList = transformExpressionList(pstate,
> +                                                          (List *)
> linitial(valuesLists),
> +
> EXPR_KIND_VALUES_SINGLE,
> +                                                          true);
> +
> +                       /* Prepare row for assignment to target table */
> +                       exprList = transformInsertRow(pstate, exprList,
> +                                                     istmt->cols,
> +                                                     icolumns, attrnos,
> +                                                     false);
> +                   }
>
> Can't we handle this with a littlebit less code duplication?
>

Hmm, yeah, that will be good. But given Tom's suggestions on the other
thread, I would like to postpone any refactoring here.


>
>  typedef struct HeapUpdateFailureData
>  {
> +   HTSU_Result result;
>     ItemPointerData ctid;
>     TransactionId xmax;
>     CommandId   cmax;
> +   LockTupleMode   lockmode;
>  } HeapUpdateFailureData;
>
>
> These new fields seem not really relateto HUFD, but rather just fields
> the merge code should maintain?
>

As I said, we can possibly track those separately. But they all arise as a
result of update/delete/lock failure. So I would prefer to keep them along
with other fields such as ctid and xmax/cmax. But if you or others insist,
I can move them and pass in/out of various function calls where we need to
maintain those.

Thanks,
Pavan

-- 
 Pavan Deolasee                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

0001-Take-care-of-Andres-s-comments.patch
Description: Binary data

Re: [HACKERS] MERGE SQL Statement for PG11

Reply via email to