Re: support for MERGE

Andres Freund Fri, 28 Jan 2022 14:19:56 -0800

Hi,

On 2022-01-28 17:27:37 -0300, Alvaro Herrera wrote:
> MERGE, v10.  I am much more comfortable with this version; I have
> removed a bunch of temporary hacks and cleaned up the interactions with
> table AM and executor, which is something that had been bothering me for
> a while.  The complete set of changes can be seen in github,
> https://github.com/alvherre/postgres/commits/merge-15


Any chance you could split this into something more reviewable? The overall
diff stats are: 102 files changed, 8589 insertions(+), 234 deletions(-) thats
pretty hard to really review. Incremental commits don't realy help with that.


> I am not aware of anything of significance in terms of remaining work
> for this project.

One thing from skimming: There's not enough documentation about the approach
imo - it's a complicated enough feature to deserve more than the one paragraph
in src/backend/executor/README.


I'm still [1],[2] uncomfortable with execMerge.c kinda-but-not-really being an
executor node.


> The one thing I'm a bit bothered about is the fact that we expose a lot of
> executor functions previously static.  I am now wondering if it would be
> better to move the MERGE executor support functions into nodeModifyTable.c,
> which I think would mean we would not have to expose those function
> prototypes.

I'm worried about the complexity of nodeModifyTuple.c as is. Moving more code
in there does not seem like the right call to me - we should do the opposite
if anything.


A few inline comments below. No real review, just stuff noticed while skimming
to see where this is at.


Greetings,

Andres


[1] 
https://www.postgresql.org/message-id/20180405200003.gar3j26gsk32g...@alap3.anarazel.de
[2] 
https://www.postgresql.org/message-id/20180403021800.b5nsgiclzanob...@alap3.anarazel.de


> diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
> index 1a9c1ac290..280ac40e63 100644
> --- a/src/backend/commands/trigger.c
> +++ b/src/backend/commands/trigger.c

This stuff seems like one candidate for splitting out.


> +     /*
> +      * We maintain separate transition tables for UPDATE/INSERT/DELETE since
> +      * MERGE can run all three actions in a single statement. Note that 
> UPDATE
> +      * needs both old and new transition tables whereas INSERT needs only 
> new,
> +      * and DELETE needs only old.
> +      */
> +
> +     /* "old" transition table for UPDATE, if any */
> +     Tuplestorestate *old_upd_tuplestore;
> +     /* "new" transition table for UPDATE, if any */
> +     Tuplestorestate *new_upd_tuplestore;
> +     /* "old" transition table for DELETE, if any */
> +     Tuplestorestate *old_del_tuplestore;
> +     /* "new" transition table INSERT, if any */
> +     Tuplestorestate *new_ins_tuplestore;
> +
>       TupleTableSlot *storeslot;      /* for converting to tuplestore's 
> format */
>  };

Do we need to do something slightly clever about the memory limits for the
tuplestore? Each of them having a separate work_mem makes nodeModifyTable.c
one memory hungry node (the worst of all maybe?).




> +
> +/*
> + * Perform MERGE.
> + */
> +TupleTableSlot *
> +ExecMerge(ModifyTableState *mtstate, ResultRelInfo *resultRelInfo,
> +               EState *estate, TupleTableSlot *slot)
> +{
> +     ExprContext *econtext = mtstate->ps.ps_ExprContext;
> +     ItemPointer tupleid;
> +     ItemPointerData tuple_ctid;
> +     bool            matched = false;
> +     Datum           datum;
> +     bool            isNull;
> +
> +     /*
> +      * Reset per-tuple memory context to free any expression evaluation
> +      * storage allocated in the previous cycle.
> +      */
> +     ResetExprContext(econtext);

Hasn't this already happened in ExecModifyTable()?


> +     /*
> +      * We run a JOIN between the target relation and the source relation to
> +      * find a set of candidate source rows that have a matching row in the
> +      * target table and a set of candidate source rows that do not have a
> +      * matching row in the target table. If the join returns a tuple with 
> the
> +      * target relation's row-ID set, that implies that the join found a
> +      * matching row for the given source tuple. This case triggers the WHEN
> +      * MATCHED clause of the MERGE. Whereas a NULL in the target relation's
> +      * row-ID column indicates a NOT MATCHED case.
> +      */
> +     datum = ExecGetJunkAttribute(slot,
> +                                                              
> resultRelInfo->ri_RowIdAttNo,
> +                                                              &isNull);

Could this be put somewhere common with the equivalent call in
ExecModifyTable()?


> +      * A concurrent update can:
> +      *
> +      * 1. modify the target tuple so that it no longer satisfies the
> +      *    additional quals attached to the current WHEN MATCHED action
> +      *
> +      *    In this case, we are still dealing with a WHEN MATCHED case.
> +      *    In this case, we recheck the list of WHEN MATCHED actions from
> +      *    the start and choose the first one that satisfies the new target
> +      *    tuple.

Duplicated "in this case"


> +                             case TM_Invisible:
> +
> +                                     /*
> +                                      * This state should never be reached 
> since the underlying
> +                                      * JOIN runs with a MVCC snapshot and 
> EvalPlanQual runs
> +                                      * with a dirty snapshot. So such a row 
> should have never
> +                                      * been returned for MERGE.
> +                                      */
> +                                     elog(ERROR, "unexpected invisible 
> tuple");
> +                                     break;

Seems like it was half duplicated at some point?


> +                             case TM_Updated:
> +                             case TM_Deleted:
> +
> +                                     /*
> +                                      * The target tuple was concurrently 
> updated/deleted by
> +                                      * some other transaction.
> +                                      *
> +                                      * If the current tuple is the last 
> tuple in the update
> +                                      * chain, then we know that the tuple 
> was concurrently
> +                                      * deleted. Just return and let the 
> caller try NOT MATCHED
> +                                      * actions.

Is it really correct to behave that way when using repeatable read /
serializable?


> +                             /*
> +                              * Project the tuple.  In case of a partitioned 
> table, the
> +                              * projection was already built to use the 
> root's descriptor,
> +                              * so we don't need to map the tuple here.
> +                              */
> +                             actionInfo.actionState = action;
> +                             insert_slot = ExecProject(action->mas_proj);
> +
> +                             (void) ExecInsert(mtstate, rootRelInfo,
> +                                                               insert_slot, 
> slot,
> +                                                               estate, 
> &actionInfo,
> +                                                               
> mtstate->canSetTag);
> +                             InstrCountFiltered1(&mtstate->ps, 1);

What happens if somebody concurrently inserts a conflicting row?

> --- a/src/include/executor/instrument.h
> +++ b/src/include/executor/instrument.h
> @@ -82,8 +82,11 @@ typedef struct Instrumentation
>       double          ntuples;                /* total tuples produced */
>       double          ntuples2;               /* secondary node-specific 
> tuple counter */
>       double          nloops;                 /* # of run cycles for this 
> node */
> -     double          nfiltered1;             /* # of tuples removed by 
> scanqual or joinqual */
> -     double          nfiltered2;             /* # of tuples removed by 
> "other" quals */
> +     double          nfiltered1;             /* # tuples removed by scanqual 
> or joinqual OR
> +                                                              * # tuples 
> inserted by MERGE */
> +     double          nfiltered2;             /* # tuples removed by "other" 
> quals OR #
> +                                                              * tuples 
> updated by MERGE */
> +     double          nfiltered3;             /* # tuples deleted by MERGE */

It was a bad idea to have nfiltered1/2. Arriving at nfiltered3, it seems we
really should rename them...

Re: support for MERGE

Reply via email to