Hi, On 2022-01-28 17:27:37 -0300, Alvaro Herrera wrote: > MERGE, v10. I am much more comfortable with this version; I have > removed a bunch of temporary hacks and cleaned up the interactions with > table AM and executor, which is something that had been bothering me for > a while. The complete set of changes can be seen in github, > https://github.com/alvherre/postgres/commits/merge-15
Any chance you could split this into something more reviewable? The overall diff stats are: 102 files changed, 8589 insertions(+), 234 deletions(-) thats pretty hard to really review. Incremental commits don't realy help with that. > I am not aware of anything of significance in terms of remaining work > for this project. One thing from skimming: There's not enough documentation about the approach imo - it's a complicated enough feature to deserve more than the one paragraph in src/backend/executor/README. I'm still [1],[2] uncomfortable with execMerge.c kinda-but-not-really being an executor node. > The one thing I'm a bit bothered about is the fact that we expose a lot of > executor functions previously static. I am now wondering if it would be > better to move the MERGE executor support functions into nodeModifyTable.c, > which I think would mean we would not have to expose those function > prototypes. I'm worried about the complexity of nodeModifyTuple.c as is. Moving more code in there does not seem like the right call to me - we should do the opposite if anything. A few inline comments below. No real review, just stuff noticed while skimming to see where this is at. Greetings, Andres [1] https://www.postgresql.org/message-id/20180405200003.gar3j26gsk32g...@alap3.anarazel.de [2] https://www.postgresql.org/message-id/20180403021800.b5nsgiclzanob...@alap3.anarazel.de > diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c > index 1a9c1ac290..280ac40e63 100644 > --- a/src/backend/commands/trigger.c > +++ b/src/backend/commands/trigger.c This stuff seems like one candidate for splitting out. > + /* > + * We maintain separate transition tables for UPDATE/INSERT/DELETE since > + * MERGE can run all three actions in a single statement. Note that > UPDATE > + * needs both old and new transition tables whereas INSERT needs only > new, > + * and DELETE needs only old. > + */ > + > + /* "old" transition table for UPDATE, if any */ > + Tuplestorestate *old_upd_tuplestore; > + /* "new" transition table for UPDATE, if any */ > + Tuplestorestate *new_upd_tuplestore; > + /* "old" transition table for DELETE, if any */ > + Tuplestorestate *old_del_tuplestore; > + /* "new" transition table INSERT, if any */ > + Tuplestorestate *new_ins_tuplestore; > + > TupleTableSlot *storeslot; /* for converting to tuplestore's > format */ > }; Do we need to do something slightly clever about the memory limits for the tuplestore? Each of them having a separate work_mem makes nodeModifyTable.c one memory hungry node (the worst of all maybe?). > + > +/* > + * Perform MERGE. > + */ > +TupleTableSlot * > +ExecMerge(ModifyTableState *mtstate, ResultRelInfo *resultRelInfo, > + EState *estate, TupleTableSlot *slot) > +{ > + ExprContext *econtext = mtstate->ps.ps_ExprContext; > + ItemPointer tupleid; > + ItemPointerData tuple_ctid; > + bool matched = false; > + Datum datum; > + bool isNull; > + > + /* > + * Reset per-tuple memory context to free any expression evaluation > + * storage allocated in the previous cycle. > + */ > + ResetExprContext(econtext); Hasn't this already happened in ExecModifyTable()? > + /* > + * We run a JOIN between the target relation and the source relation to > + * find a set of candidate source rows that have a matching row in the > + * target table and a set of candidate source rows that do not have a > + * matching row in the target table. If the join returns a tuple with > the > + * target relation's row-ID set, that implies that the join found a > + * matching row for the given source tuple. This case triggers the WHEN > + * MATCHED clause of the MERGE. Whereas a NULL in the target relation's > + * row-ID column indicates a NOT MATCHED case. > + */ > + datum = ExecGetJunkAttribute(slot, > + > resultRelInfo->ri_RowIdAttNo, > + &isNull); Could this be put somewhere common with the equivalent call in ExecModifyTable()? > + * A concurrent update can: > + * > + * 1. modify the target tuple so that it no longer satisfies the > + * additional quals attached to the current WHEN MATCHED action > + * > + * In this case, we are still dealing with a WHEN MATCHED case. > + * In this case, we recheck the list of WHEN MATCHED actions from > + * the start and choose the first one that satisfies the new target > + * tuple. Duplicated "in this case" > + case TM_Invisible: > + > + /* > + * This state should never be reached > since the underlying > + * JOIN runs with a MVCC snapshot and > EvalPlanQual runs > + * with a dirty snapshot. So such a row > should have never > + * been returned for MERGE. > + */ > + elog(ERROR, "unexpected invisible > tuple"); > + break; Seems like it was half duplicated at some point? > + case TM_Updated: > + case TM_Deleted: > + > + /* > + * The target tuple was concurrently > updated/deleted by > + * some other transaction. > + * > + * If the current tuple is the last > tuple in the update > + * chain, then we know that the tuple > was concurrently > + * deleted. Just return and let the > caller try NOT MATCHED > + * actions. Is it really correct to behave that way when using repeatable read / serializable? > + /* > + * Project the tuple. In case of a partitioned > table, the > + * projection was already built to use the > root's descriptor, > + * so we don't need to map the tuple here. > + */ > + actionInfo.actionState = action; > + insert_slot = ExecProject(action->mas_proj); > + > + (void) ExecInsert(mtstate, rootRelInfo, > + insert_slot, > slot, > + estate, > &actionInfo, > + > mtstate->canSetTag); > + InstrCountFiltered1(&mtstate->ps, 1); What happens if somebody concurrently inserts a conflicting row? > --- a/src/include/executor/instrument.h > +++ b/src/include/executor/instrument.h > @@ -82,8 +82,11 @@ typedef struct Instrumentation > double ntuples; /* total tuples produced */ > double ntuples2; /* secondary node-specific > tuple counter */ > double nloops; /* # of run cycles for this > node */ > - double nfiltered1; /* # of tuples removed by > scanqual or joinqual */ > - double nfiltered2; /* # of tuples removed by > "other" quals */ > + double nfiltered1; /* # tuples removed by scanqual > or joinqual OR > + * # tuples > inserted by MERGE */ > + double nfiltered2; /* # tuples removed by "other" > quals OR # > + * tuples > updated by MERGE */ > + double nfiltered3; /* # tuples deleted by MERGE */ It was a bad idea to have nfiltered1/2. Arriving at nfiltered3, it seems we really should rename them...