Re: [HACKERS] MERGE SQL Statement for PG11

Peter Geoghegan Mon, 29 Jan 2018 12:42:54 -0800

On Mon, Jan 29, 2018 at 6:11 AM, Simon Riggs <si...@2ndquadrant.com> wrote:
> New patch attached that correctly handles all known concurrency cases,
> with expected test output.


This revision, v13, seems much improved in this area.

> This means MERGE will work just fine for "normal" UPDATEs, but it will
> often fail (deterministically) in concurrent tests with mixed
> insert/deletes or UPDATEs that touch the PK, as requested.

* While looking at how you're handling concurrency/EPQ now, I noticed this code:

> +                       /*
> +                        * Test condition, if any
> +                        *
> +                        * In the absence of a condition we perform the action
> +                        * unconditionally (no need to check separately since
> +                        * ExecQual() will return true if there are no
> +                        * conditions to evaluate).
> +                        */
> +                       if (!ExecQual(action->whenqual, econtext))
> +                       {
> +                           if (BufferIsValid(buffer))
> +                               ReleaseBuffer(buffer);
> +                           continue;
> +                       }

(As well as its interactions with ExecUpdate() + ExecDelete(), which
are significant.)

The way that routines like ExecUpdate() interact with MERGE for WHEN
qual + EPQ handling seems kind of convoluted. I hope for something
cleaner in the next revision.

* This also stood out (not sure if you were referring/alluding to this
in the quoted text):

> +                               /*
> +                                * If EvalPlanQual did not return a tuple, it 
> means we
> +                                * have seen a concurrent delete, or a 
> concurrent update
> +                                * where the row has moved to another 
> partition.
> +                                *
> +                                * UPDATE ignores this case and continues.
> +                                *
> +                                * If MERGE has a WHEN NOT MATCHED clause we 
> know that the
> +                                * user would like to INSERT something in 
> this case, yet
> +                                * we can't see the delete with our snapshot, 
> so take the
> +                                * safe choice and throw an ERROR. If the 
> user didn't care
> +                                * about WHEN NOT MATCHED INSERT then neither 
> do we.
> +                                *
> +                                * XXX We might consider setting matched = 
> false and loop
> +                                * back to lmerge though we'd need to do 
> something like
> +                                * EvalPlanQual, but not quite.
> +                                */
> +                               else if (epqstate->epqresult == 
> EPQ_TUPLE_IS_NULL &&
> +                                        node->mt_merge_subcommands & 
> ACL_INSERT)
> +                               {
> +                                   /*
> +                                    * We need to throw a retryable ERROR 
> because of the
> +                                    * concurrent update which we can't 
> handle.
> +                                    */
> +                                   ereport(ERROR,
> +                                           
> (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
> +                                            errmsg("could not serialize 
> access due to concurrent update")));
> +                               }

I don't think that ERRCODE_T_R_SERIALIZATION_FAILURE is ever okay in
READ COMMITTED mode. Anyway, I wonder why we shouldn't just go ahead
an do the WHEN NOT MATCHED INSERT on the basis of information which is
not visible to our snapshot. We choose to not UPDATE/DELETE on the
basis of information from the future already, within EPQ. My first
feeling is that this is a distinction without a difference, and you
should actually go and INSERT at this point, though I reserve the
right to change my mind about that.

Yeah, EPQ semantics are icky, but would actually inserting here really
be any worse than what we do?

Other things I noticed (not related to concurrency) following a fairly
quick pass:

* Basic tab completion support would be nice.

* The SQL standard explicitly imposes this limitation:

postgres=# explain merge into cities a using cities b on a.city =
b.city when matched and (select 1=1) then update set country =
b.country;
ERROR:  0A000: cannot use subquery in WHEN AND condition
LINE 1: ...sing cities b on a.city = b.city when matched and (select 1=...
                                                             ^
LOCATION:  transformSubLink, parse_expr.c:1865

Why do you, though? Do we really need it? I'm just curious about your
thoughts on it. To be clear, I'm not asserting that you're wrong.

* ISTM that this patch should have inserted/updated/deleted rows, in
roughly the same style as ON CONFLICT's EXPLAIN ANALYZE output. I
mentioned this already, and you seemed unclear on what I meant.
Hopefully my remarks here are clearer.

* Subselect handling is buggy:

postgres=# merge into cities a using cities b on a.city = b.city when
matched and a.city = 'Straffan' then update set country = (select
'Some country');
ERROR:  XX000: unrecognized node type: 114
LOCATION:  ExecInitExprRec, execExpr.c:2114

* Why no CTE support? SQL Server has this.

* The INSERT ... SELECT syntax doesn't work:

postgres=# merge into array_test a using (select '{1,2,3}'::int[] aaa)
b on a.aaa = b.aaa when matched then update set aaa = default when not
matched then insert (aaa) select '{1,2,3}';
ERROR:  42601: syntax error at or near "select"
LINE 1: ... aaa = default when not matched then insert (aaa) select '{1...
                                                             ^
LOCATION:  scanner_yyerror, scan.l:1092

But docs imply otherwise -- "source_query -- A query (SELECT statement
or VALUES statement) that supplies the rows to be merged into the
target_table_name". Either the docs are wrong, or the code is wrong.
Hopefully you can just fix the code.

* Rules are not going to be supported, on the grounds that the
behavior is unclear, which I suppose is fine. But what about
ruleutils.c support?

That seems like entirely another matter to me. What about EXPLAIN,
etc? Deparse support seems to be considered generic infrastructure,
that doesn't necessarily have much to do with the user-visible rules
feature.

* This restriction seems arbitrary and unjustified:

postgres=# merge into testoids a using (select 1 "key", 'foo' "data")
b on a.key = b.key when matched and a.oid = 5 then update set data =
b.data when not matched then insert (key, data) values (1, 'foo');
ERROR:  42P10: system column "oid" reference in WHEN AND condition is invalid
LINE 1: ...'foo' "data") b on a.key = b.key when matched and a.oid = 5 ...
                                                             ^
LOCATION:  scanRTEForColumn, parse_relation.c:738

* Wholerow vars are broken:

postgres=# merge into testoids a using (select 1 "key", 'foo' "data")
b on a.key = b.key when matched then update set data = b.*::text when
not matched then insert (key, data) values (1, 'foo');
ERROR:  XX000: variable not found in subplan target lists
LOCATION:  fix_join_expr_mutator, setrefs.c:2351

That's all I have for now.

-- 
Peter Geoghegan

Re: [HACKERS] MERGE SQL Statement for PG11

Reply via email to