MERGE looks like it may need some new infrastructure to support it, depending upon the implementation route. Guidance and discussion requested from main hackers, if possible.
This is a separate post because there's no further discussion here on the external behaviour of MERGE, or its concurrency/locking. Assumptions ----------- * MERGE must be a parameterisable, preparable command. The parameters might be anywhere in the statement i.e. in the USING clause, or in the action statements, or both. * MERGE must work like any other statement, e.g. multi-statement requests must work also e.g. UPDATE...; MERGE ... ;INSERT ... * MERGE works on tables only, not views. PostgreSQL doesn't support updateable views, so this isn't a problem to solve as part of this implementation. * MERGE contains possibly multiple INSERTs, UPDATEs and DELETEs, each of which can invoke a rule to generate even more commands. * We would not add MERGE to the list of possible events that invoke a Rule. (The data changing events would remain as INSERT, UPDATE, DELETE). However, a MERGE statement might be used *within* a Rule, giving a tree of plans rather than a single plan or a list of plans. (I can't see any way or any reason to say "no allowed" to that...) Implications ------------ To fully support prepared queries, MERGE needs to be a "query" not a utility command as I had originally envisaged. MERGE internally consists of one driving query that is always executed in full to retrieve all tuples generated, plus a number of other queries that may or may not get executed according to the evaluation of the WHEN clauses. Rule re-writing happens at the plan level. That means we need to recognise MERGE as being a new form of query: a query that has multiple other optional queries buried within it, each of which can be independently re-written. Only by doing that can we apply the rules correctly to the various parts of MERGE. That means we probably need to introduce new infrastructure in the tcop or executor modules to handle queries-within-queries. This isn't special-casing MERGE so much as introducing infrastructure for a new class of query, such as MERGE, REPLACE, INSERT ELSE UPDATE. (Merge itself does cover almost all cases of this type of query, but we'd be able to fairly easily support all of the different syntax). MERGE would then be represented by a query that has many "side queries" (so called so we don't confused calling them sub-queries). Each side query can be rewritten into a list of queries. Any of those queries could be a MERGE itself, so we might make each Query a tree of Queries. The main query and the side queries are related, in that the values used for the side queries come from the main query. The side queries' external references will be replaced by parameters, in addition to other parameters already supplied. So the side queries are parameterised, even if the MERGE statement wasn't parameterised. The main driving query will be executed once, the side queries once per row (if at all). Let's look back at the infrastructure aspects. My options so far for implementation are 0. Write MERGE in PL/pgSQL. Err, no. 1. We treat the MERGE similar to an Update-with-conditional-effects and have the MERGE's main plan deliver tuples that we then act on during ExecutePlan() using a new ExecMerge() function. We would preserve the text of the side queries and execute them using SPI from within ExecMerge(). That takes care of the rule rewriting, since we effectively don't know what's happening inside SPI. Feels ugly, is all I can say, though actually very similar to the way RI works. 2. Similar to (1), but instead of using SPI we start another portal from within ExecMerge() function. Portals within portals. Seems very likely for something to break in an unexpected way. 3. We treat a MERGE as a new kind of portal query. Any query containing a MERGE will be forced to be a PORTAL_MULTI_QUERY. We then introduce a new query processing routine for MERGE, ProcessQueryWithActions() rather than running ProcessQuery() during PortalRunMulti(). This can then execute the merge-query similarly to a cursor, retrieving tuples that are then fed to the side queries that then run in separate portals. Doing it this way will mean that we don't need to run the executor recursively, we just switch from main query to retrieve next row and then execute the appropriate side query. The executor level calls will look almost exactly as it would if we wrote MERGE in PL/pgSQL - fetching from a main cursor and then executing side queries as appropriate. The hacking will have similar-ish effects to that induced by the changes for RETURNING. Something like ExecutorStart(mainQueryDesc, 0); for (;;) { /* FETCH 1 */ ExecutorRun(mainQueryDesc, ForwardScanDirection, 1L); if (mainQueryDesc->estate->es_processed == 0) break; //identify side query //internal equivalent of exec_bind_message //internal equivalent of exec_execute_message } (3) sounds best to me, so far, though feel free to explain otherwise if you see obvious holes. For (2) or (3) we will need to alter the ReWriter and Planner to recurse through the tree to rewrite and plan the queries. It's not going to effect the main act of planning or rewriting, just how we initiate it. I'll implement a very crude prototype to flush out problems in this area. The rest of this post is some general thoughts on the rest of the implementation. I don't foresee any issues there, just lots of hacking. Examples -------- Some examples of how MERGE might be rewritten with rules are MERGE WHEN MATCHED UPDATE WHEN NOT MATCHED INSERT might be re-written as MERGE WHEN MATCHED UPDATE; INSERT WHEN NOT MATCHED INSERT; INSERT or even MERGE WHEN MATCHED UPDATE; (MERGE WHEN MATCHED UPDATE WHEN NOT MATCHED INSERT) WHEN NOT MATCHED INSERT; INSERT Normally, an UPDATE that triggers another UPDATE on the same table will create a rule recursion error. If we have an UPDATE that calls a MERGE that generates an UPDATE on the target table, then we would like to be able to throw a similar recursion error. To do that we would need to expand all of the sub-actions of the MERGE during a single query rewrite. If we defer the rewriting of the side queries until execution then we will get an infinite recursion error, just slightly later. Also, I can't see any sane set of rules that would generate a set of statements that will violate the required implementation of MERGE. You can always do something stupid like an ON UPDATE DO ALSO DELETE (on same row), however. That would technically be a violation of MERGE, but would have the same effect as ON UPDATE DO INSTEAD DELETE, which would be OK. MERGE side queries ------------------ The main driving query will return the ctid plus all other required columns. The side queries will be constructed by adding in the result relation and a tidscan clause to each query, like this UPDATE SET (set-clause) becomes UPDATE result-relation SET (set-clause) WHERE ctid = $X; We wouldn't use WHERE CURRENT OF because the main query doesn't qualify as an updateable cursor, so that formulation shouldn't work. (Though maybe we might choose to bend the rules a little). In addition, I would translate the MERGE's WHEN clauses into a CASE WHEN expression that can be run as part of the main driving query. This expression will then tell us the subscript of the side query list that will need to be executed for each row returned. case when <when-not-matched-condition-0> then 0 when <when-not-matched-condition-1> then 1 ... when <when-not-matched-condition-N> then N else -1 end ,case when <when-matched-condition-0> then 0 when <when-matched-condition-1> then 1 ... when <when-matched-condition-N> then N else -1 end The main query will then look like this select target.ctid ,case when-not-matched (as above) ,case when-matched (as above) ,(all other columns required for side queries) from <source-query> left outer join <target> on <join-condition> where (<when-matched-condition-0> or <when-matched-condition-1> ... or <when-matched-condition-N>) or (<when-not-matched-condition-0> or <when-not-matched-condition-1> ... or <when-not-matched-condition-N>) The WHERE clause is likely required in case we get queries like this MERGE target t USING (select * from source) s ON (s.pkey = t.pkey) WHEN MATCHED AND s.pkey = $1 UPDATE SET col = $2; which would be perfectly valid, even if we might hope that they had coded like this MERGE target USING (select * from source WHERE index-column = $1) ON (join-condition) WHEN MATCHED UPDATE SET col = $2; Other notes -------------------------- * No, MERGE won't work on inherited table hierarchies. At least to begin with. * MERGE doesn't support RETURNING, if we did we'd have to enforce a RETURNING clause to be the same target list on all sub-plans of the query tree, which is unlikely to be easy. We could enforce a single RETURNING clause on the whole MERGE statement, but that's another thought entirely. We have no precedent from either the SQL Standard or other implementations to do this -- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers