Hi zoucao,

I see that this message was posted twice, so I'll reply only to the latest one (this one). Thanks for bringing this up for discussion.
I agree that support for a MERGE statement would be a welcome addition to Flink SQL for those who use it for bounded jobs. How do you see MERGE working for streaming data?

I do think that for Flink to support this properly, we should leverage Calcite. If there's no proper/full support for MERGE in Calcite, I don't think we should add it ourselves. I don't think the benefits this would bring to Flink outweigh the time investment and the increase in technical debt. If it's really that important, I think it's better to make that time investment in Calcite's implementation before bringing this to Flink.

Best regards,

Martijn Visser
https://twitter.com/MartijnVisser82

On Wed, 9 Feb 2022 at 08:40, zhou chao <zhouchao...@hotmail.com> wrote:

> Hi, devs!
>
> Jingfeng and I would like to start a discussion about the MERGE statement.
> The discussion consists of two parts. In the first part, we want to explore
> and collect the use cases and motivations of MERGE statement users. In the
> second part, we want to find out whether Flink SQL could support the MERGE
> statement.
>
> Before driving the first topic, we want to introduce the definition and
> benefits of the MERGE statement. MERGE is a very popular SQL statement, and
> it can handle inserts, updates, and deletes all in a single transaction
> without having to write separate logic for each of these. For each insert,
> update, or delete action, we can specify conditions separately. Many
> engines/DBs already support this feature, for example SQL Server[1],
> Spark[2], Hive[3], PostgreSQL[4].
>
> Our use case:
> Order analysis & processing is one of the most important scenarios, but
> sometimes an updated order arrives long after the previous record with the
> same primary key. In the meantime, the state for this key has expired, so
> a wrong aggregation result is produced. In this situation, we use the MERGE
> statement in a batch job to correct the results, and we currently use
> Spark + Iceberg internally. In the future, we want to unify batch &
> streaming by using Flink SQL internally, so it would be better if Flink
> could support the MERGE statement. If you have other use cases and
> opinions, please share them here.
>
> Currently, Calcite does not have good support for the MERGE statement, and
> there is a Jira issue, CALCITE-4338[5], to track this. Could we support
> the MERGE statement relying on the limited support in calcite-1.26.0? I
> wrote a simple doc[6] to drive this; I just want to find out whether Flink
> SQL could support the MERGE statement.
>
> Looking forward to your feedback, thanks.
>
> best,
> zoucao
>
>
> [1] https://docs.microsoft.com/en-us/sql/t-sql/statements/merge-transact-sql?redirectedfrom=MSDN&view=sql-server-ver15
> [2] https://issues.apache.org/jira/browse/SPARK-28893
> [3] https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-Merge
> [4] https://www.postgresql.org/message-id/attachment/23520/sql-merge.html
> [5] https://issues.apache.org/jira/browse/CALCITE-4338
> [6] https://docs.google.com/document/d/12wwCqK6zfWGs84ijFZmGPJqCYfYHUPmfx5CvzUkVrw4/edit?usp=sharing
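
For readers less familiar with MERGE, the semantics described in the quoted message (one pass over the source rows, with a separate condition for each insert, update, or delete action) can be sketched roughly as follows. This is only an illustration in plain Python, not Flink, Calcite, or Spark code; the function name `merge_orders`, the `order_id`/`status`/`amount` fields, and the "cancelled" rule are all assumptions made up for the example, and unlike a real MERGE it is not transactional.

```python
from typing import Dict, Iterable, Tuple


def merge_orders(target: Dict[int, dict], source: Iterable[dict]) -> Tuple[int, int, int]:
    """Apply source rows to target (keyed by order_id), in the spirit of SQL MERGE.

    Hypothetical rules, chosen only for illustration:
      WHEN MATCHED AND source.status = 'cancelled' THEN DELETE
      WHEN MATCHED THEN UPDATE
      WHEN NOT MATCHED AND source.status <> 'cancelled' THEN INSERT

    Returns the (inserted, updated, deleted) row counts.
    """
    inserted = updated = deleted = 0
    for row in source:
        key = row["order_id"]
        if key in target:
            if row["status"] == "cancelled":
                # Matched and cancelled: remove the stale order.
                del target[key]
                deleted += 1
            else:
                # Matched otherwise: overwrite with the corrected order.
                target[key] = dict(row)
                updated += 1
        elif row["status"] != "cancelled":
            # Not matched: insert the late-arriving order.
            target[key] = dict(row)
            inserted += 1
    return inserted, updated, deleted


# Existing (possibly wrong) results, e.g. produced after state expired.
target = {
    1: {"order_id": 1, "status": "paid", "amount": 10.0},
    2: {"order_id": 2, "status": "paid", "amount": 20.0},
}
# Corrections applied by the batch job.
source = [
    {"order_id": 1, "status": "paid", "amount": 12.0},
    {"order_id": 2, "status": "cancelled", "amount": 20.0},
    {"order_id": 3, "status": "paid", "amount": 7.5},
]
counts = merge_orders(target, source)
```

In an engine that supports MERGE, the three branches would be the WHEN MATCHED / WHEN NOT MATCHED clauses of a single statement, which is what makes it attractive for the batch correction job described above.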