Re: [DISCUSS]Support the merge statement in FlinkSQL

Martijn Visser Thu, 10 Feb 2022 00:09:30 -0800

Hi zoucao,

I see that this message was posted twice, so I'm replying only to the
latest one (this one). Thanks for bringing this up for discussion.


I agree that support for a merge statement would be a welcome addition to
Flink SQL for those who use it for bounded jobs. How do you see merge
working for streaming data?

I do think that for Flink to support this properly, we should rely on
Calcite. If there's no proper/full support for merge in Calcite, I don't
think we should add it ourselves: I don't think the benefits to Flink
outweigh the time investment and the increase in technical debt. If it's
really that important, I think it's better to invest that time in
Calcite's implementation before bringing this to Flink.

Best regards,

Martijn Visser
https://twitter.com/MartijnVisser82


On Wed, 9 Feb 2022 at 08:40, zhou chao <zhouchao...@hotmail.com> wrote:

> Hi, devs!
> Jingfeng and I would like to start a discussion about the MERGE statement.
> The discussion consists of two parts. In the first part, we want to
> explore and collect the use cases and motivations of MERGE statement
> users. In the second part, we want to explore whether Flink SQL can
> support the merge statement.
>
> Before getting into the first topic, we want to introduce the definition
> and benefits of the merge statement. The MERGE statement in SQL is widely
> used: it can handle inserts, updates, and deletes in a single statement,
> without separate logic for each of these. Conditions can be specified
> separately for each insert, update, or delete action. Many engines and
> databases already support this feature, for example,
> SQL Server[1], Spark[2], Hive[3], pgSQL[4].
>
> Our use case:
> Order analysis and processing is one of our most important scenarios.
> Sometimes an order is updated long after the previous record with the
> same primary key, and by then the state for that key has expired, so the
> aggregation result is wrong. In this situation, we use the merge
> statement in a batch job to correct the results, and we currently use
> Spark + Iceberg for this internally. In the future, we want to unify
> batch and streaming internally with Flink SQL, so it would be better if
> Flink could support the merge statement. If you have other use cases or
> opinions, please share them here.
>
> Currently, Calcite does not have good support for the merge statement;
> CALCITE-4338[5] tracks this. Could we support the merge statement by
> relying on the limited support in calcite-1.26.0? I wrote a simple
> doc[6] to drive this and to explore whether Flink SQL can support the
> merge statement.
>
> Looking forward to your feedback, thanks.
>
> best,
> zoucao
>
>
> [1] https://docs.microsoft.com/en-us/sql/t-sql/statements/merge-transact-sql?redirectedfrom=MSDN&view=sql-server-ver15
> [2] https://issues.apache.org/jira/browse/SPARK-28893
> [3] https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-Merge
> [4] https://www.postgresql.org/message-id/attachment/23520/sql-merge.html
> [5] https://issues.apache.org/jira/browse/CALCITE-4338
> [6] https://docs.google.com/document/d/12wwCqK6zfWGs84ijFZmGPJqCYfYHUPmfx5CvzUkVrw4/edit?usp=sharing
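
As a rough illustration of the semantics described in the quoted message
(conditional insert, update, and delete handled together), the sketch below
emulates a merge over in-memory tables in Python. It is not Flink SQL or
Calcite code. The table layout, the column names (order_id, amount, status),
the 'CANCELLED' condition, and the merge_orders helper are all hypothetical
and chosen only for illustration.

```python
from typing import Dict

# Hypothetical in-memory "tables", keyed by order_id.
Row = Dict[str, object]
Table = Dict[int, Row]


def merge_orders(target: Table, source: Table) -> Table:
    """Emulate roughly what a statement of this shape would do
    (generic MERGE syntax, shown only for orientation):

        MERGE INTO target t USING source s ON t.order_id = s.order_id
        WHEN MATCHED AND s.status = 'CANCELLED' THEN DELETE
        WHEN MATCHED THEN UPDATE SET amount = s.amount, status = s.status
        WHEN NOT MATCHED THEN INSERT (order_id, amount, status) VALUES (...)
    """
    # Copy the target so the caller's table is left untouched.
    result = {key: dict(row) for key, row in target.items()}
    for key, src in source.items():
        if key in result:
            if src["status"] == "CANCELLED":
                del result[key]          # WHEN MATCHED AND ... THEN DELETE
            else:
                result[key] = dict(src)  # WHEN MATCHED THEN UPDATE
        else:
            result[key] = dict(src)      # WHEN NOT MATCHED THEN INSERT
    return result


target = {
    1: {"amount": 10.0, "status": "PAID"},
    2: {"amount": 20.0, "status": "PAID"},
}
source = {
    1: {"amount": 10.0, "status": "CANCELLED"},  # matched, deleted
    2: {"amount": 25.0, "status": "PAID"},       # matched, updated
    3: {"amount": 5.0, "status": "NEW"},         # not matched, inserted
}
merged = merge_orders(target, source)
```

Here a late correction to order 2 overwrites the stale row, which is the kind
of batch fix-up the use case in the quoted message describes.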
