Hi Jeyhun Karimov,

Thanks for your question.

- How to ensure Exactly-Once?

1. When the Checkpoint Barrier arrives, DorisSink triggers the precommit API of StreamLoad, which persists the data in Doris (the data is not yet visible), and passes the TxnID to the Committer.
2. When this Checkpoint has completed for the entire Job, the Committer calls the commit API of StreamLoad with the TxnID, which makes the transaction's data visible.
3. When the task is restarted, any Txn whose precommit succeeded but whose commit failed is identified by its label-prefix and aborted through Doris' abort API. (Doris itself also aborts transactions that have stayed uncommitted for a long time.)
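
To make the three steps above more concrete, here is a minimal sketch in Python. It is not the connector's code: FakeDoris is a hypothetical in-memory stand-in for the StreamLoad precommit/commit/abort calls, and the label format and the "job42" prefix are assumptions made only for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class TxnState(Enum):
    PRECOMMITTED = "precommitted"  # persisted, but not yet visible
    COMMITTED = "committed"        # visible to queries
    ABORTED = "aborted"            # discarded


@dataclass
class FakeDoris:
    """Hypothetical in-memory stand-in for StreamLoad 2PC (not the real API)."""
    txns: dict = field(default_factory=dict)  # txn_id -> [label, state, rows]
    next_id: int = 1

    def precommit(self, label, rows):
        # Step 1: persist the data without making it visible, return a TxnID.
        txn_id = self.next_id
        self.next_id += 1
        self.txns[txn_id] = [label, TxnState.PRECOMMITTED, list(rows)]
        return txn_id

    def commit(self, txn_id):
        # Step 2: make a precommitted transaction visible.
        entry = self.txns[txn_id]
        if entry[1] is TxnState.PRECOMMITTED:
            entry[1] = TxnState.COMMITTED

    def abort_by_label_prefix(self, prefix):
        # Step 3: on restart, abort every still-pending txn of this job.
        for entry in self.txns.values():
            if entry[0].startswith(prefix) and entry[1] is TxnState.PRECOMMITTED:
                entry[1] = TxnState.ABORTED

    def visible_rows(self):
        return [
            row
            for _, state, rows in self.txns.values()
            if state is TxnState.COMMITTED
            for row in rows
        ]


def run_scenario():
    doris = FakeDoris()
    prefix = "job42"  # hypothetical label-prefix

    # Checkpoint 1: barrier arrives -> precommit; checkpoint completes -> commit.
    txn1 = doris.precommit(f"{prefix}_cp1", ["a", "b"])
    assert doris.visible_rows() == []  # precommitted data is not visible yet
    doris.commit(txn1)

    # Checkpoint 2: precommit succeeds, but the job fails before the commit.
    doris.precommit(f"{prefix}_cp2", ["c"])

    # Restart: abort the pending txn by label-prefix, then replay the data
    # since the last completed checkpoint and go through both phases again.
    doris.abort_by_label_prefix(prefix)
    txn3 = doris.precommit(f"{prefix}_cp2_retry", ["c"])
    doris.commit(txn3)
    return doris


if __name__ == "__main__":
    print(run_scenario().visible_rows())
```

In this sketch the row "c" is written twice (once before the failure, once after the restart), but only the committed copy ever becomes visible, which is the point of combining the two commit protocols.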
PS: This part has also been updated in the FLIP.

- Because the default table model in Doris is Duplicate (https://doris.apache.org/docs/data-table/data-model/), which has no primary key, batch writing may cause data duplication. The Unique model, however, has a primary key, which makes writes idempotent and thus achieves Exactly-Once.

Brs,
di.wu

> On March 2, 2024, at 17:50, Jeyhun Karimov <je.kari...@gmail.com> wrote:
>
> Hi,
>
> Thanks for the proposal. +1 for the FLIP.
> I have a few questions:
>
> - How exactly the two (Stream Load's two-phase commit and Flink's two-phase
> commit) combination will ensure the e2e exactly-once semantics?
>
> - The FLIP proposes to combine Doris's batch writing with the primary key
> table to achieve Exactly-Once semantics. Could you elaborate more on that?
> Why it is not the default behavior but a workaround?
>
> Regards,
> Jeyhun
>
> On Sat, Mar 2, 2024 at 10:14 AM Yanquan Lv <decq12y...@gmail.com> wrote:
>
>> Thanks for driving this.
>> The content is very detailed, it is recommended to add a section on Test
>> Plan for more completeness.
>>
>> Di Wu <d...@apache.org> wrote on Thu, Jan 25, 2024 at 15:40:
>>
>>> Hi all,
>>>
>>> Previously, we had some discussions about contributing Flink Doris
>>> Connector to the Flink community [1]. I want to further promote this work.
>>> I hope everyone will help participate in this FLIP discussion and provide
>>> more valuable opinions and suggestions.
>>> Thanks.
>>>
>>> [1] https://lists.apache.org/thread/lvh8g9o6qj8bt3oh60q81z0o1cv3nn8p
>>>
>>> Brs,
>>> di.wu
>>>
>>> On 2023/12/07 05:02:46 wudi wrote:
>>>>
>>>> Hi all,
>>>>
>>>> As discussed in the previous email [1], about contributing the Flink
>>>> Doris Connector to the Flink community.
>>>>
>>>> Apache Doris[2] is a high-performance, real-time analytical database
>>>> based on MPP architecture, for scenarios where Flink is used for data
>>>> analysis, processing, or real-time writing on Doris, Flink Doris Connector
>>>> is an effective tool.
>>>>
>>>> At the same time, Contributing Flink Doris Connector to the Flink
>>>> community will further expand the Flink Connectors ecosystem.
>>>>
>>>> So I would like to start an official discussion FLIP-399: Flink
>>>> Connector Doris[3].
>>>>
>>>> Looking forward to comments, feedbacks and suggestions from the
>>>> community on the proposal.
>>>>
>>>> [1] https://lists.apache.org/thread/lvh8g9o6qj8bt3oh60q81z0o1cv3nn8p
>>>> [2] https://doris.apache.org/docs/dev/get-starting/what-is-apache-doris/
>>>> [3] https://cwiki.apache.org/confluence/display/FLINK/FLIP-399%3A+Flink+Connector+Doris
>>>>
>>>> Brs,
>>>>
>>>> di.wu