Hi Jing Ge, thanks for your suggestions. 1. Currently, the Flink Doris Connector is compatible with Flink versions 1.15-1.18. SupportsCommitter[1] seems to be introduced in Flink 1.19, and most users may not have upgraded their Flink environments to that version yet. Modifying it now could lead to incompatibilities. I think we can postpone the modification and make it together with other connectors. What do you think?
2. Yes, currently DorisSource only supports batch reading, typically used for data synchronization and ETL. Streaming reading is not supported yet, which requires the capability of Doris Binlog (mentioned in the Doris 2024 RoadMap[2]). Streaming reading can be used to capture incremental events from the database, making it more convenient for users to process real-time data newly added to Doris. 3. E2ECase[3] for DorisSource have been added, and the TestPlan in the FLIP has been modified. [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-372%3A+Enhance+and+synchronize+Sink+API+to+match+the+Source+API [2] https://github.com/apache/doris/issues/30669 [3] https://github.com/apache/doris-flink-connector/blob/master/flink-doris-connector/src/test/java/org/apache/doris/flink/tools/cdc/DorisDorisE2ECase.java Brs, di.wu > 2024年3月6日 06:12,Jing Ge <j...@ververica.com.INVALID> 写道: > > Hi Di, > > Thanks for your proposal. +1 for the contribution. I'd like to know your > thoughts about the following questions: > > 1. According to your clarification of the exactly-once, thanks for it BTW, > no PreCommitTopology is required. Does it make sense to let DorisSink[1] > implement SupportsCommitter, since the TwoPhaseCommittingSink is > deprecated[2] before turning the Doris connector into a Flink connector? > 2. OLAP engines are commonly used as the tail/downstream of a data pipeline > to support further e.g. ad-hoc query or cube with feasible pre-aggregation. > Just out of curiosity, would you like to share some real use cases that > will use OLAP engines as the source of a streaming data pipeline? Or it > will only be used as the source for the batch? > 3. The E2E test only covered sink[3], if I am not mistaken. Would you like > to test the source in E2E too? > > [1] > https://github.com/apache/doris-flink-connector/blob/43e0e5cf9b832854ea228fb093077872e3a311b6/flink-doris-connector/src/main/java/org/apache/doris/flink/sink/DorisSink.java#L55 > [2] > https://cwiki.apache.org/confluence/display/FLINK/FLIP-372%3A+Enhance+and+synchronize+Sink+API+to+match+the+Source+API > [3] > https://github.com/apache/doris-flink-connector/blob/43e0e5cf9b832854ea228fb093077872e3a311b6/flink-doris-connector/src/test/java/org/apache/doris/flink/tools/cdc/MySQLDorisE2ECase.java#L96 > > Best regards, > Jing > > On Tue, Mar 5, 2024 at 11:18 AM wudi <676366...@qq.com.invalid> wrote: > >> Hi, Jeyhun Karimov. >> Thanks for your question. >> >> - How to ensure Exactly-Once? >> 1. When the Checkpoint Barrier arrives, DorisSink will trigger the >> precommit api of StreamLoad to complete the persistence of data in Doris >> (the data will not be visible at this time), and will also pass this TxnID >> to the Committer. >> 2. When this Checkpoint of the entire Job is completed, the Committer will >> call the commit api of StreamLoad and commit TxnID to complete the >> visibility of the transaction. >> 3. When the task is restarted, the Txn with successful precommit and >> failed commit will be aborted based on the label-prefix, and Doris' abort >> API will be called. (At the same time, Doris will also abort transactions >> that have not been committed for a long time) >> >> ps: At the same time, this part of the content has been updated in FLIP >> >> - Because the default table model in Doris is Duplicate ( >> https://doris.apache.org/docs/data-table/data-model/), which does not >> have a primary key, batch writing may cause data duplication, but UNIQ The >> model has a primary key, which ensures the idempotence of writing, thus >> achieving Exactly-Once >> >> Brs, >> di.wu >> >> >>> 2024年3月2日 17:50,Jeyhun Karimov <je.kari...@gmail.com> 写道: >>> >>> Hi, >>> >>> Thanks for the proposal. +1 for the FLIP. >>> I have a few questions: >>> >>> - How exactly the two (Stream Load's two-phase commit and Flink's >> two-phase >>> commit) combination will ensure the e2e exactly-once semantics? >>> >>> - The FLIP proposes to combine Doris's batch writing with the primary key >>> table to achieve Exactly-Once semantics. Could you elaborate more on >> that? >>> Why it is not the default behavior but a workaround? >>> >>> Regards, >>> Jeyhun >>> >>> On Sat, Mar 2, 2024 at 10:14 AM Yanquan Lv <decq12y...@gmail.com> wrote: >>> >>>> Thanks for driving this. >>>> The content is very detailed, it is recommended to add a section on Test >>>> Plan for more completeness. >>>> >>>> Di Wu <d...@apache.org> 于2024年1月25日周四 15:40写道: >>>> >>>>> Hi all, >>>>> >>>>> Previously, we had some discussions about contributing Flink Doris >>>>> Connector to the Flink community [1]. I want to further promote this >>>> work. >>>>> I hope everyone will help participate in this FLIP discussion and >> provide >>>>> more valuable opinions and suggestions. >>>>> Thanks. >>>>> >>>>> [1] https://lists.apache.org/thread/lvh8g9o6qj8bt3oh60q81z0o1cv3nn8p >>>>> >>>>> Brs, >>>>> di.wu >>>>> >>>>> >>>>> >>>>> On 2023/12/07 05:02:46 wudi wrote: >>>>>> >>>>>> Hi all, >>>>>> >>>>>> As discussed in the previous email [1], about contributing the Flink >>>>> Doris Connector to the Flink community. >>>>>> >>>>>> >>>>>> Apache Doris[2] is a high-performance, real-time analytical database >>>>> based on MPP architecture, for scenarios where Flink is used for data >>>>> analysis, processing, or real-time writing on Doris, Flink Doris >>>> Connector >>>>> is an effective tool. >>>>>> >>>>>> At the same time, Contributing Flink Doris Connector to the Flink >>>>> community will further expand the Flink Connectors ecosystem. >>>>>> >>>>>> So I would like to start an official discussion FLIP-399: Flink >>>>> Connector Doris[3]. >>>>>> >>>>>> Looking forward to comments, feedbacks and suggestions from the >>>>> community on the proposal. >>>>>> >>>>>> [1] https://lists.apache.org/thread/lvh8g9o6qj8bt3oh60q81z0o1cv3nn8p >>>>>> [2] >>>> https://doris.apache.org/docs/dev/get-starting/what-is-apache-doris/ >>>>>> [3] >>>>> >>>> >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-399%3A+Flink+Connector+Doris >>>>>> >>>>>> >>>>>> Brs, >>>>>> >>>>>> di.wu >>>>>> >>>>> >>>> >> >>