Hi, Feng,
Thank you, that's a great suggestion ! 

I have already implemented FilterPushDown and removed that parameter on 
DorisDynamicTableSource[1], and also updated FLIP.

Regarding the mention of [Doris also aborts transactions], it may not have been 
described accurately. It mainly refers to the automatic expiration of 
long-running transactions in Doris that have not been committed for a prolonged 
period.

As for two-phase commit, when a commit fails, the checkpoint will also fail, 
and the job will be continuously retried.

[1] 
https://github.com/apache/doris-flink-connector/blob/master/flink-doris-connector/src/main/java/org/apache/doris/flink/table/DorisDynamicTableSource.java#L58

Brs
di.wu


> 2024年3月15日 14:53,Feng Jin <jinfeng1...@gmail.com> 写道:
> 
> Hi Di
> 
> Thank you for initiating this FLIP, +1 for this.
> 
> Regarding the option `doris.filter.query` of doris source table
> 
> Can we directly implement the FilterPushDown capability of Flink Source
> like Jdbc Source [1] instead of introducing an option?
> 
> 
> Regarding two-phase commit,
> 
>> At the same time, Doris will also abort transactions that have not been
> committed for a long time
> 
> Can we control the transaction timeout in the connector?
> And control the behavior when timeout occurs, whether to discard by default
> or trigger job failure?
> 
> 
> [1]. https://issues.apache.org/jira/browse/FLINK-16024
> 
> Best,
> Feng
> 
> 
> On Tue, Mar 12, 2024 at 12:12 AM Ferenc Csaky <ferenc.cs...@pm.me.invalid>
> wrote:
> 
>> Hi,
>> 
>> Thanks for driving this, +1 for the FLIP.
>> 
>> Best,
>> Ferenc
>> 
>> 
>> 
>> 
>> On Monday, March 11th, 2024 at 15:17, Ahmed Hamdy <hamdy10...@gmail.com>
>> wrote:
>> 
>>> 
>>> 
>>> Hello,
>>> Thanks for the proposal, +1 for the FLIP.
>>> 
>>> Best Regards
>>> Ahmed Hamdy
>>> 
>>> 
>>> On Mon, 11 Mar 2024 at 15:12, wudi 676366...@qq.com.invalid wrote:
>>> 
>>>> Hi, Leonard
>>>> Thank you for your suggestion.
>>>> I referred to other Connectors[1], modified the naming and types of
>>>> relevant parameters[2], and also updated FLIP.
>>>> 
>>>> [1]
>>>> 
>> https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/connectors/table/overview/
>>>> [1]
>>>> 
>> https://github.com/apache/doris-flink-connector/blob/master/flink-doris-connector/src/main/java/org/apache/doris/flink/table/DorisConfigOptions.java
>>>> 
>>>> Brs,
>>>> di.wu
>>>> 
>>>>> 2024年3月7日 14:33,Leonard Xu xbjt...@gmail.com 写道:
>>>>> 
>>>>> Thanks wudi for the updating, the FLIP generally looks good to me, I
>>>>> only left two minor suggestions:
>>>>> 
>>>>> (1) The suffix `.s` in configoption doris.request.query.timeout.s
>> looks
>>>>> strange to me, could we change all time interval related option
>> value type
>>>>> to Duration ?
>>>>> 
>>>>> (2) Could you check and improve all config options like
>>>>> `doris.exec.mem.limit` to make them to follow flink config option
>> naming
>>>>> and value type?
>>>>> 
>>>>> Best,
>>>>> Leonard
>>>>> 
>>>>>>> 2024年3月6日 06:12,Jing Ge j...@ververica.com.INVALID 写道:
>>>>>>> 
>>>>>>> Hi Di,
>>>>>>> 
>>>>>>> Thanks for your proposal. +1 for the contribution. I'd like to
>> know
>>>>>>> your
>>>>>>> thoughts about the following questions:
>>>>>>> 
>>>>>>> 1. According to your clarification of the exactly-once, thanks
>> for it
>>>>>>> BTW,
>>>>>>> no PreCommitTopology is required. Does it make sense to let
>>>>>>> DorisSink[1]
>>>>>>> implement SupportsCommitter, since the TwoPhaseCommittingSink is
>>>>>>> deprecated[2] before turning the Doris connector into a Flink
>>>>>>> connector?
>>>>>>> 2. OLAP engines are commonly used as the tail/downstream of a
>> data
>>>>>>> pipeline
>>>>>>> to support further e.g. ad-hoc query or cube with feasible
>>>>>>> pre-aggregation.
>>>>>>> Just out of curiosity, would you like to share some real use
>> cases that
>>>>>>> will use OLAP engines as the source of a streaming data
>> pipeline? Or it
>>>>>>> will only be used as the source for the batch?
>>>>>>> 3. The E2E test only covered sink[3], if I am not mistaken.
>> Would you
>>>>>>> like
>>>>>>> to test the source in E2E too?
>>>>>>> 
>>>>>>> [1]
>>>> 
>>>> 
>> https://github.com/apache/doris-flink-connector/blob/43e0e5cf9b832854ea228fb093077872e3a311b6/flink-doris-connector/src/main/java/org/apache/doris/flink/sink/DorisSink.java#L55
>>>> 
>>>>>>> [2]
>>>> 
>>>> 
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-372%3A+Enhance+and+synchronize+Sink+API+to+match+the+Source+API
>>>> 
>>>>>>> [3]
>>>> 
>>>> 
>> https://github.com/apache/doris-flink-connector/blob/43e0e5cf9b832854ea228fb093077872e3a311b6/flink-doris-connector/src/test/java/org/apache/doris/flink/tools/cdc/MySQLDorisE2ECase.java#L96
>>>> 
>>>>>>> Best regards,
>>>>>>> Jing
>>>>>>> 
>>>>>>> On Tue, Mar 5, 2024 at 11:18 AM wudi 676366...@qq.com.invalid
>> wrote:
>>>>>>> 
>>>>>>>> Hi, Jeyhun Karimov.
>>>>>>>> Thanks for your question.
>>>>>>>> 
>>>>>>>> - How to ensure Exactly-Once?
>>>>>>>> 1. When the Checkpoint Barrier arrives, DorisSink will trigger
>> the
>>>>>>>> precommit api of StreamLoad to complete the persistence of
>> data in
>>>>>>>> Doris
>>>>>>>> (the data will not be visible at this time), and will also
>> pass this
>>>>>>>> TxnID
>>>>>>>> to the Committer.
>>>>>>>> 2. When this Checkpoint of the entire Job is completed, the
>> Committer
>>>>>>>> will
>>>>>>>> call the commit api of StreamLoad and commit TxnID to complete
>> the
>>>>>>>> visibility of the transaction.
>>>>>>>> 3. When the task is restarted, the Txn with successful
>> precommit and
>>>>>>>> failed commit will be aborted based on the label-prefix, and
>> Doris'
>>>>>>>> abort
>>>>>>>> API will be called. (At the same time, Doris will also abort
>>>>>>>> transactions
>>>>>>>> that have not been committed for a long time)
>>>>>>>> 
>>>>>>>> ps: At the same time, this part of the content has been
>> updated in
>>>>>>>> FLIP
>>>>>>>> 
>>>>>>>> - Because the default table model in Doris is Duplicate (
>>>>>>>> https://doris.apache.org/docs/data-table/data-model/), which
>> does not
>>>>>>>> have a primary key, batch writing may cause data duplication,
>> but
>>>>>>>> UNIQ The
>>>>>>>> model has a primary key, which ensures the idempotence of
>> writing,
>>>>>>>> thus
>>>>>>>> achieving Exactly-Once
>>>>>>>> 
>>>>>>>> Brs,
>>>>>>>> di.wu
>>>>>>>> 
>>>>>>>>> 2024年3月2日 17:50,Jeyhun Karimov je.kari...@gmail.com 写道:
>>>>>>>>> 
>>>>>>>>> Hi,
>>>>>>>>> 
>>>>>>>>> Thanks for the proposal. +1 for the FLIP.
>>>>>>>>> I have a few questions:
>>>>>>>>> 
>>>>>>>>> - How exactly the two (Stream Load's two-phase commit and
>> Flink's
>>>>>>>>> two-phase
>>>>>>>>> commit) combination will ensure the e2e exactly-once
>> semantics?
>>>>>>>>> 
>>>>>>>>> - The FLIP proposes to combine Doris's batch writing with the
>>>>>>>>> primary key
>>>>>>>>> table to achieve Exactly-Once semantics. Could you elaborate
>> more on
>>>>>>>>> that?
>>>>>>>>> Why it is not the default behavior but a workaround?
>>>>>>>>> 
>>>>>>>>> Regards,
>>>>>>>>> Jeyhun
>>>>>>>>> 
>>>>>>>>> On Sat, Mar 2, 2024 at 10:14 AM Yanquan Lv
>> decq12y...@gmail.com
>>>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>>> Thanks for driving this.
>>>>>>>>>> The content is very detailed, it is recommended to add a
>> section on
>>>>>>>>>> Test
>>>>>>>>>> Plan for more completeness.
>>>>>>>>>> 
>>>>>>>>>> Di Wu d...@apache.org 于2024年1月25日周四 15:40写道:
>>>>>>>>>> 
>>>>>>>>>>> Hi all,
>>>>>>>>>>> 
>>>>>>>>>>> Previously, we had some discussions about contributing
>> Flink Doris
>>>>>>>>>>> Connector to the Flink community [1]. I want to further
>> promote
>>>>>>>>>>> this
>>>>>>>>>>> work.
>>>>>>>>>>> I hope everyone will help participate in this FLIP
>> discussion and
>>>>>>>>>>> provide
>>>>>>>>>>> more valuable opinions and suggestions.
>>>>>>>>>>> Thanks.
>>>>>>>>>>> 
>>>>>>>>>>> [1]
>>>>>>>>>>> 
>> https://lists.apache.org/thread/lvh8g9o6qj8bt3oh60q81z0o1cv3nn8p
>>>>>>>>>>> 
>>>>>>>>>>> Brs,
>>>>>>>>>>> di.wu
>>>>>>>>>>> 
>>>>>>>>>>> On 2023/12/07 05:02:46 wudi wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> Hi all,
>>>>>>>>>>>> 
>>>>>>>>>>>> As discussed in the previous email [1], about
>> contributing the
>>>>>>>>>>>> Flink
>>>>>>>>>>>> Doris Connector to the Flink community.
>>>>>>>>>>>> 
>>>>>>>>>>>> Apache Doris[2] is a high-performance, real-time
>> analytical
>>>>>>>>>>>> database
>>>>>>>>>>>> based on MPP architecture, for scenarios where Flink
>> is used for
>>>>>>>>>>>> data
>>>>>>>>>>>> analysis, processing, or real-time writing on Doris,
>> Flink Doris
>>>>>>>>>>>> Connector
>>>>>>>>>>>> is an effective tool.
>>>>>>>>>>>> 
>>>>>>>>>>>> At the same time, Contributing Flink Doris Connector
>> to the Flink
>>>>>>>>>>>> community will further expand the Flink Connectors
>> ecosystem.
>>>>>>>>>>>> 
>>>>>>>>>>>> So I would like to start an official discussion
>> FLIP-399: Flink
>>>>>>>>>>>> Connector Doris[3].
>>>>>>>>>>>> 
>>>>>>>>>>>> Looking forward to comments, feedbacks and suggestions
>> from the
>>>>>>>>>>>> community on the proposal.
>>>>>>>>>>>> 
>>>>>>>>>>>> [1]
>>>>>>>>>>>> 
>> https://lists.apache.org/thread/lvh8g9o6qj8bt3oh60q81z0o1cv3nn8p
>>>>>>>>>>>> [2]
>>>> 
>>>> https://doris.apache.org/docs/dev/get-starting/what-is-apache-doris/
>>>> 
>>>>>>>>>>>> [3]
>>>> 
>>>> 
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-399%3A+Flink+Connector+Doris
>>>> 
>>>>>>>>>>>> Brs,
>>>>>>>>>>>> 
>>>>>>>>>>>> di.wu
>> 

Reply via email to