Re: [PR] [Fix](write) Make stream load retry exactly-once by reusing the label [doris-spark-connector]

via GitHub Tue, 23 Jun 2026 21:41:33 -0700


JNSimba commented on PR #364:
URL: 
https://github.com/apache/doris-spark-connector/pull/364#issuecomment-4785926291


   Thanks for your contribution. I have a few questions:
   
   1. If the Spark job does not enable 2PC, then in theory it cannot achieve 
exactly-once semantics anyway, since a single task may trigger multiple 
Stream Loads. Is that right?
   
   2. My understanding is that this situation can only occur when 
`retries > 0` is configured, because the connector caches each fetched batch 
and guarantees that every retry after a Stream Load failure sends exactly the 
same data.
   
   3. If label reuse is required, it seems an `abort` must be issued before 
each reuse attempt. Otherwise, if the label is still in the `PREPARE` state, 
the write will never succeed. You may refer to: 
https://github.com/apache/doris-flink-connector/pull/523
   
   Lastly, based on the first point, I feel the current logic is acceptable for 
most scenarios. So I'd like to understand: in what kind of scenario did you 
encounter the duplicate write issue?
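   
   To make point 3 concrete, here is a minimal sketch of "abort before each 
label reuse". It runs against an in-memory stand-in, not the real Doris or 
connector API: `FakeBackend`, `load_with_label_reuse`, the state names other 
than `PREPARE`, and the `ALREADY_DONE` result are all assumptions made for 
illustration.

```python
class FakeBackend:
    """In-memory stand-in for a load endpoint (hypothetical, not the Doris API)."""

    def __init__(self, fail_first_n=0):
        self.label_state = {}   # label -> "PREPARE" | "VISIBLE" | "ABORTED"
        self.rows = []          # rows that actually became visible
        self._fail_remaining = fail_first_n

    def load(self, label, batch):
        state = self.label_state.get(label)
        if state == "VISIBLE":
            # A finished label is never applied twice (assumed dedup behavior).
            return "ALREADY_DONE"
        if state == "PREPARE":
            # A label stuck in PREPARE blocks every further attempt.
            raise RuntimeError(f"label {label} is still in PREPARE")
        self.label_state[label] = "PREPARE"
        if self._fail_remaining > 0:
            self._fail_remaining -= 1
            raise ConnectionError("simulated failure; label left in PREPARE")
        self.rows.extend(batch)
        self.label_state[label] = "VISIBLE"
        return "OK"

    def abort(self, label):
        # Only an unfinished transaction can be aborted.
        if self.label_state.get(label) == "PREPARE":
            self.label_state[label] = "ABORTED"


def load_with_label_reuse(backend, label, batch, retries):
    """Retry the same cached batch under the same label, aborting first."""
    for attempt in range(retries + 1):
        if attempt > 0:
            # Clear a possibly pending transaction before reusing the label.
            backend.abort(label)
        try:
            return backend.load(label, batch)
        except ConnectionError:
            if attempt == retries:
                raise


backend = FakeBackend(fail_first_n=2)
result = load_with_label_reuse(backend, "job-1-part-0", [1, 2, 3], retries=3)
```

   In this sketch the batch lands once even though two attempts fail. If the 
`abort` call is removed, the second attempt hits the `PREPARE` check and 
raises instead of succeeding.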


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

