JNSimba commented on PR #364: URL: https://github.com/apache/doris-spark-connector/pull/364#issuecomment-4785926291
Thanks for your contribution. I have a few questions:

1. If 2PC is not enabled for the Spark job, then theoretically it cannot achieve exactly-once semantics, right? A single task may trigger multiple Stream Loads.
2. My understanding is that this situation only occurs when `retries > 0` is configured, because the connector caches the data it fetches each time and can guarantee that every retry after a Stream Load failure sends exactly the same data.
3. If the label is to be reused, it seems an `abort` must be issued before each reuse attempt. Otherwise, if the label is still in the `PREPARE` state, the write will never succeed. You may refer to: https://github.com/apache/doris-flink-connector/pull/523

Lastly, given the first point, I think the current logic is acceptable for most scenarios. So I'd like to understand: in what scenario did you encounter the duplicate-write issue?
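To make point 3 concrete, here is a minimal sketch of a retry loop that reuses one label and resends the same cached batch. It assumes a simplified label lifecycle (`PREPARE` → `VISIBLE`, or `PREPARE` → `ABORTED` after an abort, where an aborted label may be reused). `FakeStreamLoadClient`, `LabelInUseError` and `load_with_label_reuse` are hypothetical names for illustration only; they are not the connector's or Doris's actual API.

```python
class LabelInUseError(Exception):
    """Hypothetical error: the label is still PREPARE or already VISIBLE."""


class FakeStreamLoadClient:
    """Hypothetical in-memory stand-in for a Stream Load endpoint (not a real API)."""

    def __init__(self, failures_before_success=0):
        self.label_state = {}  # label -> "PREPARE" | "VISIBLE" | "ABORTED"
        self.rows = []
        self.failures_left = failures_before_success

    def load(self, label, batch):
        state = self.label_state.get(label)
        if state in ("PREPARE", "VISIBLE"):
            # A label that is still in use cannot be loaded again.
            raise LabelInUseError(f"label {label!r} is in state {state}")
        self.label_state[label] = "PREPARE"
        if self.failures_left > 0:
            self.failures_left -= 1
            # Simulated failure: the label is left stuck in PREPARE.
            raise ConnectionError("simulated Stream Load failure")
        self.rows.extend(batch)
        self.label_state[label] = "VISIBLE"

    def abort(self, label):
        # Only a label stuck in PREPARE needs to be released.
        if self.label_state.get(label) == "PREPARE":
            self.label_state[label] = "ABORTED"


def load_with_label_reuse(client, label, batch, retries, abort_before_retry=True):
    """Retry one Stream Load with the same label; returns the number of attempts used."""
    cached = list(batch)  # cache once, so every attempt resends identical data
    for attempt in range(retries + 1):
        try:
            client.load(label, cached)
            return attempt + 1
        except ConnectionError:
            if attempt == retries:
                raise  # retries exhausted
            if abort_before_retry:
                client.abort(label)  # free the label before reusing it
```

With `abort_before_retry=False`, the second attempt fails with `LabelInUseError` because the label is still in `PREPARE`. That is the "write will never succeed" case from point 3. Aborting first lets the same label and the same cached data go through.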
