Question about v2 committer guarantees

Scott Sandre Thu, 06 Jun 2024 15:48:38 -0700

Hi there,

I'm developing a new version of the fully open-source Delta-Flink Sink
using the unified Flink Sink V2 APIs and the fully open-source Delta-Kernel
library (see <https://delta.io/blog/delta-kernel/> for more details about
Delta-Kernel).

I have a few questions about the API guarantees behind Flink's exactly-once
semantics. For the following questions, you can assume I'm forcing a single
global committer by using the
`org.apache.flink.streaming.api.connector.sink2.SupportsPreCommitTopology::addPreCommitTopology`
API and applying `.global()` to the incoming Committable DataStream.

On to my questions:

1. For a given checkpointId, will the
`org.apache.flink.api.connector.sink2.Committer::commit` API *always* be
called with *all* committables for that checkpointId? Is there any chance
of only *some* of the committables for that checkpointId being delivered,
perhaps due to a network delay, RPC delay, or even a lost end-of-interval
RPC call?

2. If so, will the `org.apache.flink.api.connector.sink2.Committer::commit`
API only ever be called with committables all belonging to the *same*
checkpointId? Or could they belong to multiple checkpointIds?

3. Suppose that my SinkWriters have written and checkpointed their
committables, and now `Committer::commit` is attempting to persist them
into external state (i.e., the `_delta_log` for Delta Lake). During this time,
it may be desirable to force a fresh rewrite of the data referenced by the
committables. However, if we fail the Committer, it will just be retried
with the *same* committables due to Flink's exactly-once guarantees and
checkpointing mechanisms. Is it possible to somehow request that the
writers rewrite the data from the previous checkpointId?
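
To make questions 1 and 2 concrete, here is a minimal sketch of the defensive
grouping a committer could do if a single commit call may carry committables
from more than one checkpointId. It is plain Java and does not use any Flink
or Delta-Kernel API: `DeltaCommittable`, `groupByCheckpoint`, `pendingOnly`,
and the idea that the last committed checkpointId can be read back from the
table are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Plain-Java sketch; every name here is hypothetical and none of it is Flink or Delta-Kernel API.
final class CheckpointGroupingSketch {

    /** Hypothetical committable: one data file written for one checkpoint. */
    static final class DeltaCommittable {
        final long checkpointId;
        final String dataFilePath;

        DeltaCommittable(long checkpointId, String dataFilePath) {
            this.checkpointId = checkpointId;
            this.dataFilePath = dataFilePath;
        }
    }

    /** Groups file paths by checkpoint id, sorted by ascending checkpoint id. */
    static NavigableMap<Long, List<String>> groupByCheckpoint(List<DeltaCommittable> batch) {
        NavigableMap<Long, List<String>> grouped = new TreeMap<>();
        for (DeltaCommittable c : batch) {
            grouped.computeIfAbsent(c.checkpointId, id -> new ArrayList<>()).add(c.dataFilePath);
        }
        return grouped;
    }

    /** Keeps only checkpoints newer than the last one already committed, so a retry skips them. */
    static NavigableMap<Long, List<String>> pendingOnly(
            NavigableMap<Long, List<String>> grouped, long lastCommittedCheckpointId) {
        return grouped.tailMap(lastCommittedCheckpointId, false);
    }

    public static void main(String[] args) {
        // A batch that mixes two checkpoints, as asked about in question 2.
        List<DeltaCommittable> batch = List.of(
                new DeltaCommittable(7L, "a.parquet"),
                new DeltaCommittable(8L, "b.parquet"),
                new DeltaCommittable(7L, "c.parquet"));

        // Assume checkpoint 7 was already committed before a failure and retry.
        NavigableMap<Long, List<String>> pending = pendingOnly(groupByCheckpoint(batch), 7L);

        // One table commit per remaining checkpoint, oldest first.
        pending.forEach((checkpointId, files) ->
                System.out.println("commit checkpoint " + checkpointId + " with files " + files));
    }
}
```

Grouping like this only handles a batch that spans several checkpointIds. It
cannot detect a checkpointId whose committables arrive incomplete, which is
why the answer to question 1 matters.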

Thanks so much for the help! Very excited to contribute another Apache
Flink connector!

Cheers!

--
*Scott Sandre*
*Sr. Software Engineer*
*Delta Ecosystem Team*
*scott.san...@databricks.com <scott.san...@databricks.com>*
