Re: KIP-1164: Batch Coordinator & Exactly-Once Semantics (EOS) Data Safety

Josep Prat Sun, 01 Mar 2026 23:59:23 -0800

Hi Viquar,

Thanks for your comments and for participating in the KIP process. For your 
comments to be registered properly, please use the DISCUSS thread of the 
relevant KIP. This way, we keep a single centralized archive of discussions 
and votes.
For KIP-1164, you can find the existing DISCUSS thread here [1]. The detailed 
process for Kafka Improvement Proposals is also available for reference [2].



[1] https://lists.apache.org/thread/m9l6lbqv2cffxtz5frypylmqjd7bsqoz
[2] https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals#KafkaImprovementProposals-Process

Best,

Josep Prat,
PMC Member of Apache Kafka.

On 2026/03/01 20:26:24 vaquar khan wrote:
> Hi Everyone,
> 
> Following up on the KIP-1150 vote thread, I'm moving my questions regarding
> Exactly-Once Semantics (EOS) over here since they fall squarely into
> KIP-1164's domain.
> 
> Decoupling storage is a massive win for cross-AZ costs, but shifting to a
> leaderless data plane inherently decentralizes the transaction state
> machine. To ensure we don't accidentally introduce split-brain scenarios or
> break read_committed isolation, we need to explicitly define the
> synchronization barriers.
> 
> Here are three areas where the current design needs tighter specs, along
> with some proposed architectural patterns to solve them:
> 
> 1. LSO Calculation via Materialized Views: In standard Kafka, the partition
> leader is the single source of truth. It tracks in-flight transactions via
> the ProducerStateManager and computes the Last Stable Offset (LSO) in
> memory. With diskless, the Batch Coordinator takes over this role.
> 
> The Gap: If the Batch Coordinator is handling LSO for a huge number of
> multiplexed partitions, it risks becoming a severe bottleneck.
> 
> Proposed Design: I recommend we explicitly frame the _diskless-metadata
> topic as an immutable Event Store. The coordinator's embedded SQLite
> database should act purely as a materialized view (projection) over this
> event stream. This projection would maintain a continuously updated index
> of active PIDs, allowing us to dynamically resolve the LSO in O(1) time
> without requiring the coordinator to scan unbounded transaction logs.
> 
> 2. Cross-Coordinator RPC & The Commit Barrier: When the Transaction
> Coordinator (TC) decides to commit, it needs to verify that all data
> batches for that transaction epoch are actually in place and sequenced.
> 
> The Gap: The KIP currently lacks a defined RPC handshake between the TC and
> the Batch Coordinator. What happens if a CommitBatchCoordinates call is
> still in flight when the TC tries to write the commit marker?
> 
> Proposed Design: We need to explicitly document a strict "Commit Barrier."
> Before writing the commit marker, the Batch Coordinator must
> deterministically verify it has received contiguous sequence numbers for
> the whole epoch. If there are pending asynchronous payloads, the commit
> marker must be blocked at this barrier until they resolve or definitively
> time out.
> 
> 3. The Zombie Broker Problem & Fencing Tokens: This is the edge case that
> worries me the most. Consider this: a broker uploads a batch to S3, but
> then gets hit with a severe GC pause before it can send the metadata commit
> to the Batch Coordinator. Meanwhile, the transaction times out and the TC
> rolls the epoch forward.
> 
> The Gap: When the broker finally wakes up, it sends its delayed metadata
> commit. If the Batch Coordinator accepts it, we've just merged stale data
> into a transaction that's already been marked as aborted or committed: a
> direct EOS violation.
> 
> Proposed Design: Probabilistic timeouts won't fix this; we need
> deterministic correctness. Every metadata commit should include a monotonic
> BrokerEpoch acting as a fencing token. The Batch Coordinator must validate
> this token against the latest known cluster state and immediately reject
> anything from a stale epoch.
> 
> Locking down these public interfaces and state transitions in the text will
> give the community the confidence needed to implement this safely.
> 
> Happy to dig into the code or discuss further if it helps clarify any of
> this. Looking forward to hearing your thoughts on how we handle these
> synchronization barriers.
> 
> 
> Regards,
> Viquar Khan
> *Linkedin *-https://www.linkedin.com/in/vaquar-khan-b695577/
> *Book *-
> https://us.amazon.com/stores/Vaquar-Khan/author/B0DMJCG9W6?ref=ap_rdr&shoppingPortalEnabled=true
> *GitBook*-https://vaquarkhan.github.io/microservices-recipes-a-free-gitbook/
> *Stack *-https://stackoverflow.com/users/4812170/vaquar-khan
> *github*-https://github.com/vaquarkhan
>
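
For readers following the archive, the "Commit Barrier" from point 2 of the
quoted message could be sketched roughly as below. This is only an
illustration in Python and not part of KIP-1164: the function names are
hypothetical, and it assumes the coordinator is told the first and last
sequence number expected for the epoch, which the quoted proposal does not
specify.

```python
def missing_sequences(received, first_seq, last_seq):
    """Return the sequence numbers in [first_seq, last_seq] not yet received."""
    seen = set(received)
    return [s for s in range(first_seq, last_seq + 1) if s not in seen]


def may_write_commit_marker(received, first_seq, last_seq):
    # The barrier opens only when every sequence number in the range is present.
    return not missing_sequences(received, first_seq, last_seq)


# Hypothetical scenario: the batch with sequence 2 is still in flight.
received = [0, 1, 3]
blocked = may_write_commit_marker(received, 0, 3)

received.append(2)  # the delayed batch metadata arrives
released = may_write_commit_marker(received, 0, 3)
```

A real barrier would also need the timeout path mentioned in the quoted
text, which this sketch leaves out.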

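Similarly, the fencing-token check from point 3 of the quoted message might
look something like the following. Again, this is a hedged sketch under
stated assumptions: `FencingValidator`, its methods, and the in-memory epoch
map are hypothetical names chosen for illustration, not interfaces defined
in KIP-1164 or in Kafka.

```python
from dataclasses import dataclass, field


@dataclass
class FencingValidator:
    """Hypothetical stand-in for the coordinator's view of cluster state."""

    # broker_id -> highest epoch seen so far (assumed to be kept in memory)
    latest_epoch: dict = field(default_factory=dict)

    def bump_epoch(self, broker_id: int, epoch: int) -> None:
        # Epochs are monotonic: a lower value never overwrites a higher one.
        current = self.latest_epoch.get(broker_id, -1)
        self.latest_epoch[broker_id] = max(current, epoch)

    def accept_commit(self, broker_id: int, epoch: int) -> bool:
        # Reject commits from unknown brokers and from any epoch older
        # than the latest one recorded for that broker.
        known = self.latest_epoch.get(broker_id)
        return known is not None and epoch >= known


fence = FencingValidator()
fence.bump_epoch(broker_id=1, epoch=5)
on_time = fence.accept_commit(broker_id=1, epoch=5)

# The epoch rolls forward while the broker is stuck in a long pause.
fence.bump_epoch(broker_id=1, epoch=6)
delayed = fence.accept_commit(broker_id=1, epoch=5)
```

The point of the sketch is that the decision depends only on the comparison
of two integers, not on timing, which is what makes the rejection
deterministic.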