Hi, Anatolii,

Thanks for the KIP. The link you posted for KIP-1150 seems incorrect and it points to KIP-1163. Otherwise, +1.

Jun

On Wed, Feb 25, 2026 at 2:59 PM vaquar khan <[email protected]> wrote:

> Fair point, Chris. I agree with that architectural boundary. KIP-1150 successfully sets the high-level mandate, and we can rigorously tackle the exact EOS and RPC mechanics over in the KIP-1164 thread.
>
> Andrew, I am fully aligned with you on the massive operational value of eliminating those cross-AZ replication costs. It is absolutely the right strategic direction for Kafka.
>
> Since my initial concerns on the storage side are resolved, and we are aligned on where the transactional interfaces will be finalized, I am officially withdrawing my objection.
> +1 (non-binding) for KIP-1150.
>
> I will migrate my open questions over to the KIP-1164 discussion thread so we can lock down the data safety details there.
>
> Regards,
> Vaquar Khan
>
> On Wed, 25 Feb 2026 at 15:24, Chris Egerton <[email protected]> wrote:
>
> > Hi Vaquar,
> >
> > > Let me know what you guys think about locking down the text for these interfaces.
> >
> > I think this KIP has the appropriate level of detail and any concerns about EOS can be addressed in the relevant sub-KIP.
> >
> > Chris
> >
> > On Wed, Feb 25, 2026 at 4:20 PM vaquar khan <[email protected]> wrote:
> >
> > > Hi everyone,
> > >
> > > First off, thanks to the authors for the Feb 12th updates to KIP-1163. Adding the periodic reconciliation loop clears up my concerns about the orphaned "Upload-then-Commit" segments, so I'm officially withdrawing my objection on the storage leak issue.
> > >
> > > Chris and Greg, since you both mentioned digging into the 1164 details, I wanted to pick your brains on how Exactly-Once Semantics (EOS) is going to safely operate here. In standard Kafka, the Partition Leader is our single serialization point.
> > > It receives the data, tracks ongoing transactions via the ProducerStateManager, and calculates the Last Stable Offset (LSO) locally. Since KIP-1150 removes the leader, the Batch Coordinator takes over. But as I read through the current text, a few critical synchronization barriers seem to be missing to me:
> > >
> > > 1. LSO Calculation: How exactly will the Batch Coordinator maintain and calculate the LSO? Justine Olshan brought this up earlier too. Will the coordinator run its own ProducerStateManager to track ongoing transactions, or is there a totally different state machine planned?
> > >
> > > 2. RPC Protocol: What's the exact synchronization protocol between the legacy Transaction Coordinator and the new Batch Coordinator? When the Txn Coordinator sends a commit marker, how does the Batch Coordinator actually verify it has received all the prerequisite data batches for that specific transaction epoch?
> > >
> > > 3. Delayed Data Race Condition: Let's say a broker hits a GC pause right *after* uploading a batch to object storage, but *before* committing the coordinates. If the transaction commit marker arrives at the Coordinator first, what happens? Does the Coordinator wait? If not, couldn't the transaction commit with missing data, completely violating read_committed isolation?
> > >
> > > The KIP vaguely mentions *transactional checks* but leaves the actual commit protocol and public interfaces undefined right now. I'm not saying the design itself is broken, but I really think myself and others need to see these RPC flows explicitly documented before we implement and adopt this. Otherwise, we risk baking in some severe data isolation headaches down the line.
> > >
> > > Let me know what you guys think about locking down the text for these interfaces.
> > >
> > > Regards,
> > > Vaquar Khan
> > >
> > > On Wed, 25 Feb 2026 at 10:33, Greg Harris via dev <[email protected]> wrote:
> > >
> > > > Hey all,
> > > >
> > > > I'm excited to discuss more details in 1163 and 1164 with everyone.
> > > >
> > > > +1 (binding)
> > > >
> > > > Thanks!
> > > > Greg
> > > >
> > > > On Wed, Feb 25, 2026 at 1:08 AM Anatolii Popov via dev <[email protected]> wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > Given the importance of this KIP, we want to keep the vote open for a few more days to give time to people who had comments in the DISCUSS thread to cast their vote if they want.
> > > > >
> > > > > On Wed, Feb 25, 2026 at 10:47 AM Josep Prat via dev <[email protected]> wrote:
> > > > >
> > > > > > Hi all,
> > > > > > As a co-author of the KIP, I want to explicitly cast my vote for this KIP.
> > > > > >
> > > > > > +1 (binding)
> > > > > >
> > > > > >
> > > > > > On Wed, Feb 25, 2026 at 9:02 AM Luke Chen <[email protected]> wrote:
> > > > > >
> > > > > > > I've re-read KIP-1150, and still agree this is what we need for Apache Kafka.
> > > > > > >
> > > > > > > +1 (binding) from me.
> > > > > > >
> > > > > > > Thank you,
> > > > > > > Luke
> > > > > > >
> > > > > > > On Wed, Feb 25, 2026 at 12:10 PM Chris Egerton <[email protected]> wrote:
> > > > > > >
> > > > > > >> Hi all,
> > > > > > >>
> > > > > > >> Thanks for the KIP. I've reviewed 1150, 1163, and 1164, as well as the relevant discussion threads. I may have granular comments about 1163 and 1164 but the overall approach suggested in 1150 looks good to me.
> > > > > > >> I especially like that the approach covers two main pain points of operating and paying for Kafka today: it allows cross-AZ traffic to be reduced (even eliminated in some cases), and it also allows local disk usage by brokers to be reduced (if operators opt for a small local cache on follower brokers for non-tiered segments).
> > > > > > >>
> > > > > > >> +1 (binding)
> > > > > > >>
> > > > > > >> Cheers,
> > > > > > >>
> > > > > > >> Chris
> > > > > > >>
> > > > > > >> On Mon, Jan 26, 2026 at 3:36 PM vaquar khan <[email protected]> wrote:
> > > > > > >>
> > > > > > >> > Hi Josep,
> > > > > > >> >
> > > > > > >> > Thank you for the detailed response. I appreciate the clarification regarding the distinction between the Inkless POC and the KIP design.
> > > > > > >> >
> > > > > > >> > However, my objection is not based on temporary bugs in the fork, but *on architectural gaps in the KIPs themselves* that these implementation issues highlighted. If we are voting to approve the design, the design documents must be structurally complete regarding data safety.
> > > > > > >> >
> > > > > > >> > *1. Regarding Storage Leaks (The Missing Design)* You mentioned that cleanup logic "can be defined later." However, KIP-1163 explicitly delegates this responsibility to a separate process, and KIP-1165 (Object Compaction/GC) is currently marked as "Discarded" in the wiki.
> > > > > > >> >
> > > > > > >> > We cannot vote to approve a storage engine that has no specified mechanism for garbage collection.
> > > > > > >> > The "Upload-then-Commit" pattern described in KIP-1163 structurally creates orphaned segments during broker failures. Without an active KIP defining the reconciliation protocol (since KIP-1165 was withdrawn), the proposal effectively describes a system with unbounded storage growth during failure modes. This is a blocking design gap, not an implementation detail.
> > > > > > >> >
> > > > > > >> > *2. Regarding EOS (The Coordinator Synchronization Gap)* This is not a misunderstanding of standard Kafka transactions; it is a critique of how KIP-1150 changes them. Standard EOS relies on the Partition Leader to sequence markers and calculate the LSO (Last Stable Offset) in memory. KIP-1150 removes the Leader.
> > > > > > >> >
> > > > > > >> > KIP-1164 (Batch Coordinator) must explicitly define the RPC flow between the Transaction Coordinator and the Batch Coordinator to replace the leader's role. Currently, the KIP does not specify how the system prevents a "Split Brain" scenario where a consumer reads ahead of a transaction marker that hasn't yet been sequenced by the Batch Coordinator. This is a protocol-level correctness issue that must be resolved in the text before adoption.
> > > > > > >> >
> > > > > > >> > Please note - I am maintaining my objection based on missing specifications, not code bugs.
> > > > > > >> >
> > > > > > >> > I respectfully request that we pause the vote until:
> > > > > > >> >
> > > > > > >> >    A valid design for Garbage Collection (replacing the discarded KIP-1165) is added to the proposal.
> > > > > > >> >
> > > > > > >> >    The Transaction/LSO synchronization protocol is explicitly documented in KIP-1164.
> > > > > > >> >
> > > > > > >> > Regards,
> > > > > > >> >
> > > > > > >> > Vaquar Khan
> > > > > > >> > Sr Data Architect
> > > > > > >> > https://www.linkedin.com/in/vaquar-khan-b695577/
> > > > > >
> > > > > > --
> > > > > > [image: Aiven] <https://www.aiven.io>
> > > > > >
> > > > > > *Josep Prat*
> > > > > > Sr. Engineering Director, Streaming Services, *Aiven*
> > > > > > [email protected] | +491715557497
> > > > > > aiven.io <https://www.aiven.io> | <https://www.facebook.com/aivencloud> <https://www.linkedin.com/company/aiven/> <https://twitter.com/aiven_io>
> > > > > > *Aiven Deutschland GmbH*
> > > > > > Alexanderufer 3-7, 10117 Berlin
> > > > > >
> > > > > > Geschäftsführer: Oskari Saarenmaa, Kenneth Chen
> > > > > > Amtsgericht Charlottenburg, HRB 209739 B
> > > > >
> > > > > --
> > > > > Anatolii Popov
> > > > > Senior Software Developer, *Aiven OY*
> > > > > m: +358505126242
> > > > > w: aiven.io e: [email protected]
> > > > > <https://www.facebook.com/aivencloud> <https://www.linkedin.com/company/aiven/> <https://twitter.com/aiven_io>
