+1 (binding)
> On 28 Feb 2026, at 11:00 AM, Satish Duggana <[email protected]> wrote:
>
> Thanks for the KIP.
> I've reviewed the updated KIP and agree with the motivation behind
> KIP-1150, overall LGTM.
> It seems KIP-1163 and KIP-1164 require more details, which we can discuss
> in those respective threads.
>
> +1 (binding) for KIP-1150.
>
> ~Satish.
>
>> On Fri, 27 Feb 2026 at 23:28, Jun Rao via dev <[email protected]> wrote:
>>
>> Hi, Anatolii,
>>
>> Thanks for the KIP. The link you posted for KIP-1150 seems incorrect and it
>> points to KIP-1163. Otherwise, +1.
>>
>> Jun
>>
>>> On Wed, Feb 25, 2026 at 2:59 PM vaquar khan <[email protected]> wrote:
>>>
>>> Fair point, Chris. I agree with that architectural boundary. KIP-1150
>>> successfully sets the high-level mandate, and we can rigorously tackle
>>> the exact EOS and RPC mechanics over in the KIP-1164 thread.
>>>
>>> Andrew, I am fully aligned with you on the massive operational value of
>>> eliminating those cross-AZ replication costs. It is absolutely the right
>>> strategic direction for Kafka.
>>>
>>> Since my initial concerns on the storage side are resolved, and we are
>>> aligned on where the transactional interfaces will be finalized, I am
>>> officially withdrawing my objection.
>>> +1 (non-binding) for KIP-1150.
>>>
>>> I will migrate my open questions over to the KIP-1164 discussion thread
>>> so we can lock down the data safety details there.
>>>
>>> Regards,
>>> Vaquar Khan
>>>
>>> On Wed, 25 Feb 2026 at 15:24, Chris Egerton <[email protected]> wrote:
>>>
>>>> Hi Vaquar,
>>>>
>>>>> Let me know what you guys think about locking down the text for these
>>>>> interfaces.
>>>>
>>>> I think this KIP has the appropriate level of detail and any concerns
>>>> about EOS can be addressed in the relevant sub-KIP.
>>>>
>>>> Chris
>>>>
>>>> On Wed, Feb 25, 2026 at 4:20 PM vaquar khan <[email protected]> wrote:
>>>>
>>>>> Hi everyone,
>>>>>
>>>>> First off, thanks to the authors for the Feb 12th updates to KIP-1163.
>>>>> Adding the periodic reconciliation loop clears up my concerns about the
>>>>> orphaned "Upload-then-Commit" segments, so I'm officially withdrawing my
>>>>> objection on the storage leak issue.
>>>>>
>>>>> Chris and Greg, since you both mentioned digging into the 1164 details, I
>>>>> wanted to pick your brains on how Exactly-Once Semantics (EOS) is going to
>>>>> safely operate here. In standard Kafka, the Partition Leader is our single
>>>>> serialization point. It receives the data, tracks ongoing transactions via
>>>>> the ProducerStateManager, and calculates the Last Stable Offset (LSO)
>>>>> locally. Since KIP-1150 removes the leader, the Batch Coordinator takes
>>>>> over. But as I read through the current text, a few critical
>>>>> synchronization barriers seem to be missing to me:
>>>>>
>>>>> 1. LSO Calculation: How exactly will the Batch Coordinator maintain and
>>>>> calculate the LSO? Justine Olshan brought this up earlier too. Will the
>>>>> coordinator run its own ProducerStateManager to track ongoing
>>>>> transactions, or is there a totally different state machine planned?
>>>>>
>>>>> 2. RPC Protocol: What's the exact synchronization protocol between the
>>>>> legacy Transaction Coordinator and the new Batch Coordinator? When the Txn
>>>>> Coordinator sends a commit marker, how does the Batch Coordinator actually
>>>>> verify it has received all the prerequisite data batches for that specific
>>>>> transaction epoch?
>>>>>
>>>>> 3. Delayed Data Race Condition: Let's say a broker hits a GC pause right
>>>>> *after* uploading a batch to object storage, but *before* committing the
>>>>> coordinates.
>>>>> If the transaction commit marker arrives at the Coordinator first, what
>>>>> happens? Does the Coordinator wait? If not, couldn't the transaction
>>>>> commit with missing data, completely violating read_committed isolation?
>>>>>
>>>>> The KIP vaguely mentions *transactional checks* but leaves the actual
>>>>> commit protocol and public interfaces undefined right now. I'm not saying
>>>>> the design itself is broken, but I really think myself and others need to
>>>>> see these RPC flows explicitly documented before we implement and adopt
>>>>> this. Otherwise, we risk baking in some severe data isolation headaches
>>>>> down the line.
>>>>>
>>>>> Let me know what you guys think about locking down the text for these
>>>>> interfaces.
>>>>>
>>>>> Regards,
>>>>> Vaquar Khan
>>>>>
>>>>> On Wed, 25 Feb 2026 at 10:33, Greg Harris via dev <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hey all,
>>>>>>
>>>>>> I'm excited to discuss more details in 1163 and 1164 with everyone.
>>>>>>
>>>>>> +1 (binding)
>>>>>>
>>>>>> Thanks!
>>>>>> Greg
>>>>>>
>>>>>> On Wed, Feb 25, 2026 at 1:08 AM Anatolii Popov via dev
>>>>>> <[email protected]> wrote:
>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> Given the importance of this KIP, we want to keep the vote open for a
>>>>>>> few more days to give time to people who had comments in the DISCUSS
>>>>>>> thread to cast their vote if they want.
>>>>>>>
>>>>>>> On Wed, Feb 25, 2026 at 10:47 AM Josep Prat via dev
>>>>>>> <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi all,
>>>>>>>> As a co-author of the KIP, I want to explicitly cast my vote for this
>>>>>>>> KIP.
>>>>>>>>
>>>>>>>> +1 (binding)
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Feb 25, 2026 at 9:02 AM Luke Chen <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> I've re-read KIP-1150, and still agree this is what we need for
>>>>>>>>> Apache Kafka.
>>>>>>>>>
>>>>>>>>> +1 (binding) from me.
>>>>>>>>>
>>>>>>>>> Thank you,
>>>>>>>>> Luke
>>>>>>>>>
>>>>>>>>> On Wed, Feb 25, 2026 at 12:10 PM Chris Egerton
>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Hi all,
>>>>>>>>>>
>>>>>>>>>> Thanks for the KIP. I've reviewed 1150, 1163, and 1164, as well as
>>>>>>>>>> the relevant discussion threads. I may have granular comments about
>>>>>>>>>> 1163 and 1164 but the overall approach suggested in 1150 looks good
>>>>>>>>>> to me. I especially like that the approach covers two main pain
>>>>>>>>>> points of operating and paying for Kafka today: it allows cross-AZ
>>>>>>>>>> traffic to be reduced (even eliminated in some cases), and it also
>>>>>>>>>> allows local disk usage by brokers to be reduced (if operators opt
>>>>>>>>>> for a small local cache on follower brokers for non-tiered
>>>>>>>>>> segments).
>>>>>>>>>>
>>>>>>>>>> +1 (binding)
>>>>>>>>>>
>>>>>>>>>> Cheers,
>>>>>>>>>>
>>>>>>>>>> Chris
>>>>>>>>>>
>>>>>>>>>> On Mon, Jan 26, 2026 at 3:36 PM vaquar khan <[email protected]>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Josep,
>>>>>>>>>>>
>>>>>>>>>>> Thank you for the detailed response. I appreciate the
>>>>>>>>>>> clarification regarding the distinction between the Inkless POC
>>>>>>>>>>> and the KIP design.
>>>>>>>>>>>
>>>>>>>>>>> However, my objection is not based on temporary bugs in the fork,
>>>>>>>>>>> but *on architectural gaps in the KIPs themselves* that these
>>>>>>>>>>> implementation issues highlighted. If we are voting to approve the
>>>>>>>>>>> design, the design documents must be structurally complete
>>>>>>>>>>> regarding data safety.
>>>>>>>>>>>
>>>>>>>>>>> *1. Regarding Storage Leaks (The Missing Design)* You mentioned
>>>>>>>>>>> that cleanup logic "can be defined later."
>>>>>>>>>>> However, KIP-1163 explicitly delegates this responsibility to a
>>>>>>>>>>> separate process, and KIP-1165 (Object Compaction/GC) is currently
>>>>>>>>>>> marked as "Discarded" in the wiki.
>>>>>>>>>>>
>>>>>>>>>>> We cannot vote to approve a storage engine that has no specified
>>>>>>>>>>> mechanism for garbage collection. The "Upload-then-Commit" pattern
>>>>>>>>>>> described in KIP-1163 structurally creates orphaned segments
>>>>>>>>>>> during broker failures. Without an active KIP defining the
>>>>>>>>>>> reconciliation protocol (since KIP-1165 was withdrawn), the
>>>>>>>>>>> proposal effectively describes a system with unbounded storage
>>>>>>>>>>> growth during failure modes. This is a blocking design gap, not an
>>>>>>>>>>> implementation detail.
>>>>>>>>>>>
>>>>>>>>>>> *2. Regarding EOS (The Coordinator Synchronization Gap)* This is
>>>>>>>>>>> not a misunderstanding of standard Kafka transactions; it is a
>>>>>>>>>>> critique of how KIP-1150 changes them. Standard EOS relies on the
>>>>>>>>>>> Partition Leader to sequence markers and calculate the LSO (Last
>>>>>>>>>>> Stable Offset) in memory. KIP-1150 removes the Leader.
>>>>>>>>>>>
>>>>>>>>>>> KIP-1164 (Batch Coordinator) must explicitly define the RPC flow
>>>>>>>>>>> between the Transaction Coordinator and the Batch Coordinator to
>>>>>>>>>>> replace the leader's role. Currently, the KIP does not specify how
>>>>>>>>>>> the system prevents a "Split Brain" scenario where a consumer
>>>>>>>>>>> reads ahead of a transaction marker that hasn't yet been sequenced
>>>>>>>>>>> by the Batch Coordinator. This is a protocol-level correctness
>>>>>>>>>>> issue that must be resolved in the text before adoption.
>>>>>>>>>>>
>>>>>>>>>>> Please note - I am maintaining my objection based on missing
>>>>>>>>>>> specifications, not code bugs.
>>>>>>>>>>>
>>>>>>>>>>> I respectfully request that we pause the vote until:
>>>>>>>>>>>
>>>>>>>>>>>    - A valid design for Garbage Collection (replacing the
>>>>>>>>>>>      discarded KIP-1165) is added to the proposal.
>>>>>>>>>>>
>>>>>>>>>>>    - The Transaction/LSO synchronization protocol is explicitly
>>>>>>>>>>>      documented in KIP-1164.
>>>>>>>>>>>
>>>>>>>>>>> Regards,
>>>>>>>>>>>
>>>>>>>>>>> Vaquar Khan
>>>>>>>>>>> Sr Data Architect
>>>>>>>>>>> https://www.linkedin.com/in/vaquar-khan-b695577/
>>>>>>>>
>>>>>>>> --
>>>>>>>> [image: Aiven] <https://www.aiven.io>
>>>>>>>>
>>>>>>>> *Josep Prat*
>>>>>>>> Sr. Engineering Director, Streaming Services, *Aiven*
>>>>>>>> [email protected] | +491715557497
>>>>>>>> aiven.io <https://www.aiven.io> | <https://www.facebook.com/aivencloud>
>>>>>>>> <https://www.linkedin.com/company/aiven/> <https://twitter.com/aiven_io>
>>>>>>>> *Aiven Deutschland GmbH*
>>>>>>>> Alexanderufer 3-7, 10117 Berlin
>>>>>>>>
>>>>>>>> Geschäftsführer: Oskari Saarenmaa, Kenneth Chen
>>>>>>>> Amtsgericht Charlottenburg, HRB 209739 B
>>>>>>>
>>>>>>> --
>>>>>>> Anatolii Popov
>>>>>>> Senior Software Developer, *Aiven OY*
>>>>>>> m: +358505126242
>>>>>>> w: aiven.io e: [email protected]
>>>>>>> <https://www.facebook.com/aivencloud>
>>>>>>> <https://www.linkedin.com/company/aiven/> <https://twitter.com/aiven_io>
