Hi, Luke,

Thanks for starting the discussion. I took a look at all three proposals, and the following is my assessment.
KIP-1150 (diskless)
Pros:
* has the most benefits to users
  -- the most complete saving of cross-zone network cost (enabled by the leaderless design)
  -- better durability (by leveraging block storage)
  -- the best scalability (by separating data from metadata)
* clean architecture (no unnatural, intrusive changes to the existing code base)
Cons:
* large effort, but arguably this is what's needed to build a truly cloud-native architecture

KIP-1176 (tiered active segment)
Pros:
* limited benefits to users
  -- saving of cross-zone network cost (limited saving on the producer side)
* small effort
Cons:
* the current availability story is weak
* it's not clear whether the effort is still small once the details on correctness, cost, and cleanness are figured out

KIP-1183 (shared storage)
Pros:
* moderate benefits to users
  -- saving of cross-zone network cost (limited saving on the producer and consumer sides)
  -- better durability (by leveraging block storage)
  -- improved scalability
Cons:
* weaker availability (no hot standby)
* scalability not as good as KIP-1150
* the effort to build the plugin is too large

Thanks,
Jun

On Wed, Aug 6, 2025 at 12:58 AM Luke Chen <show...@gmail.com> wrote:

> Hi Josep,
>
> Thanks for the update.
>
> > Luke, thank you for being proactive and caring about this topic!
> I believe many community users also care about this topic! :)
>
> Looking forward to seeing the updated KIP!
>
>
> Hi Stanislav,
>
> Yes, it'd be good for the community to decide which way we want to go; leaderless vs. leader-based is definitely one of those decisions.
> And yes, having more than one KIP is also fine with me. It's just that we need a way to move them forward.
> Otherwise, if one of the KIPs becomes ready for a vote, we can anticipate requests to wait for the other two related KIPs.
> Any good suggestions?
>
> Hi Xinyu,
>
> Thanks for the reply.
> Looking forward to seeing the updated KIP!
>
> > If the community plans to adopt a leaderless architecture, will the focus
> > be on a complete transition to leaderless, or will both architectures
> > coexist in the long term?
>
> I don't think we will abandon the leader-based design, as a lot of users are still relying on it.
> Besides, KIP-1150 also states that the existing leader-based protocol works as usual.
> So, I think they should coexist in the long term.
>
>
> Thank you.
> Luke
>
>
> On Wed, Aug 6, 2025 at 10:13 AM Xinyu Zhou <yu...@apache.org> wrote:
>
> > Hi Luke,
> >
> > Thank you for creating this dedicated thread; we definitely need a space to discuss future steps for these topics. I apologize for my delay on KIP-1183 and will provide more details in the coming weeks.
> >
> > I agree with Stanislav that we should first focus on the community's direction. Specifically, should we consider introducing a leaderless architecture to Kafka, given that it currently relies on a partitioned, leader-based model?
> >
> > From my own perspective, I’m particularly interested in how leaderless and leader-based architectures differ in handling data locality (which directly affects batching and fetch efficiency) and in the way core features are implemented. For instance, ordering, compaction, transactions, idempotent producers, and queues all have to be realized on the coordinator in a leaderless design, whereas in a leader-based design they are handled by the partition leader.
> >
> > If the community plans to adopt a leaderless architecture, will the focus be on a complete transition to leaderless, or will both architectures coexist in the long term?
> >
> > I welcome discussions on this topic and am eager to hear diverse opinions.
> >
> > Regards,
> > Xinyu
> >
> > On Wed, Aug 6, 2025 at 3:05 AM Stanislav Kozlovski <stanislavkozlov...@apache.org> wrote:
> >
> > > Thank you, Luke, for this wonderful summary and for taking the initiative.
> > >
> > > To me, it seems like a large differentiator between KIP-1150 and the others is the leaderless design. The other two don’t allow for it.
> > >
> > > It sounds productive to focus the discussion on whether the leaderless design is worth it on top of the replication cost savings.
> > >
> > > I’m of the opinion that it’s worth pursuing, both for the truly zero cross-AZ network cost (no producer cross-AZ traffic) and, perhaps even more importantly, for the zero-state architecture that promises to significantly simplify operations, including auto-scaling brokers and scaling throughput per partition.
> > >
> > > It would be great if the folks at Aiven could address the concerns regarding queue and transaction support. I’m not of the opinion that these things need to ship with v1, but it would be wise to ensure nothing in the architecture blocks these features from being shipped in the future.
> > >
> > > KIP-1176 is also very cool; addressing the acks=1 case will still be necessary. I think it’s a necessary feature to implement, but I’d be disappointed if that’s the only diskless solution the community agrees on.
> > >
> > > A good path, if possible, may be to merge KIP-1150 and KIP-1176.
> > >
> > > If instead the community decides leaderless isn’t necessary, then KIP-1183 seems like the right fit.
> > >
> > > That’s my opinion. Happy to hear if anyone disagrees.
> > >
> > > On 2025/08/05 14:30:45 Josep Prat wrote:
> > > > Hi Luke and community!
> > > >
> > > > Luke, thank you for being proactive and caring about this topic!
> > > >
> > > > In the meantime, we have been keeping ourselves busy pushing our implementation of KIP-1150 to production to validate our assumptions, confirming its strengths while discovering its weaknesses.
> > > > Now, after gathering some experience running it, we are (as I'm writing this, gathered in the same room) working on an improved proposal for KIP-1150 that also addresses the concerns from the community.
> > > > We expect to share the updated KIP in the next couple of weeks.
> > > >
> > > > We apologize for the recent period of silence and are committed to more regular communication as we move forward.
> > > >
> > > > Best,
> > > >
> > > >
> > > > On Tue, Aug 5, 2025 at 10:31 AM Luke Chen <show...@gmail.com> wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > The Kafka community is currently seeing an unprecedented situation with three KIPs (KIP-1150, KIP-1176, KIP-1183) simultaneously addressing the same challenge of high replication costs when running Kafka across multiple cloud availability zones. Each KIP offers a different solution to this issue. While diversity of innovative ideas is a key strength of open-source projects, it creates a burden for reviewers and users who must compare and comment on multiple proposals simultaneously. Furthermore, discussion around the three KIPs has stalled for over two months now. This could be due to the authors being hesitant to proceed given the existence of alternative, potentially conflicting, solutions. Addressing replication cost is a key concern of Kafka’s userbase, and we should try to move the conversation forward if we can.
> > > > >
> > > > > From what I understand, these three KIPs are not mutually exclusive. But adopting all three KIPs in the community might not be what we expect. Thus, I would like to *start a discussion on how we could move the conversation forward*.
> > > > >
> > > > > To save time for KIP readers and reviewers, I have created this document [1] to help summarize each of the KIPs and describe their current status. *I hope to get some suggestions and feedback from the community*.
> > > > >
> > > > >
> > > > > [1] https://cwiki.apache.org/confluence/display/KAFKA/The+Path+Forward+for+Saving+Cross-AZ+Replication+Costs+KIPs
> > > > >
> > > > > KIP-1150: https://cwiki.apache.org/confluence/display/KAFKA/KIP-1150%3A+Diskless+Topics
> > > > > KIP-1176: https://cwiki.apache.org/confluence/display/KAFKA/KIP-1176%3A+Tiered+Storage+for+Active+Log+Segment
> > > > > KIP-1183: https://cwiki.apache.org/confluence/display/KAFKA/KIP-1183%3A+Unified+Shared+Storage
> > > > >
> > > > >
> > > > > Thank you.
> > > > > Luke
> > > > >
> > > >
> > > >
> > > > --
> > > > *Josep Prat*
> > > > Sr. Engineering Director, Streaming Services, *Aiven*