Hi, Luke,

Thanks for starting the discussion. I took a look at all three proposals
and the following is my assessment.

KIP-1150 (diskless):
Pros:
* has the most benefits to the users.
-- most complete saving of cross zone network cost (enabled by leaderless
design)
-- better durability (by leveraging block storage)
-- best scalability (by separating data from the metadata)
* clean architecture (no unnatural intrusive changes to existing code base)
Cons: large effort, but arguably this is what's needed to build a true
cloud native architecture

KIP-1176 (tier active segment)
Pros:
* limited benefits to the users
-- saving of cross zone network cost (limited saving on the producer side)
* small effort
Cons:
* the current availability story is weak
* it's not clear if the effort is still small once details on correctness,
cost, cleanness are figured out

KIP-1183 (shared storage)
Pros:
* moderate benefits to the users
-- saving of cross zone network cost (limited saving on the producer side
and the consumer side)
-- better durability (by leveraging block storage)
-- improved scalability
Cons:
* weaker availability (no hot standby)
* scalability not as good as KIP-1150
* effort to build the plugin is too large

Thanks,

Jun

On Wed, Aug 6, 2025 at 12:58 AM Luke Chen <show...@gmail.com> wrote:

> Hi Josep,
>
> Thanks for the update.
>
> > Luke, thank you for being proactive and caring about this topic!
> I believe many community users also care about this topic! :)
>
> Look forward to seeing the updated KIP!
>
>
> Hi Stanislav,
>
> Yes, it'd be good for the community to decide which way we want to go.
> Leaderless versus leader-based is definitely one of those decisions.
> And yes, having more than one KIP is also fine with me. It's just that we
> need a way to move them forward.
> Otherwise, suppose one of the KIPs is ready for voting, we can anticipate
> requests to wait for the other two related KIPs.
> Any good suggestions?
>
> Hi Xinyu,
>
> Thanks for the reply.
> Look forward to seeing the updated KIP!
>
> > If the community plans to adopt a leaderless architecture, will the focus
> > be on a complete transition to leaderless, or will both architectures
> > coexist in the long term?
>
> I don't think we will abandon the leader-based design as a lot of users are
> still relying on it.
> Besides, KIP-1150 also claims the existing leader-based protocol works as
> usual.
> So, I think they should coexist in the long term.
>
>
> Thank you.
> Luke
>
>
> On Wed, Aug 6, 2025 at 10:13 AM Xinyu Zhou <yu...@apache.org> wrote:
>
> > Hi Luke,
> >
> > Thank you for creating this dedicated thread; we definitely need a space to
> > discuss future steps for these topics. I apologize for my delay on KIP-1183
> > and will provide more details in the coming weeks.
> >
> > I agree with Stanislav that we should first focus on the community's
> > direction. Specifically, should we consider introducing a leaderless
> > architecture to Kafka, given that it currently relies on a partitioned,
> > leader-based model?
> >
> > From my own perspective, I’m particularly interested in how Leaderless and
> > Leader-based architectures differ when it comes to handling data locality,
> > which directly affects batching and fetch efficiency, and in the way core
> > features are implemented. For instance, ordering, compaction, transactions,
> > idempotent producers, and queues all have to be realized on the Coordinator
> > in a Leaderless design, whereas in a Leader-based design they are handled by
> > the Leader Partition.
> >
> > If the community plans to adopt a leaderless architecture, will the focus
> > be on a complete transition to leaderless, or will both architectures
> > coexist in the long term?
> >
> > I welcome discussions on this topic and am eager to hear diverse opinions.
> >
> > Regards,
> > Xinyu
> >
> > On Wed, Aug 6, 2025 at 3:05 AM Stanislav Kozlovski <
> > stanislavkozlov...@apache.org> wrote:
> >
> > > Thank you Luke for this wonderful summary and taking initiative.
> > >
> > > To me, it seems like a large differentiator between KIP-1150 and the
> > > others is the leaderless design. The other two don’t allow for it.
> > >
> > > It sounds productive to focus the discussion on whether the leaderless
> > > design is worth it on top of the replication cost savings.
> > >
> > > I’m of the opinion that it’s worth pursuing, both for the truly zero
> > > network cost (no producer cross-AZ traffic) and, perhaps even more
> > > importantly, for the zero-state architecture that promises to significantly
> > > simplify operations, including auto-scaling brokers and scaling throughput
> > > per partition.
> > >
> > > It would be great if the folks at Aiven could address the concerns
> > > regarding queue and transactions support. I’m not of the opinion that these
> > > things need to ship with v1, but it would be wise to ensure nothing in the
> > > architecture blocks these features from being shipped in the future.
> > >
> > > KIP-1176 is also very cool; addressing the acks=1 case will still be
> > > necessary. I think it’s a necessary feature to implement, but I’d be
> > > disappointed if that’s the only diskless solution the community agrees on.
> > >
> > > A good path, if possible, may be to merge KIP-1150 and KIP-1176.
> > >
> > > If instead the community decides leaderless isn’t necessary, then KIP-1183
> > > seems like a good fit.
> > >
> > > That’s my opinion. Happy to hear if anyone disagrees.
> > >
> > > On 2025/08/05 14:30:45 Josep Prat wrote:
> > > > Hi Luke and community!
> > > >
> > > > Luke, thank you for being proactive and caring about this topic!
> > > >
> > > > In the meantime we have been keeping ourselves busy pushing our
> > > > implementation of KIP-1150 to production to validate our assumptions and
> > > > confirm its strengths while discovering its weaknesses.
> > > > Now, after gathering some experience running it, we are (as I'm writing
> > > > this, gathered in the same room) working on an improved proposal for
> > > > KIP-1150 that also addresses the concerns from the community.
> > > > We expect to share the updated KIP in the next couple of weeks.
> > > >
> > > > We apologize for the recent period of silence and are committed to more
> > > > regular communication as we move forward.
> > > >
> > > > Best,
> > > >
> > > >
> > > > On Tue, Aug 5, 2025 at 10:31 AM Luke Chen <show...@gmail.com> wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > The Kafka community is currently seeing an unprecedented situation with
> > > > > three KIPs (KIP-1150, KIP-1176, KIP-1183) simultaneously addressing the
> > > > > same challenge of high replication costs when running Kafka across
> > > > > multiple cloud availability zones. Each KIP offers a different solution
> > > > > to this issue. While diversity of innovative ideas is a key strength of
> > > > > open-source projects, it creates a burden for reviewers and users who
> > > > > must compare and comment on multiple proposals simultaneously.
> > > > > Furthermore, discussion around the three KIPs has stalled for over two
> > > > > months now. This could be because the authors are hesitant to proceed
> > > > > given the existence of alternative, potentially conflicting, solutions.
> > > > > Addressing replication cost is a key concern of Kafka’s userbase and we
> > > > > should try to move the conversation forward if we can.
> > > > >
> > > > > From what I understand, these three KIPs are not mutually exclusive.
> > > > > But adopting all three KIPs in the community might not be what we
> > > > > expect. Thus, I would like to *start a discussion on how we could move
> > > > > the conversation forward*.
> > > > >
> > > > > To save time for the KIP readers/reviewers, I have created this
> > > > > document [1] to help summarize each of the KIPs and describe their
> > > > > current status. *Hope to get some suggestions/feedback from the
> > > > > community*.
> > > > >
> > > > > [1]
> > > > > https://cwiki.apache.org/confluence/display/KAFKA/The+Path+Forward+for+Saving+Cross-AZ+Replication+Costs+KIPs
> > > > >
> > > > > KIP-1150:
> > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1150%3A+Diskless+Topics
> > > > > KIP-1176:
> > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1176%3A+Tiered+Storage+for+Active+Log+Segment
> > > > > KIP-1183:
> > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1183%3A+Unified+Shared+Storage
> > > > >
> > > > >
> > > > > Thank you.
> > > > > Luke
> > > > >
> > > >
> > > >
> > > > --
> > > > *Josep Prat*
> > > > Sr. Engineering Director, Streaming Services, *Aiven*
> > > > josep.p...@aiven.io   |   +491715557497
> > > > aiven.io <https://www.aiven.io>
> > > > *Aiven Deutschland GmbH*
> > > > Alexanderufer 3-7, 10117 Berlin
> > > >
> > > > Geschäftsführer: Oskari Saarenmaa, Hannu Valtonen,
> > > >
> > > >  Kenneth Chen
> > > > Amtsgericht Charlottenburg, HRB 209739 B
> > > >
> > >
> >
>
