Hi Luke and all!

I'll be participating in this discussion from the authors' side together with 
Josep and some other colleagues.

> 2. "Write through to object storage, avoiding local disk usage"
> While this title and the goal said no local disk usage, I'd like to make
> sure is it really zero local disk needed?

You're right, this needs clarification. First, when we speak about disk, we 
mean broker disk. Data will be stored in the object store, and most likely 
there is some form of disk underneath, but this storage has different economics 
and performance characteristics (taking advantage of these is the main focus of 
the KIP).

1. For reading, writing, and storing the data itself, broker disk is not used. 
There are also no index files and the like.
2. Where metadata is stored depends on the batch coordinator implementation, 
which is supposed to be pluggable. However, the reference implementation we 
propose in KIP-1164 uses normal Kafka topics, so some broker disk will be used 
for metadata.
3. There's also caching for the read path, which may optionally use disk 
instead of memory.

So, strictly speaking, it's not zero disk. But even though some disk is used, 
we still call the whole approach diskless because the amount stored on broker 
disks is a tiny fraction of the total amount of user data it supports.
Does this make sense to you?

Best,
Ivan



On Thu, Apr 17, 2025, at 14:11, Luke Chen wrote:
> Hi Josep,
> 
> Thanks for the KIP!
> Quite exciting to see this feature brought into Apache Kafka!!!!
> 
> Comments:
> 1. "Permit multi-region active-active topics with automatic failover"
> I didn't see any future work mentioning this. Does it mean, with diskless
> topic MVP, this will work by default?
> 
> 2. "Write through to object storage, avoiding local disk usage"
> While this title and the goal said no local disk usage, I'd like to make
> sure is it really zero local disk needed?
> We might need to clarify it in the KIP.
> 
> Thank you.
> Luke
> 
> On Wed, Apr 16, 2025 at 7:58 PM Josep Prat <josep.p...@aiven.io.invalid>
> wrote:
> 
> > Hi Kafka Devs!
> >
> > We want to start a new KIP discussion about introducing a new type of
> > topics that would make use of Object Storage as the primary source of
> > storage. However, as this KIP is big we decided to split it into multiple
> > related KIPs.
> > We have the motivational KIP-1150 (
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1150%3A+Diskless+Topics
> > )
> > that aims to discuss if Apache Kafka should aim to have this type of
> > feature at all. This KIP doesn't go into details on how to implement it.
> > This follows the same approach used when we discussed KRaft.
> >
> > But as we know that it is sometimes really hard to discuss on that meta
> > level, we also created several sub-kips (linked in KIP-1150) that offer an
> > implementation of this feature.
> >
> > We kindly ask you to use the proper DISCUSS threads for each type of
> > concern and keep this one to discuss whether Apache Kafka wants to have
> > this feature or not.
> >
> > Thanks in advance on behalf of all the authors of this KIP.
> >
> > ------------------
> > Josep Prat
> > Open Source Engineering Director, Aiven
> > josep.p...@aiven.io   |   +491715557497 | aiven.io
> > Aiven Deutschland GmbH
> > Alexanderufer 3-7, 10117 Berlin
> > Geschäftsführer: Oskari Saarenmaa, Hannu Valtonen,
> > Anna Richardson, Kenneth Chen
> > Amtsgericht Charlottenburg, HRB 209739 B
> >
> 
