Re: [DISCUSS] KIP-1150 Diskless Topics

Ivan Yurchenko Sat, 19 Apr 2025 04:04:30 -0700

Hi Ziming,

> 1. Is this feature available by just a minor adjust of config or it will 
> intrude current code heavily, say, AutoMq is 100% compatible with Kafka and 
> doesn’t intrude the code heavily


If we speak about the part visible to the user, we expect:
 1. Minimal changes to the client code (with a potential fallback that needs 
no changes at all for older clients).
 2. A limited set of new configurations for brokers and topics.
Otherwise, this should be a perfectly normal Apache Kafka.

> 2. Though we are not discussing implement details, it’s worth giving some 
> high-level architecture ideas, and it’s better to compare with AutoMq like 
> systems.

There's quite a bit of high-level architecture in the sub-KIP, KIP-1163 [1].
We haven't done a comparison with AutoMQ (to the best of our knowledge, they 
take a fairly different approach), but if it helps the community get the idea, 
then sure, we should do it.

> 3. What we will provide through it, I think we will just provide a common 
> interface and put implementations in another repos, just as we did for Kafka 
> Connect and Kafka Tiered Storage.

This is true for the component that does CRUD operations on object storage. 
However, for the batch coordinator we would like to provide a decent 
out-of-the-box, self-contained implementation (i.e. with no external 
dependencies such as a database) that would benefit the many Kafka users who 
don't have challenging scaling requirements. Sub-KIP-1164 [2] covers this.

> 4. How to deal with KRaft related protocol, since metadata topic is managed 
> differently with __cluster_metadata, through this KIP, will we align the gap 
> between __cluster_metadata  and data topics by put metadata in an object 
> storage? if so, there will be no standby controller? since standby controller 
> is the __cluster_metadata followers and there will be no followers.

The current plan is not to work directly with KRaft and __cluster_metadata. 
What we need from KRaft is three types of events: topic/partition creation, 
topic deletion, and topic configuration changes (with the possibility of 
limiting this set to topic deletion only). We think it would be enough to have 
a "bridge" that watches for these events in __cluster_metadata and reflects 
them in the batch coordinator (basically, by sending requests).
Does this answer the question, or have I misunderstood it?
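
To make the "bridge" idea a bit more concrete, here is a rough illustrative 
sketch in Python. It is not taken from the KIP: every name in it 
(MetadataEvent, MetadataBridge, RecordingCoordinator, on_events) is 
hypothetical, and the coordinator is an in-memory stand-in rather than a real 
component. It only shows the shape of the idea: filter the three event types 
and forward them as requests.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet, Iterable, List


class EventType(Enum):
    # The three kinds of events mentioned above (names are hypothetical).
    TOPIC_CREATED = "topic_created"
    TOPIC_DELETED = "topic_deleted"
    TOPIC_CONFIG_CHANGED = "topic_config_changed"


@dataclass(frozen=True)
class MetadataEvent:
    type: EventType
    topic: str


class RecordingCoordinator:
    """In-memory stand-in for a batch coordinator; it only records requests."""

    def __init__(self) -> None:
        self.requests: List[MetadataEvent] = []

    def handle(self, event: MetadataEvent) -> None:
        self.requests.append(event)


class MetadataBridge:
    """Forwards selected metadata events to the coordinator as requests."""

    def __init__(
        self,
        coordinator: RecordingCoordinator,
        forwarded: FrozenSet[EventType] = frozenset(EventType),
    ) -> None:
        self._coordinator = coordinator
        # By default all three event types are forwarded; the set can be
        # narrowed, for example to topic deletion only.
        self._forwarded = frozenset(forwarded)

    def on_events(self, events: Iterable[MetadataEvent]) -> int:
        """Forward matching events in order; return how many were sent."""
        sent = 0
        for event in events:
            if event.type in self._forwarded:
                self._coordinator.handle(event)
                sent += 1
        return sent


if __name__ == "__main__":
    coordinator = RecordingCoordinator()
    # Narrowed variant: only topic deletions reach the coordinator.
    bridge = MetadataBridge(coordinator, forwarded=frozenset({EventType.TOPIC_DELETED}))
    bridge.on_events([
        MetadataEvent(EventType.TOPIC_CREATED, "orders"),
        MetadataEvent(EventType.TOPIC_DELETED, "orders"),
    ])
    print(coordinator.requests)
```

How the events would actually be read from __cluster_metadata, and what the 
requests to the batch coordinator look like, is deliberately left out here.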

Best,
Ivan

[1] https://cwiki.apache.org/confluence/display/KAFKA/KIP-1163%3A+Diskless+Core
[2] https://cwiki.apache.org/confluence/display/KAFKA/KIP-1164%3A+Topic+Based+Batch+Coordinator

On Fri, Apr 18, 2025, at 12:42, Ziming Deng wrote:
> Hi Josep,
> 
> This would be a fascinating feature; some well-known Kafka users are running 
> Kafka in cloud-native environments. As far as I know, there are already some 
> derivative versions of Kafka that provide this feature. For example, 
> I am using AutoMq (https://github.com/AutoMQ/automq) in my environment, which 
> significantly helped me reduce the cost, so I think it’s worthwhile to 
> clarify some related details:
> 1. Is this feature available by just a minor adjust of config or it will 
> intrude current code heavily, say, AutoMq is 100% compatible with Kafka and 
> doesn’t intrude the code heavily 
> 2. Though we are not discussing implement details, it’s worth giving some 
> high-level architecture ideas, and it’s better to compare with AutoMq like 
> systems.
> 3. What we will provide through it, I think we will just provide a common 
> interface and put implementations in another repos, just as we did for Kafka 
> Connect and Kafka Tiered Storage.
> 4. How to deal with KRaft related protocol, since metadata topic is managed 
> differently with __cluster_metadata, through this KIP, will we align the gap 
> between __cluster_metadata  and data topics by put metadata in an object 
> storage? if so, there will be no standby controller? since standby controller 
> is the __cluster_metadata followers and there will be no followers.
> 
> — 
> Ziming
> 
> > On Apr 16, 2025, at 19:58, Josep Prat <[email protected]> wrote:
> > 
> > Hi Kafka Devs!
> > 
> > We want to start a new KIP discussion about introducing a new type of
> > topics that would make use of Object Storage as the primary source of
> > storage. However, as this KIP is big we decided to split it into multiple
> > related KIPs.
> > We have the motivational KIP-1150 (
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1150%3A+Diskless+Topics)
> > that aims to discuss if Apache Kafka should aim to have this type of
> > feature at all. This KIP doesn't go into details on how to implement it.
> > This follows the same approach used when we discussed KRaft.
> > 
> > But as we know that it is sometimes really hard to discuss on that meta
> > level, we also created several sub-kips (linked in KIP-1150) that offer an
> > implementation of this feature.
> > 
> > We kindly ask you to use the proper DISCUSS threads for each type of
> > concern and keep this one to discuss whether Apache Kafka wants to have
> > this feature or not.
> > 
> > Thanks in advance on behalf of all the authors of this KIP.
> > 
> > ------------------
> > Josep Prat
> > Open Source Engineering Director, Aiven
> > [email protected]   |   +491715557497 | aiven.io
> > Aiven Deutschland GmbH
> > Alexanderufer 3-7, 10117 Berlin
> > Geschäftsführer: Oskari Saarenmaa, Hannu Valtonen,
> > Anna Richardson, Kenneth Chen
> > Amtsgericht Charlottenburg, HRB 209739 B
> 
>
