Re: [DISCUSS] KIP-1150 Diskless Topics

Stanislav Kozlovski Sun, 20 Apr 2025 13:04:22 -0700

This is an amazing initiative. Huge kudos for driving it. We should incorporate 
it one way or another.


I have a suggestion I'd like to hear your thoughts on. I'm cognizant of the 
effort required for KIP-1150, so I don't necessarily want to increase the scope, 
but thinking about this early on can help shape the design later, as well as 
the motivation.

The idea is to introduce support for replicationless acks=1 writes. This would 
be very similar to how AutoMQ's WAL+S3 feature works, as far as I understand it.

Could we have diskless brokers serve acks=1 produce requests by immediately 
persisting the data to local disk (I'm not sure whether we should fsync), 
responding to the request, and then still asynchronously batching that data 
with regular acks=all data via the "diskless.append.commit.interval.ms"/ 
"diskless.append.buffer.max.bytes" configs?
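To make the proposed produce path concrete, here is a minimal Python sketch under stated assumptions. `DisklessBroker`, `PendingWrite`, and the in-memory lists standing in for the local log and for object storage are all hypothetical; the two constructor parameters only play the roles of `diskless.append.commit.interval.ms` and `diskless.append.buffer.max.bytes`. acks=1 writes are appended locally and answered at once, acks=all (numeric `-1`) writes are answered only when the shared object is uploaded, and both kinds go into the same upload. The sketch collapses "uploaded" and "committed by the batch coordinator" into a single step.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PendingWrite:
    data: bytes
    acks: int  # 1 or -1 (Kafka's numeric form of acks=all)


@dataclass
class DisklessBroker:
    """Hypothetical model of the proposed produce path; not code from any KIP."""
    commit_interval_ms: int  # plays the role of diskless.append.commit.interval.ms
    buffer_max_bytes: int    # plays the role of diskless.append.buffer.max.bytes
    local_wal: List[bytes] = field(default_factory=list)  # stand-in for local disk
    uploaded_objects: List[List[bytes]] = field(default_factory=list)  # stand-in for object storage
    pending: List[PendingWrite] = field(default_factory=list)
    pending_bytes: int = 0
    batch_opened_ms: Optional[int] = None

    def produce(self, data: bytes, acks: int, now_ms: int) -> bool:
        """Accept a write; return True if the producer is answered immediately."""
        if acks not in (1, -1):
            raise ValueError("sketch only models acks=1 and acks=all (-1)")
        answered_now = False
        if acks == 1:
            # Proposed path: persist locally (a real broker might fsync), then respond.
            self.local_wal.append(data)
            answered_now = True
        # Either way, the data joins the shared buffer for the next object upload.
        self.pending.append(PendingWrite(data, acks))
        self.pending_bytes += len(data)
        if self.batch_opened_ms is None:
            self.batch_opened_ms = now_ms
        self.maybe_flush(now_ms)
        return answered_now

    def maybe_flush(self, now_ms: int) -> int:
        """Upload one shared object if a threshold is hit; return acks=all responses sent."""
        if not self.pending:
            return 0
        size_hit = self.pending_bytes >= self.buffer_max_bytes
        time_hit = now_ms - self.batch_opened_ms >= self.commit_interval_ms
        if not (size_hit or time_hit):
            return 0
        # acks=1 and acks=all records are uploaded together in one object.
        self.uploaded_objects.append([w.data for w in self.pending])
        deferred = sum(1 for w in self.pending if w.acks == -1)
        # Assumption: the local copy is only needed until the upload is committed.
        self.local_wal.clear()
        self.pending.clear()
        self.pending_bytes = 0
        self.batch_opened_ms = None
        return deferred
```

In this sketch the caller passes `now_ms` explicitly so the behavior is deterministic; a real broker would drive `maybe_flush` from a timer. For example, with `DisklessBroker(commit_interval_ms=250, buffer_max_bytes=1024)`, an acks=1 write at t=0 is answered immediately, an acks=all write at t=10 is not, and both land in one object when `maybe_flush(250)` fires.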

If I'm not mistaken, this would offer guarantees very similar to today's acks=1 
requests, where a window of low durability exists between the time the leader 
persists to its local disk and the time all followers persist to theirs. 
Granted, in traditional Kafka this window is probably no more than a hundred 
milliseconds, and here it would be at least 2x longer. But I believe that, given 
the major savings, many acks=1 users will be happy to make the tradeoff.
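A rough way to bound the low-durability window for the proposed acks=1 path, assuming (this model is an assumption, not something stated in the KIP) that the window closes only once the shared object is both uploaded and committed by the batch coordinator:

```latex
T_{\text{exposure}} \le T_{\text{commit interval}} + T_{\text{upload}} + T_{\text{coord commit}}
```

Here $T_{\text{commit interval}}$ is at most `diskless.append.commit.interval.ms` (less if `diskless.append.buffer.max.bytes` fills first), $T_{\text{upload}}$ is the object storage write latency, and $T_{\text{coord commit}}$ is the batch coordinator round trip. In classic Kafka the analogous window is roughly the follower fetch-and-append delay.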

While on the topic of cost, I hastily ran some cost calculations and found that 
the KIP should reduce replication costs by more than 80x 
(https://topicpartition.io/blog/kip-1150-diskless-topics-in-apache-kafka). 
There may be some errors there, as the batch coordinator RPC and merging aren't 
fully fleshed out, but I believe it's directionally correct. It may be worth 
adding that to the motivation in one way or another, so that the savings are 
quantified.

Best,
Stanislav

On 2025/04/19 11:02:30 Ivan Yurchenko wrote:
> Hi Ziming,
> 
> > 1. Is this feature available with just a minor config adjustment, or will 
> > it intrude heavily on the current code? For example, AutoMQ is 100% 
> > compatible with Kafka and doesn't intrude heavily on the code.
> 
> As for the part visible to the user, we expect:
>  1. Minimal changes to the client code (potentially with a fallback requiring 
> zero changes for older clients).
>  2. A limited set of new configurations for brokers and topics.
> Otherwise, this should be perfectly normal Apache Kafka.
> 
> > 2. Though we are not discussing implementation details, it's worth giving 
> > some high-level architecture ideas, and it would be good to compare with 
> > AutoMQ-like systems.
> 
> There's quite a bit of high-level architecture in sub-KIP KIP-1163 [1].
> We haven't done a comparison with AutoMQ (to the best of our knowledge, it 
> takes a fairly different approach), but if this helps the community get the 
> idea, then sure, we should do it.
> 
> > 3. What will we provide through it? I think we will just provide a common 
> > interface and put implementations in other repos, just as we did for 
> > Kafka Connect and Kafka Tiered Storage.
> 
> This is true for the component that does CRUD operations on object storage. 
> However, for the batch coordinator we would like to provide a decent 
> out-of-the-box, self-contained (i.e., no external dependencies such as a 
> database) implementation from which many Kafka users without challenging 
> scaling requirements would benefit. Sub-KIP KIP-1164 [2] covers this.
> 
> > 4. How will we deal with the KRaft-related protocol, since metadata is 
> > managed differently, via __cluster_metadata? Through this KIP, will we 
> > close the gap between __cluster_metadata and data topics by putting 
> > metadata in object storage? If so, will there be no standby controller? 
> > Standby controllers are the __cluster_metadata followers, and there would 
> > be no followers.
> 
> The current plan is not to work directly with KRaft and 
> __cluster_metadata. What we need from KRaft is three types of events: 
> topic/partition creation, topic deletion, and topic configuration changes 
> (possibly limiting this set to topic deletion only). We think it would be 
> enough to have a "bridge" that watches for these events in 
> __cluster_metadata and reflects them in the batch coordinator (basically, by 
> sending requests).
> Does this answer the question, or did I misunderstand?
> 
> Best,
> Ivan
> 
> [1] https://cwiki.apache.org/confluence/display/KAFKA/KIP-1163%3A+Diskless+Core
> [2] https://cwiki.apache.org/confluence/display/KAFKA/KIP-1164%3A+Topic+Based+Batch+Coordinator
> 
> On Fri, Apr 18, 2025, at 12:42, Ziming Deng wrote:
> > Hi Josep,
> > 
> > This would be a fascinating feature; some well-known Kafka users are 
> > running Kafka in cloud-native environments. As far as I know, there are 
> > already some derivative versions of Kafka that provide this feature. For 
> > example, I am using AutoMQ (https://github.com/AutoMQ/automq) in my 
> > environment, and it has significantly helped me reduce costs, so I think 
> > it's worthwhile to clarify some related details:
> > 1. Is this feature available with just a minor config adjustment, or will 
> > it intrude heavily on the current code? For example, AutoMQ is 100% 
> > compatible with Kafka and doesn't intrude heavily on the code.
> > 2. Though we are not discussing implementation details, it's worth giving 
> > some high-level architecture ideas, and it would be good to compare with 
> > AutoMQ-like systems.
> > 3. What will we provide through it? I think we will just provide a common 
> > interface and put implementations in other repos, just as we did for 
> > Kafka Connect and Kafka Tiered Storage.
> > 4. How will we deal with the KRaft-related protocol, since metadata is 
> > managed differently, via __cluster_metadata? Through this KIP, will we 
> > close the gap between __cluster_metadata and data topics by putting 
> > metadata in object storage? If so, will there be no standby controller? 
> > Standby controllers are the __cluster_metadata followers, and there would 
> > be no followers.
> > 
> > — 
> > Ziming
> > 
> > > On Apr 16, 2025, at 19:58, Josep Prat wrote:
> > > 
> > > Hi Kafka Devs!
> > > 
> > > We want to start a new KIP discussion about introducing a new type of
> > > topic that would use object storage as its primary storage. Because
> > > this KIP is big, we decided to split it into multiple related KIPs.
> > > We have the motivational KIP-1150
> > > (https://cwiki.apache.org/confluence/display/KAFKA/KIP-1150%3A+Diskless+Topics)
> > > that aims to discuss whether Apache Kafka should have this type of
> > > feature at all. This KIP doesn't go into details on how to implement it.
> > > This follows the same approach we used when discussing KRaft.
> > > 
> > > But since we know it is sometimes really hard to discuss things at that
> > > meta level, we also created several sub-KIPs (linked in KIP-1150) that
> > > propose an implementation of this feature.
> > > 
> > > We kindly ask you to use the proper DISCUSS threads for each type of
> > > concern and keep this one to discuss whether Apache Kafka wants to have
> > > this feature or not.
> > > 
> > > Thanks in advance on behalf of all the authors of this KIP.
> > > 
> > > ------------------
> > > Josep Prat
> > > Open Source Engineering Director, Aiven
> > 
> > 
>
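
Ivan's description of the "bridge" (his answer to point 4) and of the pluggable batch coordinator (his answer to point 3) can be sketched roughly as follows. Everything here is hypothetical: the event classes, the `BatchCoordinator` method names, and `MetadataBridge` are illustrative stand-ins, not APIs from KIP-1163 or KIP-1164, and the in-memory coordinator is a toy (KIP-1164 proposes a topic-based one). The sketch only shows the shape of the idea: the bridge consumes three kinds of metadata events and turns each into a coordinator request.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, Union


# Hypothetical event types mirroring the three KRaft events Ivan lists.
@dataclass(frozen=True)
class TopicCreated:
    topic: str
    partitions: int


@dataclass(frozen=True)
class TopicDeleted:
    topic: str


@dataclass(frozen=True)
class TopicConfigChanged:
    topic: str
    configs: Dict[str, str]


MetadataEvent = Union[TopicCreated, TopicDeleted, TopicConfigChanged]


class BatchCoordinator(ABC):
    """Hypothetical coordinator interface; method names are illustrative only."""

    @abstractmethod
    def create_partitions(self, topic: str, partitions: int) -> None: ...

    @abstractmethod
    def delete_topic(self, topic: str) -> None: ...

    @abstractmethod
    def update_configs(self, topic: str, configs: Dict[str, str]) -> None: ...


class InMemoryCoordinator(BatchCoordinator):
    """Toy implementation that just records the resulting state."""

    def __init__(self) -> None:
        self.topics: Dict[str, int] = {}
        self.configs: Dict[str, Dict[str, str]] = {}

    def create_partitions(self, topic: str, partitions: int) -> None:
        self.topics[topic] = partitions

    def delete_topic(self, topic: str) -> None:
        self.topics.pop(topic, None)
        self.configs.pop(topic, None)

    def update_configs(self, topic: str, configs: Dict[str, str]) -> None:
        self.configs.setdefault(topic, {}).update(configs)


class MetadataBridge:
    """Forwards the relevant __cluster_metadata events to the batch coordinator."""

    def __init__(self, coordinator: BatchCoordinator) -> None:
        self.coordinator = coordinator

    def on_event(self, event: MetadataEvent) -> None:
        if isinstance(event, TopicCreated):
            self.coordinator.create_partitions(event.topic, event.partitions)
        elif isinstance(event, TopicDeleted):
            self.coordinator.delete_topic(event.topic)
        elif isinstance(event, TopicConfigChanged):
            self.coordinator.update_configs(event.topic, event.configs)
        # Any other metadata record is ignored: the bridge only needs these three.
```

Because the bridge depends only on the abstract interface, a different coordinator implementation could be swapped in without touching the bridge, which matches the split Ivan describes between an in-tree default and pluggable components.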
