Hi, Giuseppe, Ivan,

Thanks for the reply.

JR1. Sounds good. It would be useful to document the benefit for each major cloud provider in the KIP.

JR2. "The same tiered storage doesn’t support compacted topics and the community seems fine with this, because compacted topics are not often that big to benefit from remote storage."

Ideally, I would like tiered storage to support compacted topics. However, at least for each topic on which tiered storage is turned on, all existing client APIs are supported. The original KIP has the detailed design on how to support all existing APIs when tiered storage is turned on, and the feature was only GAed after all existing APIs were supported.

"Transactions and queues are great features of Kafka, but many use cases–especially where diskless shine–may be just fine without them, so it may be unnecessary to delay serving them."

This may or may not be true. The thing is that once something is in the public APIs, it's hard to say that it's not used. So, the least controversial path is to support all existing APIs. Our users are trusting us to continue supporting existing APIs and protocols. We can't break that trust. I understand that it takes effort to support all existing APIs. It's fine to deliver the APIs in stages. However, we need a detailed design and a path to support all existing APIs. Without that, the community could be stuck with broken APIs forever.

"The path to transactions seems more or less clear. We plan to support the idempotent produce from the beginning by storing producer state in the batch coordinator, which for classic topics is the responsibility of partition leaders. We plan to use the existing transaction coordinator mechanism together with the extension of this approach to support transactions."

Great. Could you include that design in the sub KIP? Also, what about queues?

JR3. "It's potentially faster and easier operationally, because the data are replicated using internal systems of the cloud provider without the Kafka producer-consumer protocol."

The block storage cross-region replication is asynchronous, right? Stretch cluster on classic topics supports synchronous replication.

JR4. There are some tradeoffs. This design has the benefit of reducing the number of connections to the brokers. However, it does mean that the throughput is limited by a single socket connection. We could follow up on this in the sub KIPs.

Thanks,

Jun

On Fri, May 16, 2025 at 6:18 AM Ivan Yurchenko <i...@ivanyu.me> wrote:

> On Tue, May 13, 2025, at 19:34, Jun Rao wrote:
> > JR4. "Balance traffic among brokers and eliminate broker hotspots with
> > per-client granularity". Does that mean all traffic from a client is
> > served from a single broker? This seems to reduce the scalability from
> > the client perspective.
>
> We propose that all diskless traffic from a client is served by a single
> broker at a point in time (i.e. it's not a strict link, it can be changed
> on subsequent requests). By client, we mean “producer”, “consumer”, or
> “admin client”, i.e. a logical entity, not a service instance.
>
> Provided that one single client rarely puts outstanding load on all or
> many topics, with multiple clients in the cluster this will even out.
> However, the mechanism of leaderless topics is universal enough to serve
> any balancing policy. This will be managed by the broker side (as proposed
> by KIP-1181), the error cost will be small and allow us to correct the
> course if necessary.
>
> Best,
> Ivan
>
> > On Wed, Apr 16, 2025 at 5:00 AM Josep Prat <josep.p...@aiven.io.invalid>
> > wrote:
> >
> > > Hi Kafka Devs!
> > >
> > > We want to start a new KIP discussion about introducing a new type of
> > > topics that would make use of Object Storage as the primary source of
> > > storage. However, as this KIP is big we decided to split it into
> > > multiple related KIPs.
> > > We have the motivational KIP-1150 (
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1150%3A+Diskless+Topics
> > > )
> > > that aims to discuss if Apache Kafka should aim to have this type of
> > > feature at all. This KIP doesn't go onto details on how to implement
> > > it. This follows the same approach used when we discussed KRaft.
> > >
> > > But as we know that it is sometimes really hard to discuss on that meta
> > > level, we also created several sub-kips (linked in KIP-1150) that
> > > offer an implementation of this feature.
> > >
> > > We kindly ask you to use the proper DISCUSS threads for each type of
> > > concern and keep this one to discuss whether Apache Kafka wants to have
> > > this feature or not.
> > >
> > > Thanks in advance on behalf of all the authors of this KIP.
> > >
> > > ------------------
> > > Josep Prat
> > > Open Source Engineering Director, Aiven
> > > josep.p...@aiven.io | +491715557497 | aiven.io
> > > Aiven Deutschland GmbH
> > > Alexanderufer 3-7, 10117 Berlin
> > > Geschäftsführer: Oskari Saarenmaa, Hannu Valtonen,
> > > Anna Richardson, Kenneth Chen
> > > Amtsgericht Charlottenburg, HRB 209739 B