Re: [DISCUSS] KIP-1150 Diskless Topics

Jun Rao Fri, 16 May 2025 11:17:47 -0700

Hi, Giuseppe, Ivan,

Thanks for the reply.

JR1. Sounds good. It would be useful to document the benefit for each
major cloud provider in the KIP.

JR2.
"The same tiered storage doesn’t support compacted topics and the
community seems fine with this, because compacted topics are not often that
big to benefit from remote storage."

Ideally, I would like tiered storage to support compacted topics. However,
at least for each topic on which tiered storage is turned on, all existing
client APIs are supported. The original KIP has the detailed design on how
to support all existing APIs when tiered storage is turned on, and the
feature was only declared GA after all existing APIs were supported.

"Transactions and queues are great
features of Kafka, but many use cases–especially where diskless shine–may
be just fine without them, so it may be unnecessary to delay serving them."

This may or may not be true. The thing is that once something is in the
public APIs, it's hard to say that it's not used. So, the least
controversial path is to support all existing APIs. Our users are trusting
us to continue supporting existing APIs and protocols. We can't break that
trust.

I understand that it takes effort to support all existing APIs. It's fine
to deliver the APIs in stages. However, we need a detailed design and a
path to support all existing APIs. Without that, the community could be
stuck with broken APIs forever.

"The path to transactions seems more or less clear. We plan to support the
idempotent produce from the beginning by storing producer state in the
batch coordinator, which for classic topics is the responsibility of
partition leaders. We plan to use the existing transaction coordinator
mechanism together with the extension of this approach to support
transactions."

Great. Could you include that design in the sub KIP? Also, what about
queues?

JR3. "It's potentially faster and easier operationally, because the data
are replicated using internal systems of the cloud provider without the
Kafka producer-consumer protocol."
The block storage cross-region replication is asynchronous, right? Stretch
cluster on classic topics supports synchronous replication.

JR4. There are some tradeoffs. This design has the benefit of reducing the
number of connections to the brokers. However, it does mean that the
throughput is limited by a single socket connection. We could follow up on
this in the sub KIPs.

Thanks,

Jun

On Fri, May 16, 2025 at 6:18 AM Ivan Yurchenko <i...@ivanyu.me> wrote:

>
> On Tue, May 13, 2025, at 19:34, Jun Rao wrote:
> >
> > JR4. "Balance traffic among brokers and eliminate broker hotspots with
> > per-client granularity". Does that mean all traffic from a client is
> > served from a single broker? This seems to reduce the scalability from
> > the client perspective.
>
> We propose that all diskless traffic from a client is served by a single
> broker at a point in time (i.e. it's not a strict link, it can be changed
> on subsequent requests). By client, we mean “producer”, “consumer”, or
> “admin client”, i.e. a logical entity, not a service instance.
>
> Provided that one single client rarely puts outstanding load on all or
> many topics, with multiple clients in the cluster this will even out.
> However, the mechanism of leaderless topics is universal enough to serve
> any balancing policy. This will be managed on the broker side (as proposed
> by KIP-1181), so the cost of an error will be small and we can correct
> course if necessary.
>
> Best,
> Ivan
>
>
> >
> > On Wed, Apr 16, 2025 at 5:00 AM Josep Prat <josep.p...@aiven.io.invalid>
> > wrote:
> >
> > > Hi Kafka Devs!
> > >
> > > We want to start a new KIP discussion about introducing a new type of
> > > topic that would use Object Storage as the primary source of
> > > storage. However, as this KIP is big, we decided to split it into
> > > multiple related KIPs.
> > > We have the motivational KIP-1150 (
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1150%3A+Diskless+Topics
> > > )
> > > that aims to discuss whether Apache Kafka should have this type of
> > > feature at all. This KIP doesn't go into details on how to implement
> > > it. This follows the same approach used when we discussed KRaft.
> > >
> > > But as we know that it is sometimes really hard to discuss at that meta
> > > level, we also created several sub-KIPs (linked in KIP-1150) that
> > > offer an implementation of this feature.
> > >
> > > We kindly ask you to use the proper DISCUSS threads for each type of
> > > concern and keep this one to discuss whether Apache Kafka wants to have
> > > this feature or not.
> > >
> > > Thanks in advance on behalf of all the authors of this KIP.
> > >
> > > ------------------
> > > Josep Prat
> > > Open Source Engineering Director, Aiven
> > > josep.p...@aiven.io   |   +491715557497 | aiven.io
> > > Aiven Deutschland GmbH
> > > Alexanderufer 3-7, 10117 Berlin
> > > Geschäftsführer: Oskari Saarenmaa, Hannu Valtonen,
> > > Anna Richardson, Kenneth Chen
> > > Amtsgericht Charlottenburg, HRB 209739 B
> > >
> >
>
