Thanks for the feedback, Almog. I agree that the level of effort here calls
for several separate but related KIPs.

For the first phase, I envision a proxy layer that sits in front of multiple
Kafka clusters, e.g. one traditional deployment and one diskless
implementation. Based on the QoS requested by the client, the proxy will
route the client to the best cluster for that task. Cluster expansion (if
possible) would also be in scope for this first phase. Thus, if the proxy
determines that all of the clusters are overloaded, it can either expand an
existing one by adding more brokers or dynamically create a net-new cluster
to accommodate the anticipated load.
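
As a rough illustration of the routing decision, here is a minimal Python
sketch. Everything in it is an assumption made for the sake of the example and
not part of the KIP: the names ClusterInfo, QosRequest and choose_cluster, the
use of p99 latency and a 0-to-1 load figure as the only inputs, the 0.8
overload cutoff, and "least loaded" as the definition of "best".

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical cutoff above which a cluster is treated as overloaded.
OVERLOAD_THRESHOLD = 0.8


@dataclass
class ClusterInfo:
    name: str
    p99_latency_ms: float  # latency the cluster typically delivers (assumed metric)
    load: float            # current utilisation from 0.0 to 1.0 (assumed metric)


@dataclass
class QosRequest:
    max_latency_ms: float  # the only QoS dimension modelled in this sketch


def choose_cluster(clusters: List[ClusterInfo],
                   request: QosRequest) -> Optional[ClusterInfo]:
    """Return the least-loaded cluster that meets the request, or None.

    None means no cluster qualifies, which is the point where the proxy
    would consider adding brokers or creating a new cluster.
    """
    candidates = [
        c for c in clusters
        if c.p99_latency_ms <= request.max_latency_ms
        and c.load < OVERLOAD_THRESHOLD
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda c: c.load)
```

A None result is deliberately left to the caller, since whether to expand an
existing cluster or create a new one is a separate policy decision.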

Phase 2 would focus on tracking cluster and topic performance against the
stated QoS performance metrics. This would likely start with alerts based on
compliance or non-compliance with the agreed-upon SLAs. A prolonged violation
of the SLA would trigger consumer/producer negotiation.
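
One way to picture the alert-then-negotiate escalation is the small Python
sketch below. It is only a sketch under stated assumptions: the SlaMonitor
class, the single latency metric, and the rule that "prolonged" means a fixed
number of consecutive violating samples are all hypothetical and would need to
be defined in the relevant KIP.

```python
class SlaMonitor:
    """Tracks one latency SLA and escalates after repeated violations."""

    def __init__(self, max_latency_ms: float,
                 violations_before_negotiation: int = 3):
        self.max_latency_ms = max_latency_ms
        # Hypothetical definition of a "prolonged" violation.
        self.violations_before_negotiation = violations_before_negotiation
        self.consecutive_violations = 0

    def record(self, observed_latency_ms: float) -> str:
        """Record one sample and return 'compliant', 'alert' or 'negotiate'."""
        if observed_latency_ms <= self.max_latency_ms:
            # Any compliant sample resets the violation streak.
            self.consecutive_violations = 0
            return "compliant"
        self.consecutive_violations += 1
        if self.consecutive_violations >= self.violations_before_negotiation:
            return "negotiate"
        return "alert"
```

A time-window rule (for example, out of SLA for N minutes) would work just as
well; the consecutive-sample count is simply the shortest thing to write down.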

In a later phase, we can focus on the negotiation between producers and
consumers. This would most likely require dynamic reassignment of topics to
clusters, e.g. shifting a topic from a diskless cluster to a disk-based one to
accommodate a consumer's lower-latency requirement.

On 2025/05/13 15:26:01 Almog Gavra wrote:
> Thanks for the KIP Peter! Curious to see where this one goes, I think it's
> good to start a discussion around this though perhaps we'll need to split
> it up into more focused improvements as there's a lot bundled in this one
> idea!
> 
> A0. I'd like to see some folks who are more familiar with the broker
> implementation chime in on the feasibility of implementing some of
> this. AFAIK, there are no capabilities that allow (for example) shifting
> resources between topics. Isolating that from a resource allocation
> perspective may be a huge lift, though certainly a valuable one.
> 
> A1. With A0 in mind, I'm wondering what the benefit is of making the QoS spec
> an open standard - it depends heavily both on the broker implementation and
> on how it's deployed (containerized? bare metal? k8s?). That makes what we
> can practically offer bundled with the default implementation limited.
> OTOH, I'm not sure whether users benefit from "open standards, free of
> vendor bias as much as possible" if the specification is customizable
> enough to allow for vendor-specific extensions.
> 
> A2. More a technical note, but the dynamic negotiation between producer and
> consumer seems to break a key abstraction of Kafka which is decoupling
> producers from consumers. That might work well if you have one consumer,
> but if you have multiple I imagine you wouldn't want one lagging to cause
> the producer to back up.
> 
> I'll be following along, I'm sure there will be some good discussions
> around this!
> 
> - Almog
> 
> On Mon, May 12, 2025 at 4:47 PM Peter Corless
> <peter.corl...@startree.ai.invalid> wrote:
> 
> > David Kjerrumgaard and I wrote up the following KIP for Kafka Quality of
> > Service (QoS). It would be a mechanism to describe desired behaviors and
> > actual capabilities of producers, clusters and consumers, and to allow them
> > to negotiate desired throughputs, latencies, data retention, and other
> > elements of data streaming. It would also provide instrumentality for
> > observability to measure actual performance to compare to desired
> > performance.
> >
> > Would love to hear frank and thoughtful feedback, as well as committers who
> > would be interested in working on implementation.
> >
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1182%3A+Quality+of+Service+%28QoS%29+Framework
> >
> > --
> >
> > Peter Corless
> > Director of Product Marketing
> >
> 
