Hi Ivan,

Thanks for the KIP!
This is a great improvement from the cost and latency perspective!

Some comments:
1. In the description of `partitioner.rack.aware` config, it'd be better to
make it clear that this setting has no effect if a custom partitioner is
used.

2. "Select the next partition from all partitions following the current
algorithm in the following cases:"
I think there should be one more case: "If `partitioner.rack.aware` is
false".

3. "If the automatic partitioning is needed (i.e. no record partition or
key is specified):"
I think we should also add the case: "key is provided but
`partitioner.ignore.keys` is enabled".
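
To make the three comments concrete, here is a rough sketch of how I read
the combined conditions. This is not the actual producer code: the class,
method, and parameter names are hypothetical, and the other fallback cases
listed in the KIP are left out.

```java
// Hypothetical sketch of the decision discussed in comments 1-3.
// None of these names come from the Kafka code base.
final class RackAwareDecision {

    // Comment 3: automatic partitioning is needed when the record has no
    // explicit partition and either has no key or keys are ignored
    // (partitioner.ignore.keys = true).
    static boolean needsAutomaticPartitioning(Integer partition, byte[] key, boolean ignoreKeys) {
        if (partition != null) {
            return false; // the caller chose the partition explicitly
        }
        return key == null || ignoreKeys;
    }

    // Returns true only when rack-aware selection would apply; in every
    // other case the current algorithm over all partitions is used.
    static boolean useRackAwareSelection(boolean customPartitionerConfigured,
                                         boolean rackAware,
                                         Integer partition,
                                         byte[] key,
                                         boolean ignoreKeys) {
        if (customPartitionerConfigured) {
            return false; // comment 1: the setting has no effect with a custom partitioner
        }
        if (!rackAware) {
            return false; // comment 2: partitioner.rack.aware is false
        }
        return needsAutomaticPartitioning(partition, key, ignoreKeys);
    }
}
```

With this reading, a keyed record still gets rack-aware selection when
`partitioner.ignore.keys` is enabled, and a record with an explicit
partition never does.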

Thank you.
Luke


On Sat, Dec 21, 2024 at 2:32 AM Stanislav Kozlovski <
stanislavkozlov...@apache.org> wrote:

> Wow, I am super happy to see this KIP! Thanks for publishing it!
>
> I threw the idea out there last week in an article of mine about
> calculating Kafka costs[1]
>
> > [FUTURE KIP] - a Produce to Local Leader KIP, similar to KIP-392, can be
> introduced to eliminate producer inter-AZ network costs for topics that do
> not have keys.
> > there is no fundamental reason that a topic without ordering guarantees
> needs to produce to a specific partition - why not just choose the broker
> in the closest zone?
> > if all of your traffic is unkeyed, then this can further reduce Kafka’s
> network cost by 25%.
> > it sounds like a change that wouldn’t be too complicated, maybe even
> achievable today through the Producer’s partitioner.
>
> I don't know if you saw it from there, but I'm super happy to see it come
> to fruition! It's even easier than I thought - I didn't realize we had the
> node/rack information in the partitioner already.
>
> I think it will be very impactful.
> We've seen the strong trend in the industry of trading off latency for
> cost reduction. Namely - almost every vendor has introduced some sort of
> leaderless Kafka API model that outsources replication to a remote
> store[2][3][4][5]. This in turn allows them to reduce cross-zone networking
> costs to literally zero. In certain optimized deployments the networking
> cost can be up to 80-90% of the total cost![6] KIP-392 allows us to
> eliminate the consumer-side traffic cost, but there is great motivation to
> enable users to do the same for producers that don't depend on ordering.
>
> I am +1 on the KIP as is.
>
> One may make an argument to have a way to enable it server-side via the
> broker, but I'd like to hear a good reason for that. I believe the
> simplicity in the current state is preferred, since clients already have
> freedom to produce to any partition they explicitly choose.
>
> Best,
> Stan
>
> [1]
> https://bigdata.2minutestreaming.com/p/the-brutal-truth-about-apache-kafka-cost-calculators
> [2] WarpStream and its $220m acquisition
> https://www.linkedin.com/pulse/how-confluent-acquired-warpstream-220m-after-just-13-months-hxgyf/
> [3] Confluent Freight
> https://www.confluent.io/blog/introducing-confluent-cloud-freight-clusters/
> [4] RedPanda Cloud Topics
> https://www.redpanda.com/blog/cloud-topics-streaming-data-object-storage
> [5] BufStream https://buf.build/product/bufstream
> [6] calculator https://akalculator.com/
>
> On 2024/12/20 11:35:28 Ivan Yurchenko wrote:
> > Hello all,
> >
> > I'd like to propose a new KIP to discuss: KIP-1123: Rack-aware
> partitioning for Kafka Producer [1].
> >
> > Best,
> > Ivan Yurchenko
> >
> > [1]
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1123%3A+Rack-aware+partitioning+for+Kafka+Producer
> >
>
