Hello Hao,

First of all, THANK YOU for putting this together. I had been hoping
someone might bring something like this forward. A few comments:

**1: Runtime Complexity
> Klein’s cycle canceling algorithm can solve the min-cost flow problem in
O(E^2CU) time where C is max cost and U is max capacity. In our particular
case, C is 1 and U is at most 3 (A task can have at most 3 topics including
changelog topic?). So the algorithm runs in O(E^2) time for our case.

A Task can have multiple input topics, and also multiple state stores, and
multiple output topics. The most common case is three topics as you
described, but this is not necessarily guaranteed. Also, math is one of my
weak points, but to me O(E^2) is equivalent to O(1), and I struggle to see
how the algorithm isn't at least O(N) where N is the number of Tasks...?

**2: Broker-Side Partition Assignments
Consider the case with just three topics in a Task (one input, one output,
one changelog). If all three partition leaders are in the same Rack (or
better yet, the same broker), then we could get massive savings by
assigning the Task to that Rack/availability zone. But if the leaders for
each partition are spread across multiple zones, how will you handle that?
Is that outside the scope of this KIP, or is it worth introducing a
kafka-streams-generate-rebalance-proposal.sh tool?

Colt McNealy
*Founder, LittleHorse.io*


On Tue, May 9, 2023 at 4:03 PM Hao Li <h...@confluent.io.invalid> wrote:

> Hi all,
>
> I have submitted KIP-925 to add rack awareness logic in task assignment in
> Kafka Streams and would like to start a discussion:
>
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-925%3A+Rack+aware+task+assignment+in+Kafka+Streams
>
> --
> Thanks,
> Hao
>

Reply via email to