Hi Colt,

Thanks for the comments.

> and I struggle to see how the algorithm isn't at least O(N) where N is
the number of Tasks...?

For O(E^2 * (CU)) complexity, C and U can be viewed as constant. Number of
edges E is T * N where T is the number of clients and N is the number of
Tasks. This is because a task can be assigned to any client so there will
be an edge between every task and every client. The total complexity would
be O(T * N) if we want to be more specific.

> But if the leaders for each partition are spread across multiple zones,
how will you handle that?

This is what the min-cost flow solution is trying to solve? i.e. Find an
assignment of tasks to clients where across AZ traffic can be minimized.
But there are some constraints to the solution and one of them is we need
to balance task assignment first (
https://cwiki.apache.org/confluence/display/KAFKA/KIP-925%3A+Rack+aware+task+assignment+in+Kafka+Streams#KIP925:RackawaretaskassignmentinKafkaStreams-Designforrackawareassignment).
So in your example of three tasks' partitions being in the same AZ of a
client, if there are other clients, we still want to balance the tasks to
other clients even if putting all tasks to a single client can result in 0
cross AZ traffic. In
https://cwiki.apache.org/confluence/display/KAFKA/KIP-925%3A+Rack+aware+task+assignment+in+Kafka+Streams#KIP925:RackawaretaskassignmentinKafkaStreams-Algorithm
section, the algorithm will try to find a min-cost solution based on
balanced assignment instead of pure min-cost.

Thanks,
Hao

On Tue, May 9, 2023 at 5:55 PM Colt McNealy <c...@littlehorse.io> wrote:

> Hello Hao,
>
> First of all, THANK YOU for putting this together. I had been hoping
> someone might bring something like this forward. A few comments:
>
> **1: Runtime Complexity
> > Klein’s cycle canceling algorithm can solve the min-cost flow problem in
> O(E^2CU) time where C is max cost and U is max capacity. In our particular
> case, C is 1 and U is at most 3 (A task can have at most 3 topics including
> changelog topic?). So the algorithm runs in O(E^2) time for our case.
>
> A Task can have multiple input topics, and also multiple state stores, and
> multiple output topics. The most common case is three topics as you
> described, but this is not necessarily guaranteed. Also, math is one of my
> weak points, but to me O(E^2) is equivalent to O(1), and I struggle to see
> how the algorithm isn't at least O(N) where N is the number of Tasks...?
>
> **2: Broker-Side Partition Assignments
> Consider the case with just three topics in a Task (one input, one output,
> one changelog). If all three partition leaders are in the same Rack (or
> better yet, the same broker), then we could get massive savings by
> assigning the Task to that Rack/availability zone. But if the leaders for
> each partition are spread across multiple zones, how will you handle that?
> Is that outside the scope of this KIP, or is it worth introducing a
> kafka-streams-generate-rebalance-proposal.sh tool?
>
> Colt McNealy
> *Founder, LittleHorse.io*
>
>
> On Tue, May 9, 2023 at 4:03 PM Hao Li <h...@confluent.io.invalid> wrote:
>
> > Hi all,
> >
> > I have submitted KIP-925 to add rack awareness logic in task assignment
> in
> > Kafka Streams and would like to start a discussion:
> >
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-925%3A+Rack+aware+task+assignment+in+Kafka+Streams
> >
> > --
> > Thanks,
> > Hao
> >
>


-- 
Thanks,
Hao

Reply via email to