Re: [DISCUSS] KIP-925: rack aware task assignment in Kafka Streams

Hao Li Tue, 23 May 2023 11:46:22 -0700

Thanks John! Yeah. The ConvergenceTest looks very helpful. I will add it to
the test plan. I will also add tests to verify the new optimizer will
produce a balanced assignment which has no worse cross AZ cost than the
existing assignor.


Hao

On Mon, May 22, 2023 at 3:39 PM John Roesler <vvcep...@apache.org> wrote:

> Hi Hao,
>
> Thanks for the KIP!
>
> Overall, I think this is a great idea. I always wanted to circle back
> after the Smooth Scaling KIP to put a proper optimization algorithm into
> place. I think this has the promise to really improve the quality of the
> balanced assignments we produce.
>
> Thanks for providing the details about the MaxCut/MinFlow algorithm. It
> seems like a good choice for me, assuming we choose the right scaling
> factors for the weights we add to the graph. Unfortunately, I don't think
> that there's a good way to see how easy or hard this is going to be until
> we actually implement it and test it.
>
> That leads to the only real piece of feedback I had on the KIP, which is
> the testing portion. You mentioned system/integration/unit tests, but
> there's not too much information about what those tests will do. I'd like
> to suggest that we invest in more simulation testing specifically, similar
> to what we did in
> https://github.com/apache/kafka/blob/trunk/streams/src/test/java/org/apache/kafka/streams/processor/internals/assignment/TaskAssignorConvergenceTest.java
> .
>
> In fact, it seems like we _could_ write the simulation up front, and then
> implement the algorithm in a dummy way and just see whether it passes the
> simulations or not, before actually integrating it with Kafka Streams.
>
> Basically, I'd be +1 on this KIP today, but I'd feel confident about it if
> we had a little more detail regarding how we are going to verify that the
> new optimizer is actually going to produce more optimal plans than the
> existing assigner we have today.
>
> Thanks again!
> -John
>
> On 2023/05/22 16:49:22 Hao Li wrote:
> > Hi Colt,
> >
> > Thanks for the comments.
> >
> > > and I struggle to see how the algorithm isn't at least O(N) where N is
> > the number of Tasks...?
> >
> > For O(E^2 * (CU)) complexity, C and U can be viewed as constant. Number
> of
> > edges E is T * N where T is the number of clients and N is the number of
> > Tasks. This is because a task can be assigned to any client so there will
> > be an edge between every task and every client. The total complexity
> would
> > be O(T * N) if we want to be more specific.
> >
> > > But if the leaders for each partition are spread across multiple zones,
> > how will you handle that?
> >
> > This is what the min-cost flow solution is trying to solve? i.e. Find an
> > assignment of tasks to clients where across AZ traffic can be minimized.
> > But there are some constraints to the solution and one of them is we need
> > to balance task assignment first (
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-925%3A+Rack+aware+task+assignment+in+Kafka+Streams#KIP925:RackawaretaskassignmentinKafkaStreams-Designforrackawareassignment
> ).
> > So in your example of three tasks' partitions being in the same AZ of a
> > client, if there are other clients, we still want to balance the tasks to
> > other clients even if putting all tasks to a single client can result in
> 0
> > cross AZ traffic. In
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-925%3A+Rack+aware+task+assignment+in+Kafka+Streams#KIP925:RackawaretaskassignmentinKafkaStreams-Algorithm
> > section, the algorithm will try to find a min-cost solution based on
> > balanced assignment instead of pure min-cost.
> >
> > Thanks,
> > Hao
> >
> > On Tue, May 9, 2023 at 5:55 PM Colt McNealy <c...@littlehorse.io> wrote:
> >
> > > Hello Hao,
> > >
> > > First of all, THANK YOU for putting this together. I had been hoping
> > > someone might bring something like this forward. A few comments:
> > >
> > > **1: Runtime Complexity
> > > > Klein’s cycle canceling algorithm can solve the min-cost flow
> problem in
> > > O(E^2CU) time where C is max cost and U is max capacity. In our
> particular
> > > case, C is 1 and U is at most 3 (A task can have at most 3 topics
> including
> > > changelog topic?). So the algorithm runs in O(E^2) time for our case.
> > >
> > > A Task can have multiple input topics, and also multiple state stores,
> and
> > > multiple output topics. The most common case is three topics as you
> > > described, but this is not necessarily guaranteed. Also, math is one
> of my
> > > weak points, but to me O(E^2) is equivalent to O(1), and I struggle to
> see
> > > how the algorithm isn't at least O(N) where N is the number of
> Tasks...?
> > >
> > > **2: Broker-Side Partition Assignments
> > > Consider the case with just three topics in a Task (one input, one
> output,
> > > one changelog). If all three partition leaders are in the same Rack (or
> > > better yet, the same broker), then we could get massive savings by
> > > assigning the Task to that Rack/availability zone. But if the leaders
> for
> > > each partition are spread across multiple zones, how will you handle
> that?
> > > Is that outside the scope of this KIP, or is it worth introducing a
> > > kafka-streams-generate-rebalance-proposal.sh tool?
> > >
> > > Colt McNealy
> > > *Founder, LittleHorse.io*
> > >
> > >
> > > On Tue, May 9, 2023 at 4:03 PM Hao Li <h...@confluent.io.invalid>
> wrote:
> > >
> > > > Hi all,
> > > >
> > > > I have submitted KIP-925 to add rack awareness logic in task
> assignment
> > > in
> > > > Kafka Streams and would like to start a discussion:
> > > >
> > > >
> > > >
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-925%3A+Rack+aware+task+assignment+in+Kafka+Streams
> > > >
> > > > --
> > > > Thanks,
> > > > Hao
> > > >
> > >
> >
> >
> > --
> > Thanks,
> > Hao
> >
>


-- 
Thanks,
Hao

Re: [DISCUSS] KIP-925: rack aware task assignment in Kafka Streams

Reply via email to