Although it's not the main objective, one side effect of KIP-441 should be
improved balance of the final stable assignment. By warming up standbys
before switching them over to active tasks, we can achieve stickiness
without sacrificing balance in the follow-up rebalance.
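
For reference, KIP-441 also introduces a few configs that control the
warmup behavior. A minimal sketch, with names and defaults as proposed in
the KIP document:

    import java.util.Properties;
    import org.apache.kafka.streams.StreamsConfig;

    public class Kip441Configs {
        public static Properties props() {
            Properties props = new Properties();
            // Max number of extra "warmup" replicas that may restore
            // state before an active task is switched over.
            props.put(StreamsConfig.MAX_WARMUP_REPLICAS_CONFIG, 2);
            // A task counts as caught up once its store is within this
            // many records of the end of its changelog.
            props.put(StreamsConfig.ACCEPTABLE_RECOVERY_LAG_CONFIG, 10_000L);
            // How often to trigger a follow-up "probing" rebalance to
            // check whether the warmup replicas have caught up.
            props.put(StreamsConfig.PROBING_REBALANCE_INTERVAL_MS_CONFIG,
                      10 * 60 * 1000L);
            return props;
        }
    }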

This work is targeted for the next release, so if you still observe these
issues even on newer versions, I'd recommend trying out 2.6 when it comes
out.

You can read up on the details and track the progress of this KIP in the
KIP document:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-441:+Smooth+Scaling+Out+for+Kafka+Streams
JIRA: https://issues.apache.org/jira/browse/KAFKA-6145?src=confmacro

Cheers,
Sophie

On Fri, Mar 20, 2020 at 10:20 AM Matthias J. Sax <mj...@apache.org> wrote:

> Partition assignment, or more specifically "task placement" for Kafka
> Streams, is a hard-coded algorithm (cf.
>
> https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/processor/internals/assignment/StickyTaskAssignor.java
> ).
> The algorithm actually tries to assign different tasks from the same
> sub-topology to different instances and thus, your 6 input topic
> partitions should ideally get balanced over your 3 instances (ie, 2
> each, one per thread).
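>
> Just to illustrate the ideal balanced outcome (a toy round-robin, not
> the actual StickyTaskAssignor logic):
>
>     // Toy sketch: spread the 6 tasks of one sub-topology over
>     // 3 instances x 2 threads; each thread gets exactly one task.
>     public class ToyBalance {
>         public static void main(String[] args) {
>             String[] threads = {"A-1", "A-2", "B-1", "B-2", "C-1", "C-2"};
>             for (int p = 0; p < 6; p++) {
>                 // Task IDs follow the <sub-topology>_<partition> scheme.
>                 System.out.println("task 0_" + p + " -> " + threads[p % 6]);
>             }
>         }
>     }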
>
> However, the algorithm needs to trade off load balancing and stickiness
> (to avoid unnecessary, expensive state migration) and thus, the
> placement strategy is best effort only. Also, in older versions there
> were some issues that got fixed in newer versions (ie, 2.0.x and newer).
> Not sure what version you are on (as you linked to the 1.0 docs, maybe
> an upgrade resolves your issue?).
>
> Compare:
>
>  - https://issues.apache.org/jira/browse/KAFKA-6039
>  - https://issues.apache.org/jira/browse/KAFKA-7144
>
> If you still observe issues in newer versions, please comment on the
> tickets or create a new ticket describing the problem. Or even better,
> do a PR to help improve the "task placement" algorithm. :)
>
>
> -Matthias
>
>
> On 3/20/20 6:47 AM, Stephen Young wrote:
> > Thanks Guozhang. That's really helpful!
> >
> > Are you able to explain a bit more about how it would work for my use
> > case? As I understand it, this 'repartition' method enables us to
> > materialize a stream to a new topic with a custom partitioning
> > strategy.
> >
> > But my problem is not how the topic is partitioned. My issue is that
> > the partitions of the source topic need to be spread equally amongst
> > all the available threads. How could 'repartition' help with this?
> >
> > Stephen
> >
> > On 2020/03/19 23:20:54, Guozhang Wang <wangg...@gmail.com> wrote:
> >> Hi Stephen,
> >>
> >> We've deprecated the partition-grouper API due to its drawbacks for
> >> upgrade compatibility (consider if you want to change num.partitions
> >> while evolving your application), and instead we're working on
> >> KIP-221 for the same purpose as your use case:
> >>
> >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+DSL+with+Connecting+Topic+Creation+and+Repartition+Hint
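> >>
> >> As a rough sketch of the DSL shape this adds (names per the KIP; the
> >> repartition topic name and partition count below are illustrative):
> >>
> >>     import org.apache.kafka.streams.StreamsBuilder;
> >>     import org.apache.kafka.streams.kstream.KStream;
> >>     import org.apache.kafka.streams.kstream.Repartitioned;
> >>
> >>     public class RepartitionExample {
> >>         public static void main(String[] args) {
> >>             StreamsBuilder builder = new StreamsBuilder();
> >>             KStream<String, String> input = builder.stream("input-topic");
> >>             // Explicitly repartition; Streams creates and manages the
> >>             // internal topic, and downstream work reads from it.
> >>             KStream<String, String> rebalanced = input.repartition(
> >>                 Repartitioned.<String, String>as("heavy-work")
> >>                              .withNumberOfPartitions(12));
> >>         }
> >>     }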
> >>
> >>
> >> Guozhang
> >>
> >> On Wed, Mar 18, 2020 at 7:48 AM Stephen Young
> >> <wintersg...@googlemail.com.invalid> wrote:
> >>
> >>> I have a question about partition assignment for a Kafka Streams
> >>> app. As I understand it, the more complex your topology is, the
> >>> greater the number of internal topics Kafka Streams will create. In
> >>> my case the app has 8 graphs in the topology. There are 6 partitions
> >>> for each graph (this matches the number of partitions of the input
> >>> topic). So there are 48 partitions that the app needs to handle.
> >>> These get balanced equally across all 3 servers where the app is
> >>> running (each server also has 2 threads, so there are 6 available
> >>> instances of the app).
> >>>
> >>> The problem for me is that the partitions of the input topic have
> >>> the heaviest workload. But these 6 partitions are not distributed
> >>> evenly amongst the instances. They are just considered 6 partitions
> >>> amongst the 48 the app needs to balance. But this means if a server
> >>> gets most or all of these 6 partitions, it ends up exhausting all of
> >>> the resources on that server.
> >>>
> >>> Is there a way of equally balancing these 6 specific partitions
> >>> amongst the available instances? I thought writing a custom
> >>> partition grouper might help here:
> >>>
> >>> https://kafka.apache.org/10/documentation/streams/developer-guide/config-streams.html#partition-grouper
> >>>
> >>> But the advice seems to be to not do this, otherwise you risk
> >>> breaking the app.
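> >>>
> >>> (A minimal sketch of what such a grouper would look like, using the
> >>> since-deprecated PartitionGrouper interface; the class name is
> >>> hypothetical, and it just delegates to the default grouping:)
> >>>
> >>>     import java.util.Map;
> >>>     import java.util.Set;
> >>>     import org.apache.kafka.common.Cluster;
> >>>     import org.apache.kafka.common.TopicPartition;
> >>>     import org.apache.kafka.streams.processor.DefaultPartitionGrouper;
> >>>     import org.apache.kafka.streams.processor.PartitionGrouper;
> >>>     import org.apache.kafka.streams.processor.TaskId;
> >>>
> >>>     // Hypothetical custom grouper; it controls how partitions are
> >>>     // grouped into tasks, which is where custom logic would go.
> >>>     public class InputHeavyGrouper implements PartitionGrouper {
> >>>         @Override
> >>>         public Map<TaskId, Set<TopicPartition>> partitionGroups(
> >>>                 Map<Integer, Set<String>> topicGroups, Cluster metadata) {
> >>>             return new DefaultPartitionGrouper().partitionGroups(
> >>>                 topicGroups, metadata);
> >>>         }
> >>>     }
> >>>
> >>>     // Registered via:
> >>>     // props.put(StreamsConfig.PARTITION_GROUPER_CLASS_CONFIG,
> >>>     //           InputHeavyGrouper.class);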
> >>>
> >>> Thanks!
> >>>
> >>
> >>
> >> --
> >> -- Guozhang
> >>
>
>
