[ 
https://issues.apache.org/jira/browse/KAFKA-13555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17949627#comment-17949627
 ] 

Lorcan commented on KAFKA-13555:
--------------------------------

Hi [~mjsax], I have made some progress with this ticket and would like to get 
some advice regarding how to progress, rather than simply raising a PR which 
might not be in a ready state.

I've got a branch that has made changes to the StickyTaskAssignor under the 
streams.processor directory.

Would this ticket require a change to all of the assignors in the codebase? 
Would it be an issue if there were a discrepancy in load calculation between 
the StickyTaskAssignor and the HighAvailabilityTaskAssignor for example?

There are also aspects of this which I am not entriely sure of (for example 
determining in the code what an input partition is, as opposed to other types 
of partitions) and have made a best guess.

My branch passes the existing unit tests and I've written a basic test to check 
that an assignment is being made by partition count rather than task count.

I'm happy to give a high level overview of all the changes I've made if that 
would be helpful, either here or on Github.

> Consider number if input topic partitions for task assignment
> -------------------------------------------------------------
>
>                 Key: KAFKA-13555
>                 URL: https://issues.apache.org/jira/browse/KAFKA-13555
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>            Reporter: Matthias J. Sax
>            Assignee: Lorcan
>            Priority: Major
>
> StreamsAssignor tries to distribute tasks evenly across all instances/threads 
> of a Kafka Streams application. It knows about instances/thread (to give more 
> capacity to instances with more thread), and it distinguishes between 
> stateless and stateful tasks. We also try to not move state around but to use 
> a sticky assignment if possible. However, the assignment does not take the 
> number of input topic partitions into account.
> For example, an upstream tasks could compute two joins, and thus has 3 input 
> partitions, while a downstream task compute a follow up aggregation with a 
> single input partitions (from the repartition topic). It could happen that 
> one thread gets the 3 input partition tasks assigned, while the other thread 
> get the single input partition tasks assigned resulting to an uneven 
> partition assignment across both threads.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to