[ https://issues.apache.org/jira/browse/KAFKA-3559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15581874#comment-15581874 ]
ASF GitHub Bot commented on KAFKA-3559: --------------------------------------- GitHub user enothereska opened a pull request: https://github.com/apache/kafka/pull/2032 KAFKA-3559: Recycle old tasks when possible You can merge this pull request into a Git repository by running: $ git pull https://github.com/enothereska/kafka KAFKA-3559-onPartitionAssigned Alternatively you can review and apply these changes as the patch at: https://github.com/apache/kafka/pull/2032.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2032 ---- commit 28a50430e4136ba75f0d9b957a67f22e7b1e86a0 Author: Eno Thereska <eno.there...@gmail.com> Date: 2016-10-17T10:46:45Z Recycle old tasks when possible ---- > Task creation time taking too long in rebalance callback > -------------------------------------------------------- > > Key: KAFKA-3559 > URL: https://issues.apache.org/jira/browse/KAFKA-3559 > Project: Kafka > Issue Type: Improvement > Components: streams > Reporter: Guozhang Wang > Assignee: Eno Thereska > Labels: architecture > Fix For: 0.10.2.0 > > > Currently in Kafka Streams, we create stream tasks upon getting newly > assigned partitions in rebalance callback function {code} onPartitionAssigned > {code}, which involves initialization of the processor state stores as well > (including opening the rocksDB, restore the store from changelog, etc, which > takes time). > With a large number of state stores, the initialization time itself could > take tens of seconds, which usually is larger than the consumer session > timeout. As a result, when the callback is completed, the consumer is already > treated as failed by the coordinator and rebalance again. > We need to consider if we can optimize the initialization process, or move it > out of the callback function, and while initializing the stores one-by-one, > use poll call to send heartbeats to avoid being kicked out by coordinator. -- This message was sent by Atlassian JIRA (v6.3.4#6332)