[ https://issues.apache.org/jira/browse/KAFKA-956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13696789#comment-13696789 ]
Sam Meder commented on KAFKA-956:
---------------------------------

I guess we can carry my patch locally for now (we don't use more than one topic per consumer right now), but it doesn't seem great to have a corner case that can basically deadlock the consumer.

I did look into whether it would be possible to scope rebalancing to just a subset of the topics for the consumer, but it looks like that would require quite a bit of detangling of the listener from the zookeeper consumer class. Not something I would put in 0.8 right now, but is that something your intern would tackle as part of the offset work? Seems worthwhile...

> High-level consumer fails to check topic metadata response for errors
> ---------------------------------------------------------------------
>
>                 Key: KAFKA-956
>                 URL: https://issues.apache.org/jira/browse/KAFKA-956
>             Project: Kafka
>          Issue Type: Bug
>          Components: consumer
>    Affects Versions: 0.8
>            Reporter: Sam Meder
>            Assignee: Neha Narkhede
>            Priority: Blocker
>             Fix For: 0.8
>
>         Attachments: consumer_metadata_fetch.patch
>
>
> In our environment we noticed that consumers would sometimes hang when
> started too close to starting the Kafka server. I tracked this down and it
> seems to be related to some code in rebalance
> (ZookeeperConsumerConnector.scala). In particular the following code seems
> problematic:
>       val topicsMetadata = ClientUtils.fetchTopicMetadata(myTopicThreadIdsMap.keySet,
>                                                           brokers,
>                                                           config.clientId,
>                                                           config.socketTimeoutMs,
>                                                           correlationId.getAndIncrement).topicsMetadata
>       val partitionsPerTopicMap = new mutable.HashMap[String, Seq[Int]]
>       topicsMetadata.foreach(m => {
>         val topic = m.topic
>         val partitions = m.partitionsMetadata.map(m1 => m1.partitionId)
>         partitionsPerTopicMap.put(topic, partitions)
>       })
> The response is never checked for error, so may not actually contain any
> partition info! Rebalance goes its merry way, but doesn't know about any
> partitions so never assigns them...
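
For illustration only, here is a minimal sketch of the kind of check the description says is missing. It is not the attached patch and does not use the real Kafka classes: `PartitionMetadata`, `TopicMetadata`, `MetadataCheck`, `partitionsPerTopic` and the `NoError` value are simplified stand-ins assumed for this example. The idea is to refuse to build the partition map when any topic's metadata carries an error code or lists no partitions, rather than continuing with an empty assignment.

```scala
// Simplified stand-ins for the metadata classes (hypothetical, not the Kafka API).
final case class PartitionMetadata(partitionId: Int)
final case class TopicMetadata(topic: String,
                               partitionsMetadata: Seq[PartitionMetadata],
                               errorCode: Short)

object MetadataCheck {
  // Assumed "no error" code for this sketch.
  val NoError: Short = 0

  // Returns Right(topic -> partition ids) only when every topic's metadata is
  // error-free and lists at least one partition; otherwise returns Left with
  // the names of the topics whose metadata was unusable.
  def partitionsPerTopic(
      topicsMetadata: Seq[TopicMetadata]): Either[Seq[String], Map[String, Seq[Int]]] = {
    val (ok, failed) = topicsMetadata.partition { m =>
      m.errorCode == NoError && m.partitionsMetadata.nonEmpty
    }
    if (failed.nonEmpty) Left(failed.map(_.topic))
    else Right(ok.map(m => m.topic -> m.partitionsMetadata.map(_.partitionId)).toMap)
  }
}
```

With something along these lines, a caller could treat a `Left` result as a failed rebalance attempt and retry later, instead of proceeding as if the topics had no partitions.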