[ https://issues.apache.org/jira/browse/KAFKA-5406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16042077#comment-16042077 ]
huxihx commented on KAFKA-5406:
-------------------------------

You could estimate how long the network takes to recover and make sure `rebalance.max.retries` * `rebalance.backoff.ms` is no less than that period. Some application-level logic may also be needed to handle a long network outage.

> NoNodeException result in rebalance failed
> ------------------------------------------
>
>                 Key: KAFKA-5406
>                 URL: https://issues.apache.org/jira/browse/KAFKA-5406
>             Project: Kafka
>          Issue Type: Bug
>          Components: consumer
>    Affects Versions: 0.8.2.2, 0.10.0.0
>         Environment: windows8.1 centos6.4
>            Reporter: xiaoguy
>            Priority: Critical
>              Labels: easyfix, patch
>         Attachments: log.log
>
>
> Hey guys, I ran into this problem over the last few days.
> Because the network is unstable, the consumer rebalance failed after 5 tries,
> and the log shows that the zk path /consumers/$(groupIdName)/ids/ is empty.
> The consumer seems unable to register after the network recovers, so I got the
> Kafka source code (0.8.2.2) and found that
> consumer/ZookeeperConsumerConnector$ZKSessionExpireListener handleNewSession
> is not called, and handleStateChanged does nothing.
> So I changed the code like this, and it seems to work. I also checked version
> 0.10.0.0 and it has the same problem. Is this a bug? I'm confused, thank you.
> def handleStateChanged(state: KeeperState) {
>   // do nothing, since zkclient will do reconnect for us.
>   if (state == KeeperState.SyncConnected) {
>     handleNewSession()
>   }
>   System.err.println("----------------ZKSessionExpireListener------------handleStateChanged-----state:" + state + "----" + state.getIntValue)
> }

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
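
The sizing rule from the comment above (`rebalance.max.retries` * `rebalance.backoff.ms` should be no less than the expected recovery period) can be sketched roughly as follows. `RebalanceBudget` and its methods are hypothetical helpers for illustration, not part of Kafka, and the product is only an approximation of how long the consumer actually keeps retrying.

```scala
// Hypothetical helper (not a Kafka API): checks whether the rebalance retry
// budget of the ZooKeeper-based consumer covers an expected network outage.
object RebalanceBudget {

  /** Approximate retry window in ms: rebalance.max.retries * rebalance.backoff.ms. */
  def retryWindowMs(maxRetries: Int, backoffMs: Long): Long =
    maxRetries.toLong * backoffMs

  /** True if the retry window is no less than the expected outage. */
  def covers(maxRetries: Int, backoffMs: Long, expectedOutageMs: Long): Boolean =
    retryWindowMs(maxRetries, backoffMs) >= expectedOutageMs

  /** Smallest rebalance.max.retries whose window covers the expected outage. */
  def minRetriesFor(expectedOutageMs: Long, backoffMs: Long): Int = {
    require(backoffMs > 0, "backoffMs must be positive")
    // Ceiling division, so a partial backoff interval still counts as one retry.
    ((expectedOutageMs + backoffMs - 1) / backoffMs).toInt
  }
}
```

For example, assuming an outage of about 60 seconds and a backoff of 2000 ms, `RebalanceBudget.minRetriesFor(60000L, 2000L)` gives 30, so `rebalance.max.retries` would need to be at least 30 in the consumer configuration. The outage estimate itself has to come from observing the network in question.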