[ https://issues.apache.org/jira/browse/KAFKA-1475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14020148#comment-14020148 ]
Guozhang Wang commented on KAFKA-1475: -------------------------------------- In this case are the consumer using the same timestamp after session timeout to re-register? > Kafka consumer stops LeaderFinder/FetcherThreads, but application does not > know > ------------------------------------------------------------------------------- > > Key: KAFKA-1475 > URL: https://issues.apache.org/jira/browse/KAFKA-1475 > Project: Kafka > Issue Type: Bug > Components: clients, consumer > Affects Versions: 0.8.0 > Environment: linux, rhel 6.4 > Reporter: Hang Qi > Assignee: Neha Narkhede > Labels: consumer > Attachments: 5055aeee-zk.txt > > > We encounter an issue of consumers not consuming messages in production. ( > this consumer has its own consumer group, and just consumes one topic of 3 > partitions.) > Based on the logs, we have following findings: > 1. Zookeeper session expires, kafka highlevel consumer detected this event, > and released old broker parition ownership and re-register consumer. > 2. Upon creating ephemeral path in Zookeeper, it found that the path still > exists, and try to read the content of the node. > 3. After read back the content, it founds the content is same as that it is > going to write, so it logged as "[ZkClient-EventThread-428-ZK/kafka] > (kafka.utils.ZkUtils$) - > /consumers/consumerA/ids/consumerA-1400815740329-5055aeee exists with value { > "pattern":"static", "subscription":{ "TOPIC": 1}, > "timestamp":"1400846114845", "version":1 } during connection loss; this is > ok", and doing nothing. > 4. After that, it throws exception indicated that the cause is > "org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = > NoNode for /consumers/consumerA/ids/consumerA-1400815740329-5055aeee" during > rebalance. > 5. After all retries failed, it gave up retry and left the > LeaderFinderThread, FetcherThread stopped. > Step 3 looks very weird, checking the code, there is timestamp contains in > the stored data, it may be caused by Zookeeper issue. > But what I am wondering is that whether it is possible to let application > (kafka client users) to know that the underline LeaderFinderThread and > FetcherThread are stopped, like allowing application to register some > callback or throws some exception (by invalidate the KafkaStream iterator for > example)? For me, it is not reasonable for the kafka client to shutdown > everything and wait for next rebalance, and let application wait on > iterator.hasNext() without knowing that there is something wrong underline. > I've read about twiki about kafka 0.9 consumer rewrite, and there is a > ConsumerRebalanceCallback interface, but I am not sure how long it will take > to be ready, and how long it will take for us to migrate. > Please help to look at this issue. Thanks very much! -- This message was sent by Atlassian JIRA (v6.2#6252)