[ 
https://issues.apache.org/jira/browse/KAFKA-1475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hang Qi updated KAFKA-1475:
---------------------------

    Description: 
We encounter an issue of consumers not consuming messages in production. ( this 
consumer has its own consumer group, and just consumes one topic of 3 
partitions.)

Based on the logs, we have following findings:
1. Zookeeper session expires, kafka highlevel consumer detected this event, and 
released old broker parition ownership and re-register consumer.
2. Upon creating ephemeral path in Zookeeper, it found that the path still 
exists, and try to read the content of the node.
3. After read back the content, it founds the content is same as that it is 
going to write, so it logged as "[ZkClient-EventThread-428-ZK/kafka] 
(kafka.utils.ZkUtils$) - 
/consumers/consumerA/ids/consumerA-1400815740329-5055aeee exists with value { 
"pattern":"static", "subscription":{ "TOPIC": 1}, "timestamp":"1400846114845", 
"version":1 } during connection loss; this is ok", and doing nothing.
4. After that, it throws exception indicated that the cause is 
"org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode 
for /consumers/consumerA/ids/consumerA-1400815740329-5055aeee" during 
rebalance. 
5. After all retries failed, it gave up retry and left the LeaderFinderThread, 
FetcherThread stopped. 

Step 3 looks very weird, checking the code, there is timestamp contains in the 
stored data, it may be caused by Zookeeper issue.

But what I am wondering is that whether it is possible to let application 
(kafka client users) to know that the underline LeaderFinderThread and 
FetcherThread are stopped, like allowing application to register some callback 
or throws some exception (by invalidate the KafkaStream iterator for example)? 
For me, it is not reasonable for the kafka client to shutdown everything and 
wait for next rebalance, and let application wait on iterator.hasNext() without 
knowing that there is something wrong underline.

I've read about twiki about kafka 0.9 consumer rewrite, and there is a 
ConsumerRebalanceCallback interface, but I am not sure how long it will take to 
be ready, and how long it will take for us to migrate. 

Please help to look at this issue.  Thanks very much!

  was:
We encounter an issue of consumers not consuming messages in production. ( this 
consumer has its own consumer group, and just consumes one topic, though 3 
partitions.)

Based on the logs, we have following findings:
1. Zookeeper session expires, kafka highlevel consumer detected this event, and 
released old broker parition ownership and re-register consumer.
2. Upon creating ephemeral path in Zookeeper, it found that the path still 
exists, and try to read the content of the node.
3. After read back the content, it founds the content is same as that it is 
going to write, so it logged as "[ZkClient-EventThread-428-ZK/kafka] 
(kafka.utils.ZkUtils$) - 
/consumers/consumerA/ids/consumerA-1400815740329-5055aeee exists with value { 
"pattern":"static", "subscription":{ "TOPIC": 1}, "timestamp":"1400846114845", 
"version":1 } during connection loss; this is ok", and doing nothing.
4. After that, it throws exception indicated that the cause is 
"org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode 
for /consumers/consumerA/ids/consumerA-1400815740329-5055aeee" during 
rebalance. 
5. After all retries failed, it gave up retry and left the LeaderFinderThread, 
FetcherThread stopped. 

Step 3 looks very weird, checking the code, there is timestamp contains in the 
stored data, it may be caused by Zookeeper issue.

But what I am wondering is that whether it is possible to let application 
(kafka client users) to know that the underline LeaderFinderThread and 
FetcherThread are stopped, like allowing application to register some callback 
or throws some exception (by invalidate the KafkaStream iterator for example)? 
For me, it is not reasonable for the kafka client to shutdown everything and 
wait for next rebalance, and let application wait on iterator.hasNext() without 
knowing that there is something wrong underline.

I've read about twiki about kafka 0.9 consumer rewrite, and there is a 
ConsumerRebalanceCallback interface, but I am not sure how long it will take to 
be ready, and how long it will take for us to migrate. 

Please help to look at this issue.  Thanks very much!


> Kafka consumer stops LeaderFinder/FetcherThreads, but application does not 
> know
> -------------------------------------------------------------------------------
>
>                 Key: KAFKA-1475
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1475
>             Project: Kafka
>          Issue Type: Bug
>          Components: clients, consumer
>    Affects Versions: 0.8.0
>         Environment: linux, rhel 6.4
>            Reporter: Hang Qi
>            Assignee: Neha Narkhede
>              Labels: consumer
>
> We encounter an issue of consumers not consuming messages in production. ( 
> this consumer has its own consumer group, and just consumes one topic of 3 
> partitions.)
> Based on the logs, we have following findings:
> 1. Zookeeper session expires, kafka highlevel consumer detected this event, 
> and released old broker parition ownership and re-register consumer.
> 2. Upon creating ephemeral path in Zookeeper, it found that the path still 
> exists, and try to read the content of the node.
> 3. After read back the content, it founds the content is same as that it is 
> going to write, so it logged as "[ZkClient-EventThread-428-ZK/kafka] 
> (kafka.utils.ZkUtils$) - 
> /consumers/consumerA/ids/consumerA-1400815740329-5055aeee exists with value { 
> "pattern":"static", "subscription":{ "TOPIC": 1}, 
> "timestamp":"1400846114845", "version":1 } during connection loss; this is 
> ok", and doing nothing.
> 4. After that, it throws exception indicated that the cause is 
> "org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = 
> NoNode for /consumers/consumerA/ids/consumerA-1400815740329-5055aeee" during 
> rebalance. 
> 5. After all retries failed, it gave up retry and left the 
> LeaderFinderThread, FetcherThread stopped. 
> Step 3 looks very weird, checking the code, there is timestamp contains in 
> the stored data, it may be caused by Zookeeper issue.
> But what I am wondering is that whether it is possible to let application 
> (kafka client users) to know that the underline LeaderFinderThread and 
> FetcherThread are stopped, like allowing application to register some 
> callback or throws some exception (by invalidate the KafkaStream iterator for 
> example)? For me, it is not reasonable for the kafka client to shutdown 
> everything and wait for next rebalance, and let application wait on 
> iterator.hasNext() without knowing that there is something wrong underline.
> I've read about twiki about kafka 0.9 consumer rewrite, and there is a 
> ConsumerRebalanceCallback interface, but I am not sure how long it will take 
> to be ready, and how long it will take for us to migrate. 
> Please help to look at this issue.  Thanks very much!



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Reply via email to