[ https://issues.apache.org/jira/browse/KAFKA-1585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14092050#comment-14092050 ]
Joe Stein commented on KAFKA-1585:
----------------------------------

FWIW, there were a lot of bug fixes released in ZooKeeper 3.4.6 (http://zookeeper.apache.org/doc/r3.4.6/releasenotes.html) since the 3.4.5 version. You could be hitting ZOOKEEPER-1382, which was fixed in the 3.4.6 release. For Kafka 0.8.1.1, see the current ZooKeeper recommendation at https://kafka.apache.org/documentation.html#zk. However, folks are using 3.4.6 in production, and that should be the ZooKeeper version for 0.8.2.

Regarding your logs: before this happened, it looks like you had errors, then a reconnect and a consumer shutdown:

Line 132356: 18:31:38,948 [7-cloudera:2181] INFO kafka.utils.Logging$class - [Q_dev-1407608193903-1cb30b18], Q_dev-1407608193903-1cb30b18-0 attempting to claim partition 0
Line 132357: 18:31:38,975 [26-d7f0e66a-0-0] ERROR kafka.utils.Logging$class - [ConsumerFetcherThread-Q_dev-1407608195226-d7f0e66a-0-0], Current offset 15 for partition [gk.q.event,0] out of range; reset offset to 0
Line 132358: 18:31:38,980 [62-1d81f64b-0-0] ERROR kafka.utils.Logging$class - [ConsumerFetcherThread-Q_dev-1407608193962-1d81f64b-0-0], Current offset 4 for partition [gk.q.mail.api,0] out of range; reset offset to 0
Line 132359: 18:31:38,994 [84-ceea5788-0-0] WARN kafka.utils.Logging$class - Reconnect due to socket error: null
Line 132360: 18:31:38,995 [84-ceea5788-0-0] INFO kafka.utils.Logging$class - [ConsumerFetcherThread-dev_dev-1407608194884-ceea5788-0-0], Stopped
Line 132361: 18:31:38,995 [atcher_executor] INFO kafka.utils.Logging$class - [ConsumerFetcherThread-dev_dev-1407608194884-ceea5788-0-0], Shutdown completed
Line 132362: 18:31:38,995 [atcher_executor] INFO kafka.utils.Logging$class - [ConsumerFetcherManager-1407608194890] All connections stopped
Line 132363: 18:31:38,996 [atcher_executor] INFO kafka.utils.Logging$class - [dev_dev-1407608194884-ceea5788], Cleared all relevant queues for this fetcher
Line 132364: 18:31:38,996 [atcher_executor] INFO kafka.utils.Logging$class - [dev_dev-1407608194884-ceea5788], Cleared the data chunks in all the consumer message iterators
Line 132365: 18:31:38,996 [atcher_executor] INFO kafka.utils.Logging$class - [dev_dev-1407608194884-ceea5788], Committing all offsets after clearing the fetcher queues
Line 132366: 18:31:38,996 [atcher_executor] INFO kafka.utils.Logging$class - [dev_dev-1407608194884-ceea5788], Releasing partition ownership
Line 132367: 18:31:39,005 [7-cloudera:2181] INFO kafka.utils.Logging$class - conflict in /consumers/Q/owners/gk.q.log/0 data: Q_dev-1407608193903-1cb30b18-0 stored data: Q_dev-1407608205503-9cfb99aa-0

Likely what happened is that when it reconnected, the timeout with ZK never occurred, and it got stuck there. It could be the ZK bug. It could also be somewhat related to KAFKA-1387 or KAFKA-1451. I will link the JIRAs so that when we test 0.8.2, we can try reproducing this on a good ZK version.

To resolve this, you can stop the consumer, wait for the ZK nodes to expire, and then start the consumers up again.

> Client: Infinite "conflict in /consumers/"
> ------------------------------------------
>
>                 Key: KAFKA-1585
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1585
>             Project: Kafka
>          Issue Type: Bug
>          Components: consumer
>    Affects Versions: 0.8.1.1
>            Reporter: Artur Denysenko
>            Priority: Critical
>             Fix For: 0.8.2
>
>         Attachments: kafka_consumer_ephemeral_node_extract.zip
>
>
> Periodically we have kafka consumers cycling in "conflict in /consumers/" and
> "I wrote this conflicted ephemeral node".
> Please see attached log extract.
> After restarting the process kafka consumers are working perfectly.
> We are using Zookeeper 3.4.5



--
This message was sent by Atlassian JIRA
(v6.2#6252)
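The workaround in the comment is to stop the consumer, wait for the ZK nodes to expire, and then restart. The "wait" step can be sketched in Python, as shown below. This is a minimal, non-authoritative sketch that makes two assumptions.

* The owner znode path is taken from a "conflict in" line like Line 132367 above.
* The caller has some way to check whether a znode still exists.

The `exists` callback, the `parse_conflict` and `wait_for_expiry` names, and the polling parameters are all hypothetical. With the kazoo client, for example, `exists` could be `lambda p: zk.exists(p) is not None`, because `KazooClient.exists` returns `None` for a missing node.

```python
import re
import time
from typing import Callable, NamedTuple, Optional

# Matches the "conflict in" INFO line from the log excerpt, e.g.
# "conflict in /consumers/Q/owners/gk.q.log/0 data: Q_dev-...-0 stored data: Q_dev-...-0"
CONFLICT_RE = re.compile(
    r"conflict in (?P<path>/consumers/(?P<group>[^/]+)/owners/(?P<topic>[^/]+)/(?P<partition>\d+))"
    r" data: (?P<ours>\S+) stored data: (?P<stored>\S+)"
)


class OwnerConflict(NamedTuple):
    path: str       # ephemeral owner znode that could not be claimed
    group: str      # consumer group (e.g. "Q")
    topic: str
    partition: int
    ours: str       # consumer thread ID trying to claim the partition
    stored: str     # consumer thread ID currently stored in the znode


def parse_conflict(line: str) -> Optional[OwnerConflict]:
    """Return the conflict details from a log line, or None if it is not a conflict line."""
    m = CONFLICT_RE.search(line)
    if m is None:
        return None
    return OwnerConflict(
        path=m["path"], group=m["group"], topic=m["topic"],
        partition=int(m["partition"]), ours=m["ours"], stored=m["stored"],
    )


def wait_for_expiry(path: str, exists: Callable[[str], bool],
                    timeout_s: float, poll_s: float = 1.0,
                    sleep: Callable[[float], None] = time.sleep,
                    clock: Callable[[], float] = time.monotonic) -> bool:
    """Poll until the node at `path` is gone.

    Returns True if it disappeared before `timeout_s` elapsed, False otherwise.
    `exists` is a caller-supplied (hypothetical) check, so this loop has no
    ZooKeeper dependency of its own.
    """
    deadline = clock() + timeout_s
    while True:
        if not exists(path):
            return True   # ephemeral node removed; safe to restart consumers
        if clock() >= deadline:
            return False  # still present; session has not expired yet
        sleep(poll_s)
```

ZooKeeper removes an ephemeral node only after the session that created it ends. The timeout should therefore be at least the consumer's ZooKeeper session timeout (`zookeeper.session.timeout.ms` in the 0.8 consumer config). Injecting `sleep` and `clock` keeps the loop testable without a live ensemble.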