From your logs the channel to the brokers is broken; were the brokers
alive at that time?
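
If it helps, here is a rough way to check which broker ids are still
registered in ZooKeeper (just a sketch: it reuses the ZkClient library that
already appears in your stack traces and the ZooKeeper connect string from
your logs; adjust both as needed):

import java.util.List;
import org.I0Itec.zkclient.ZkClient;

public class BrokerLivenessCheck {
    public static void main(String[] args) {
        // connect string taken from your client log lines; replace if different
        ZkClient zk = new ZkClient("tvip-m1-mw-zookeeper.dish.com:2181", 6000, 6000);
        try {
            // each live broker keeps an ephemeral znode under /brokers/ids
            List<String> brokerIds = zk.getChildren("/brokers/ids");
            System.out.println("Registered broker ids: " + brokerIds);
        } finally {
            zk.close();
        }
    }
}

If broker id 1 is missing from that list while the problem is happening, the
failed fetch in your log is just a symptom of the broker (or its ZooKeeper
session) being down.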

Guozhang


On Fri, Jan 10, 2014 at 10:52 AM, Withers, Robert
<robert.with...@dish.com> wrote:

> The core problem is that our consumers stop consuming and lag increases.  We
> found this FAQ entry:
> https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-Myconsumerseemstohavestopped,why?.
>  It lists three possibilities.
>
> The FAQ also mentions, a bit earlier, spurious rebalances caused by improper
> GC settings, but we couldn't find which GC settings to use.  We are
> considering changing the zookeeper timeouts.  We are a little confused about
> the various issues, their sequence, and what could cause the consumers to
> stop reading.  One more scenario we can imagine: a ClosedByInterruptException
> in the "leader_finder" thread causes the "watcher_executor" thread to shut
> down the fetchers, and once the fetchers are gone the consumers stop
> processing data.  Is this possible?
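>
> For illustration, the ZooKeeper-related consumer settings we are thinking of
> changing would look roughly like this (property names are from the Kafka 0.8
> high-level consumer configuration; the host and group are copied from our
> logs, and the values are placeholders, not tested recommendations):
>
> import java.util.Properties;
> import kafka.consumer.Consumer;
> import kafka.consumer.ConsumerConfig;
> import kafka.javaapi.consumer.ConsumerConnector;
>
> public class HadoopConsumerFactory {
>     public static ConsumerConnector create() {
>         Properties props = new Properties();
>         props.put("zookeeper.connect", "tvip-m1-mw-zookeeper.dish.com:2181");
>         props.put("group.id", "account-info-updated-hadoop-consumer");
>         // longer session timeout so a GC pause does not expire the ZK session
>         props.put("zookeeper.session.timeout.ms", "30000");
>         props.put("zookeeper.connection.timeout.ms", "30000");
>         props.put("zookeeper.sync.time.ms", "2000");
>         // give rebalances more room before the connector gives up
>         props.put("rebalance.max.retries", "10");
>         props.put("rebalance.backoff.ms", "10000");
>         return Consumer.createJavaConsumerConnector(new ConsumerConfig(props));
>     }
> }
>
> On the GC side, we gather the goal is simply to avoid long stop-the-world
> pauses (for example with CMS or G1) so the ZooKeeper heartbeats keep flowing,
> but we would appreciate any concrete guidance on flags.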
>
> Thank you,
> rob
>
> -----Original Message-----
> From: Seshadri, Balaji [mailto:balaji.sesha...@dish.com]
> Sent: Friday, January 10, 2014 11:40 AM
> To: users@kafka.apache.org
> Subject: RE: Looks like consumer fetchers get stopped we are not getting
> any data
>
> It would be helpful if you could shed some light on why all the fetchers are
> getting stopped.
>
> -----Original Message-----
> From: Seshadri, Balaji [mailto:balaji.sesha...@dish.com]
> Sent: Friday, January 10, 2014 11:28 AM
> To: users@kafka.apache.org
> Subject: RE: Looks like consumer fetchers get stopped we are not getting
> any data
>
> We also see the error below when this happens.
>
> {2014-01-10 00:58:11,292} INFO
>  
> [account-info-updated-hadoop-consumer_tm1mwdpl04-1389222553159-ad59660b_watcher_executor]
> (?:?) -
> [account-info-updated-hadoop-consumer_tm1mwdpl04-1389222553159-ad59660b],
> exception during rebalance
> org.I0Itec.zkclient.exception.ZkNoNodeException:
> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode =
> NoNode for
> /consumers/account-info-updated-hadoop-consumer/ids/account-info-updated-hadoop-consumer_tm1mwdpl04-1389222553159-ad59660b
>         at
> org.I0Itec.zkclient.exception.ZkException.create(ZkException.java:47)
>         at
> org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:685)
>         at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:766)
>         at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:761)
>         at kafka.utils.ZkUtils$.readData(Unknown Source)
>         at kafka.consumer.TopicCount$.constructTopicCount(Unknown Source)
>         at
> kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener.kafka$consumer$ZookeeperConsumerConnector$ZKRebalancerListener$$rebalance(Unknown
> Source)
>         at
> kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener$$anonfun$syncedRebalance$1.apply$mcVI$sp(Unknown
> Source)
>         at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:142)
>         at
> kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener.syncedRebalance(Unknown
> Source)
>         at
> kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener$$anon$1.run(Unknown
> Source) Caused by: org.apache.zookeeper.KeeperException$NoNodeException:
> KeeperErrorCode = NoNode for
> /consumers/account-info-updated-hadoop-consumer/ids/account-info-updated-hadoop-consumer_tm1mwdpl04-1389222553159-ad59660b
>         at
> org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
>         at
> org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
>         at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:927)
>         at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:956)
>         at org.I0Itec.zkclient.ZkConnection.readData(ZkConnection.java:103)
>         at org.I0Itec.zkclient.ZkClient$9.call(ZkClient.java:770)
>         at org.I0Itec.zkclient.ZkClient$9.call(ZkClient.java:766)
>         at
> org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:675)
>         ... 9 more
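>
> The path in that NoNode error is our consumer's own registration node, so it
> looks like the ephemeral znode was already gone when the rebalance tried to
> re-read it (which, as far as we understand, is what happens after a ZooKeeper
> session expiry).  A quick check we can run the next time this happens, as a
> sketch using the same ZkClient library from the trace (path copied from the
> error above):
>
> import org.I0Itec.zkclient.ZkClient;
>
> public class ConsumerIdCheck {
>     public static void main(String[] args) {
>         ZkClient zk = new ZkClient("tvip-m1-mw-zookeeper.dish.com:2181", 6000, 6000);
>         try {
>             String path = "/consumers/account-info-updated-hadoop-consumer/ids/"
>                     + "account-info-updated-hadoop-consumer_tm1mwdpl04-1389222553159-ad59660b";
>             // the registration znode is ephemeral: false here means the
>             // consumer's ZooKeeper session has expired
>             System.out.println("consumer id registered: " + zk.exists(path));
>         } finally {
>             zk.close();
>         }
>     }
> }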
>
> -----Original Message-----
> From: Seshadri, Balaji [mailto:balaji.sesha...@dish.com]
> Sent: Friday, January 10, 2014 10:52 AM
> To: users@kafka.apache.org
> Subject: Looks like consumer fetchers get stopped we are not getting any
> data
>
> Please let us know why we are not getting any data from Kafka after the log
> output below.
>
> What could be causing all of the associated fetchers to be stopped, and why
> is there no retry?
>
> {2014-01-10 00:58:09,284} WARN
>  
> [account-info-updated-hadoop-consumer_tm1mwdpl04-1389222553159-ad59660b-leader-finder-thread]
> (?:?) - Fetching topic metadata with correlation id 3 for topics
> [Set(account-info-updated)] from broker
> [id:1,host:tm1-kafkabroker101,port:9092] failed
> java.nio.channels.ClosedByInterruptException
>         at
> java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
>         at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:506)
>         at java.nio.channels.SocketChannel.write(SocketChannel.java:493)
>         at kafka.network.BoundedByteBufferSend.writeTo(Unknown Source)
>         at kafka.network.Send$class.writeCompletely(Unknown Source)
>         at kafka.network.BoundedByteBufferSend.writeCompletely(Unknown
> Source)
>         at kafka.network.BlockingChannel.send(Unknown Source)
>         at kafka.producer.SyncProducer.liftedTree1$1(Unknown Source)
>         at
> kafka.producer.SyncProducer.kafka$producer$SyncProducer$$doSend(Unknown
> Source)
>         at kafka.producer.SyncProducer.send(Unknown Source)
>         at kafka.client.ClientUtils$.fetchTopicMetadata(Unknown Source)
>         at kafka.client.ClientUtils$.fetchTopicMetadata(Unknown Source)
>         at
> kafka.consumer.ConsumerFetcherManager$LeaderFinderThread.doWork(Unknown
> Source)
>         at kafka.utils.ShutdownableThread.run(Unknown Source)
> {2014-01-10 00:58:09,284} DEBUG
> [account-access-hadoop-consumer_tm1mwdpl04-1389222551916-a0c87abc_watcher_executor]
> (?:?) - initial fetch offset of account-access:27: fetched offset = 9655:
> consumed offset = 9655 is 9655
> {2014-01-10 00:58:09,284} DEBUG
> [bill-generated-hadoop-consumer_tm1mwdpl04-1389222547995-29a6dce9_watcher_executor]
> (?:?) - initial consumer offset of bill-generated:11: fetched offset = 152:
> consumed offset = 152 is 152
> {2014-01-10 00:58:09,284} DEBUG
> [outbound-communications-hadoop-consumer_tm1mwdpl04-1389222550693-8bc34b77_watcher_executor]
> (?:?) -
> [outbound-communications-hadoop-consumer_tm1mwdpl04-1389222550693-8bc34b77],
> outbound-communications:108: fetched offset = 1689: consumed offset = 1689
> selected new offset 1689
> {2014-01-10 00:58:09,284} DEBUG [catalina-exec-3-SendThread(
> tvip-m1-mw-zookeeper.dish.com:2181)] (ClientCnxn.java:839) - Reading
> reply sessionid:0x1434b49cf56383b, packet:: clientPath:null serverPath:null
> finished:false header:: 279,4  replyHeader:: 279,51539617506,0  request::
> '/consumers/outbound-call-attempted-hadoop-consumer/offsets/outbound-call-attempted/14,F
>  response::
> #30,s{39860186414,39860186414,1387517714994,1387517714994,0,0,0,0,1,0,39860186414}
> {2014-01-10 00:58:09,285} INFO
>  
> [outbound-communications-hadoop-consumer_tm1mwdpl04-1389222550693-8bc34b77_watcher_executor]
> (?:?) -
> [outbound-communications-hadoop-consumer_tm1mwdpl04-1389222550693-8bc34b77],
> outbound-communications-hadoop-consumer_tm1mwdpl04-1389222550693-8bc34b77-3
> attempting to claim partition 109
> {2014-01-10 00:58:09,284} DEBUG
> [bill-generated-hadoop-consumer_tm1mwdpl04-1389222547995-29a6dce9_watcher_executor]
> (?:?) - initial fetch offset of bill-generated:11: fetched offset = 152:
> consumed offset = 152 is 152
> {2014-01-10 00:58:09,284} DEBUG [catalina-exec-12-SendThread(
> tvip-m1-mw-zookeeper.dish.com:2181)] (ClientCnxn.java:839) - Reading
> reply sessionid:0x1437b2879870005, packet:: clientPath:null serverPath:null
> finished:false header:: 619,1  replyHeader:: 619,51539617508,0  request::
> '/consumers/account-activated-hadoop-consumer/owners/account-activated/68,#6163636f756e742d6163746976617465642d6861646f6f702d636f6e73756d65725f746d316d7764706c30342d313338393232323535373930362d35363262363733382d30,v{s{31,s{'world,'anyone}}},1
>  response::
> '/consumers/account-activated-hadoop-consumer/owners/account-activated/68
> {2014-01-10 00:58:09,284} DEBUG
> [account-access-hadoop-consumer_tm1mwdpl04-1389222551916-a0c87abc_watcher_executor]
> (?:?) - [account-access-hadoop-consumer_tm1mwdpl04-1389222551916-a0c87abc],
> account-access:27: fetched offset = 9655: consumed offset = 9655 selected
> new offset 9655
> {2014-01-10 00:58:09,284} INFO
>  
> [account-info-updated-hadoop-consumer_tm1mwdpl04-1389222553159-ad59660b_watcher_executor]
> (?:?) -
> [account-info-updated-hadoop-consumer_tm1mwdpl04-1389222553159-ad59660b-leader-finder-thread],
> Shutdown completed
> {2014-01-10 00:58:09,284} INFO
>  
> [account-info-updated-hadoop-consumer_tm1mwdpl04-1389222553159-ad59660b-leader-finder-thread]
> (?:?) -
> [account-info-updated-hadoop-consumer_tm1mwdpl04-1389222553159-ad59660b-leader-finder-thread],
> Stopped
> {2014-01-10 00:58:09,285} INFO
>  
> [account-info-updated-hadoop-consumer_tm1mwdpl04-1389222553159-ad59660b_watcher_executor]
> (?:?) - [ConsumerFetcherManager-1389222553163] Stopping all fetchers
>
>


-- 
-- Guozhang
