That was an interesting section too.  Which GC settings would you suggest?

Thank you,
- charlie
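[Editor's note: the FAQ entry discussed below attributes spurious rebalances to GC pauses that outlive the ZooKeeper session timeout. A hedged sketch of the consumer-side settings involved for the 0.8-era high-level consumer; the values shown are the illustrative defaults, not a tuning recommendation:]

```properties
# consumer.properties -- illustrative values only.
# If a GC pause in the consumer JVM exceeds this, ZooKeeper expires the
# session and a rebalance (or a storm of them) can follow; raising it
# buys headroom for long pauses.
zookeeper.session.timeout.ms=6000
zookeeper.connection.timeout.ms=6000
# How many times a failed rebalance is retried, and the wait between tries.
rebalance.max.retries=4
rebalance.backoff.ms=2000
```

On the GC side, the usual advice of that era was a low-pause collector on the consumer JVM (e.g. `-XX:+UseConcMarkSweepGC`) precisely so pauses stay well under the session timeout; treat the numbers above as starting points to measure against, not answers.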

> On Jan 10, 2014, at 10:11 PM, Jun Rao <jun...@gmail.com> wrote:
> 
> Have you looked at our FAQ, especially
> https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-Whyaretheremanyrebalancesinmyconsumerlog
> ?
> 
> Thanks,
> 
> Jun
> 
> 
> On Fri, Jan 10, 2014 at 2:25 PM, Seshadri, Balaji
> <balaji.sesha...@dish.com>wrote:
> 
>> Any clue would be helpful.
>> 
>> -----Original Message-----
>> From: Seshadri, Balaji [mailto:balaji.sesha...@dish.com]
>> Sent: Friday, January 10, 2014 12:46 PM
>> To: users@kafka.apache.org
>> Subject: RE: Looks like consumer fetchers get stopped we are not getting
>> any data
>> 
>> Yes, the rebalance begins and the exception occurs.
>> 
>> 
>> {2014-01-10 00:58:11,293} INFO
>> [account-info-updated-hadoop-consumer_tm1mwdpl04-1389222553159-ad59660b_watcher_executor]
>> (?:?) - [account-i
>> nfo-updated-hadoop-consumer_tm1mwdpl04-1389222553159-ad59660b], Cleared the
>> data chunks in all the consumer message iterators
>> {2014-01-10 00:58:11,293} INFO
>> [account-info-updated-hadoop-consumer_tm1mwdpl04-1389222553159-ad59660b_watcher_executor]
>> (?:?) - [account-i
>> nfo-updated-hadoop-consumer_tm1mwdpl04-1389222553159-ad59660b], Committing
>> all offsets after clearing the fetcher queues
>> {2014-01-10 00:58:11,298} DEBUG [catalina-exec-12-SendThread(
>> tvip-m1-mw-zookeeper.dish.com:2181)] (ClientCnxn.java:759) - Got ping
>> response for sessionid: 0x1437b2879870005 af ter 0ms
>> {2014-01-10 00:58:11,313} INFO
>> [account-activated-hadoop-consumer_tm1mwdpl04-1389222557906-562b6738_watcher_executor]
>> (?:?) -
>> [account-activated-hadoop-consumer_tm1mwdpl04-1389222557906-562b6738],
>> begin rebalancing consumer
>> account-activated-hadoop-consumer_tm1mwdpl04-1389222557906-562b6738 try #1
>> {2014-01-10 00:58:11,314} DEBUG [catalina-exec-12-SendThread(
>> tvip-m1-mw-zookeeper.dish.com:2181)] (ClientCnxn.java:839) - Reading
>> reply sessionid:0x1437b2879870005, packet:: clientPath:null serverPath:null
>> finished:false header:: 627,8  replyHeader:: 627,51539619966,0  request::
>> '/brokers/ids,F  response:: v{'1}
>> {2014-01-10 00:58:11,315} DEBUG [catalina-exec-12-SendThread(
>> tvip-m1-mw-zookeeper.dish.com:2181)] (ClientCnxn.java:839) - Reading
>> reply sessionid:0x1437b2879870005, packet:: clientPath:null serverPath:null
>> finished:false header:: 628,4  replyHeader:: 628,51539619966,0  request::
>> '/brokers/ids/1,F  response::
>> #7b2022686f7374223a22746d312d6b61666b6162726f6b6572313031222c20226a6d785f706f7274223a393939392c2022706f7274223a393039322c202276657273696f6e223a31207d,s{47244644685,47244644685,1388537628753,1388537628753,0,0,0,163056791896588316,74,0,47244644685}
>> {2014-01-10 00:58:11,316} DEBUG [catalina-exec-12-SendThread(
>> tvip-m1-mw-zookeeper.dish.com:2181)] (ClientCnxn.java:839) - Reading
>> reply sessionid:0x1437b2879870005, packet:: clientPath:null serverPath:null
>> finished:false header:: 629,4  replyHeader:: 629,51539619966,-101
>> request::
>> '/consumers/account-activated-hadoop-consumer/ids/account-activated-hadoop-consumer_tm1mwdpl04-1389222557906-562b6738,F
>> response::
>> {2014-01-10 00:58:11,316} INFO
>> [account-activated-hadoop-consumer_tm1mwdpl04-1389222557906-562b6738_watcher_executor]
>> (?:?) -
>> [account-activated-hadoop-consumer_tm1mwdpl04-1389222557906-562b6738],
>> exception during rebalance
>> org.I0Itec.zkclient.exception.ZkNoNodeException:
>> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode =
>> NoNode for
>> /consumers/account-activated-hadoop-consumer/ids/account-activated-hadoop-consumer_tm1mwdpl04-1389222557906-562b6738
>>        at
>> org.I0Itec.zkclient.exception.ZkException.create(ZkException.java:47)
>>        at
>> org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:685)
>>        at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:766)
>>        at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:761)
>>        at kafka.utils.ZkUtils$.readData(Unknown Source)
>>        at kafka.consumer.TopicCount$.constructTopicCount(Unknown Source)
>>        at
>> kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener.kafka$consumer$ZookeeperConsumerConnector$ZKRebalancerListener$$rebalance(Unknown
>> Source)
>>        at
>> kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener$$anonfun$syncedRebalance$1.apply$mcVI$sp(Unknown
>> Source)
>>        at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:142)
>>        at
>> kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener.syncedRebalance(Unknown
>> Source)
>>        at
>> kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener$$anon$1.run(Unknown
>> Source) Caused by: org.apache.zookeeper.KeeperException$NoNodeException:
>> KeeperErrorCode = NoNode for
>> /consumers/account-activated-hadoop-consumer/ids/account-activated-hadoop-consumer_tm1mwdpl04-1389222557906-562b6738
>>        at
>> org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
>>        at
>> org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
>>        at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:927)
>>        at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:956)
>>        at org.I0Itec.zkclient.ZkConnection.readData(ZkConnection.java:103)
>> :
>> -----Original Message-----
>> From: Guozhang Wang [mailto:wangg...@gmail.com]
>> Sent: Friday, January 10, 2014 12:30 PM
>> To: users@kafka.apache.org
>> Subject: Re: Looks like consumer fetchers get stopped we are not getting
>> any data
>> 
>> Actually, the channel is broken by the shutdown of the
>> leader-finder-thread, which is shut down either by a rebalance retry or
>> by the consumer shutting down.
>> 
>> Do you see "begin rebalance ..." before this log entry? If yes, search
>> the log to see whether the rebalance keeps failing.
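[Editor's note: the check suggested above can be scripted. A sketch, assuming the consumer's log4j output lands in a single file; the path and the sample entries below are hypothetical stand-ins for logs like the ones quoted in this thread:]

```shell
#!/bin/sh
LOG=consumer.log   # placeholder path -- point this at your real log file

# For illustration, create a tiny sample resembling the entries above.
cat > "$LOG" <<'EOF'
INFO [c1_watcher_executor] begin rebalancing consumer c1 try #1
INFO [c1_watcher_executor] exception during rebalance
INFO [c1_watcher_executor] begin rebalancing consumer c1 try #2
EOF

# Count rebalance attempts and rebalance failures.
grep -c "begin rebalancing consumer" "$LOG"
grep -c "exception during rebalance" "$LOG"

# If the retry counter keeps climbing ("try #1", "try #2", ...), the
# rebalance is failing repeatedly rather than settling.
grep -o "try #[0-9]*" "$LOG" | sort | uniq -c
```

If the failure count tracks the attempt count, the consumer is stuck in a rebalance loop, which matches the "fetchers stopped" symptom described in this thread.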
>> 
>> Guozhang
>> 
>> 
>> On Fri, Jan 10, 2014 at 11:23 AM, Guozhang Wang <wangg...@gmail.com>
>> wrote:
>> 
>>> From your logs, the channels to the brokers are broken. Were the
>>> brokers alive at that time?
>>> 
>>> Guozhang
>>> 
>>> 
>>> On Fri, Jan 10, 2014 at 10:52 AM, Withers, Robert
>>> <robert.with...@dish.com
>>>> wrote:
>>> 
>>>> The core problem is that our consumers stop consuming and lag increases.
>>>> We found this FAQ entry:
>>>> https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-Myconsumerseemstohavestopped,why
>>>> ? -- it lists 3 possibilities.
>>>>
>>>> The FAQ also talks earlier about spurious rebalances due to improper GC
>>>> settings, but we couldn't find which GC settings to use. We are
>>>> considering changing the ZooKeeper timeouts. We are a little confused
>>>> about the various issues, the sequence in which they occur, and what
>>>> could cause the consumers to stop reading. If the fetchers get shut
>>>> down due to a ClosedByInterruptException in the "leader_finder" thread,
>>>> which tells the "executor_watcher" thread to shut down the fetchers,
>>>> that would be another reason the consumers stop processing data. Is
>>>> this possible?
>>>> 
>>>> Thank you,
>>>> rob
>>>> 
>>>> -----Original Message-----
>>>> From: Seshadri, Balaji [mailto:balaji.sesha...@dish.com]
>>>> Sent: Friday, January 10, 2014 11:40 AM
>>>> To: users@kafka.apache.org
>>>> Subject: RE: Looks like consumer fetchers get stopped we are not
>>>> getting any data
>>>> 
>>>> It would be helpful if you could shed some light on why all the
>>>> fetchers are getting stopped.
>>>> 
>>>> -----Original Message-----
>>>> From: Seshadri, Balaji [mailto:balaji.sesha...@dish.com]
>>>> Sent: Friday, January 10, 2014 11:28 AM
>>>> To: users@kafka.apache.org
>>>> Subject: RE: Looks like consumer fetchers get stopped we are not
>>>> getting any data
>>>> 
>>>> We also got the error below when this happened.
>>>> 
>>>> {2014-01-10 00:58:11,292} INFO
>>>> 
>>>> [account-info-updated-hadoop-consumer_tm1mwdpl04-1389222553159-ad5966
>>>> 0b_watcher_executor]
>>>> (?:?) -
>>>> [account-info-updated-hadoop-consumer_tm1mwdpl04-1389222553159-ad5966
>>>> 0b],
>>>> exception during rebalance
>>>> org.I0Itec.zkclient.exception.ZkNoNodeException:
>>>> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode
>>>> = NoNode for
>> /consumers/account-info-updated-hadoop-consumer/ids/account-info-updated-hadoop-consumer_tm1mwdpl04-1389222553159-ad59660b
>>>>        at
>>>> org.I0Itec.zkclient.exception.ZkException.create(ZkException.java:47)
>>>>        at
>>>> org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:685)
>>>>        at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:766)
>>>>        at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:761)
>>>>        at kafka.utils.ZkUtils$.readData(Unknown Source)
>>>>        at kafka.consumer.TopicCount$.constructTopicCount(Unknown
>> Source)
>>>>        at
>>>> kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener.kafka$
>>>> consumer$ZookeeperConsumerConnector$ZKRebalancerListener$$rebalance(U
>>>> nknown
>>>> Source)
>>>>        at
>>>> kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener$$anonf
>>>> un$syncedRebalance$1.apply$mcVI$sp(Unknown
>>>> Source)
>>>>        at
>>>> scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:142)
>>>>        at
>>>> kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener.synced
>>>> Rebalance(Unknown
>>>> Source)
>>>>        at
>>>> kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener$$anon$
>>>> 1.run(Unknown
>>>> Source) Caused by: org.apache.zookeeper.KeeperException$NoNodeException:
>>>> KeeperErrorCode = NoNode for
>> /consumers/account-info-updated-hadoop-consumer/ids/account-info-updated-hadoop-consumer_tm1mwdpl04-1389222553159-ad59660b
>>>>        at
>>>> org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
>>>>        at
>>>> org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
>>>>        at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:927)
>>>>        at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:956)
>>>>        at
>>>> org.I0Itec.zkclient.ZkConnection.readData(ZkConnection.java:103)
>>>>        at org.I0Itec.zkclient.ZkClient$9.call(ZkClient.java:770)
>>>>        at org.I0Itec.zkclient.ZkClient$9.call(ZkClient.java:766)
>>>>        at
>>>> org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:675)
>>>>        ... 9 more
>>>> 
>>>> -----Original Message-----
>>>> From: Seshadri, Balaji [mailto:balaji.sesha...@dish.com]
>>>> Sent: Friday, January 10, 2014 10:52 AM
>>>> To: users@kafka.apache.org
>>>> Subject: Looks like consumer fetchers get stopped we are not getting
>>>> any data
>>>> 
>>>> Please let us know why we are not getting any data from Kafka after
>>>> the log entries below.
>>>>
>>>> What could be causing all of the associated fetchers to be stopped,
>>>> and why is no retry being attempted?
>>>> 
>>>> {2014-01-10 00:58:09,284} WARN
>>>> 
>>>> [account-info-updated-hadoop-consumer_tm1mwdpl04-1389222553159-ad5966
>>>> 0b-leader-finder-thread]
>>>> (?:?) - Fetching topic metadata with correlation id 3 for topics
>>>> [Set(account-info-updated)] from broker
>>>> [id:1,host:tm1-kafkabroker101,port:9092] failed
>>>> java.nio.channels.ClosedByInterruptException
>>>>        at
>> java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
>>>>        at
>> sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:506)
>>>>        at java.nio.channels.SocketChannel.write(SocketChannel.java:493)
>>>>        at kafka.network.BoundedByteBufferSend.writeTo(Unknown Source)
>>>>        at kafka.network.Send$class.writeCompletely(Unknown Source)
>>>>        at
>>>> kafka.network.BoundedByteBufferSend.writeCompletely(Unknown
>>>> Source)
>>>>        at kafka.network.BlockingChannel.send(Unknown Source)
>>>>        at kafka.producer.SyncProducer.liftedTree1$1(Unknown Source)
>>>>        at
>>>> kafka.producer.SyncProducer.kafka$producer$SyncProducer$$doSend(Unkno
>>>> wn
>>>> Source)
>>>>        at kafka.producer.SyncProducer.send(Unknown Source)
>>>>        at kafka.client.ClientUtils$.fetchTopicMetadata(Unknown Source)
>>>>        at kafka.client.ClientUtils$.fetchTopicMetadata(Unknown Source)
>>>>        at
>>>> kafka.consumer.ConsumerFetcherManager$LeaderFinderThread.doWork(Unkno
>>>> wn
>>>> Source)
>>>>        at kafka.utils.ShutdownableThread.run(Unknown Source)
>>>> {2014-01-10 00:58:09,284} DEBUG
>>>> [account-access-hadoop-consumer_tm1mwdpl04-1389222551916-a0c87abc_wat
>>>> cher_executor]
>>>> (?:?) - initial fetch offset of account-access:27: fetched offset =
>> 9655:
>>>> consumed offset = 9655 is 9655
>>>> {2014-01-10 00:58:09,284} DEBUG
>>>> [bill-generated-hadoop-consumer_tm1mwdpl04-1389222547995-29a6dce9_wat
>>>> cher_executor]
>>>> (?:?) - initial consumer offset of bill-generated:11: fetched offset =
>> 152:
>>>> consumed offset = 152 is 152
>>>> {2014-01-10 00:58:09,284} DEBUG
>>>> [outbound-communications-hadoop-consumer_tm1mwdpl04-1389222550693-8bc
>>>> 34b77_watcher_executor]
>>>> (?:?) -
>>>> [outbound-communications-hadoop-consumer_tm1mwdpl04-1389222550693-8bc
>>>> 34b77],
>>>> outbound-communications:108: fetched offset = 1689: consumed offset =
>>>> 1689 selected new offset 1689
>>>> {2014-01-10 00:58:09,284} DEBUG [catalina-exec-3-SendThread(
>>>> tvip-m1-mw-zookeeper.dish.com:2181)] (ClientCnxn.java:839) - Reading
>>>> reply sessionid:0x1434b49cf56383b, packet:: clientPath:null
>>>> serverPath:null finished:false header:: 279,4  replyHeader::
>> 279,51539617506,0  request::
>>>> '/consumers/outbound-call-attempted-hadoop-consumer/offsets/outbound-
>>>> call-attempted/14,F
>>>> response::
>>>> #30,s{39860186414,39860186414,1387517714994,1387517714994,0,0,0,0,1,0
>>>> ,39860186414}
>>>> {2014-01-10 00:58:09,285} INFO
>>>> 
>>>> [outbound-communications-hadoop-consumer_tm1mwdpl04-1389222550693-8bc
>>>> 34b77_watcher_executor]
>>>> (?:?) -
>>>> [outbound-communications-hadoop-consumer_tm1mwdpl04-1389222550693-8bc
>>>> 34b77],
>>>> outbound-communications-hadoop-consumer_tm1mwdpl04-1389222550693-8bc3
>>>> 4b77-3
>>>> attempting to claim partition 109
>>>> {2014-01-10 00:58:09,284} DEBUG
>>>> [bill-generated-hadoop-consumer_tm1mwdpl04-1389222547995-29a6dce9_wat
>>>> cher_executor]
>>>> (?:?) - initial fetch offset of bill-generated:11: fetched offset = 152:
>>>> consumed offset = 152 is 152
>>>> {2014-01-10 00:58:09,284} DEBUG [catalina-exec-12-SendThread(
>>>> tvip-m1-mw-zookeeper.dish.com:2181)] (ClientCnxn.java:839) - Reading
>>>> reply sessionid:0x1437b2879870005, packet:: clientPath:null
>>>> serverPath:null finished:false header:: 619,1  replyHeader::
>> 619,51539617508,0  request::
>>>> '/consumers/account-activated-hadoop-consumer/owners/account-activate
>>>> d/68,#6163636f756e742d6163746976617465642d6861646f6f702d636f6e73756d6
>>>> 5725f746d316d7764706c30342d313338393232323535373930362d35363262363733
>>>> 382d30,v{s{31,s{'world,'anyone}}},1
>>>> response::
>>>> '/consumers/account-activated-hadoop-consumer/owners/account-activate
>>>> d/68
>>>> {2014-01-10 00:58:09,284} DEBUG
>>>> [account-access-hadoop-consumer_tm1mwdpl04-1389222551916-a0c87abc_wat
>>>> cher_executor]
>>>> (?:?) -
>>>> [account-access-hadoop-consumer_tm1mwdpl04-1389222551916-a0c87abc],
>>>> account-access:27: fetched offset = 9655: consumed offset = 9655
>>>> selected new offset 9655
>>>> {2014-01-10 00:58:09,284} INFO
>>>> 
>>>> [account-info-updated-hadoop-consumer_tm1mwdpl04-1389222553159-ad5966
>>>> 0b_watcher_executor]
>>>> (?:?) -
>>>> [account-info-updated-hadoop-consumer_tm1mwdpl04-1389222553159-ad5966
>>>> 0b-leader-finder-thread],
>>>> Shutdown completed
>>>> {2014-01-10 00:58:09,284} INFO
>>>> 
>>>> [account-info-updated-hadoop-consumer_tm1mwdpl04-1389222553159-ad5966
>>>> 0b-leader-finder-thread]
>>>> (?:?) -
>>>> [account-info-updated-hadoop-consumer_tm1mwdpl04-1389222553159-ad5966
>>>> 0b-leader-finder-thread],
>>>> Stopped
>>>> {2014-01-10 00:58:09,285} INFO
>>>> 
>>>> [account-info-updated-hadoop-consumer_tm1mwdpl04-1389222553159-ad5966
>>>> 0b_watcher_executor]
>>>> (?:?) - [ConsumerFetcherManager-1389222553163] Stopping all fetchers
>>> 
>>> 
>>> --
>>> -- Guozhang
>> 
>> 
>> 
>> --
>> -- Guozhang
>> 
>> 
