We found that mirrormaker stopped consuming and producing over the week end
(09/01). Just seeing "Client session timed out" messages in mirrormaker
log. I restarted to it today 09/03 to resume processing. Here is the logs
line in reverse order.


2013-09-03 14:20:40,918
[mirrormakerProd_ops-mmrs1-1-asg.ops.sfdc.net-1378218012575-6779d506_watcher_executor]
INFO  (kafka.utils.VerifiableProperties)  - Verifying properties
2013-09-03 14:20:40,877
[mirrormakerProd_ops-mmrs1-1-asg.ops.sfdc.net-1378218012575-6779d506_watcher_executor]
INFO  (kafka.consumer.ZookeeperConsumerConnector)  -
[mirrormakerProd_ops-mmrs1-1-asg.ops.sfdc.net-1378218012575-6779d506],
begin rebalancing consumer
mirrormakerProd_ops-mmrs1-1-asg.ops.sfdc.net-1378218012575-6779d506 try #1
2013-09-03 14:20:38,877
[mirrormakerProd_ops-mmrs1-1-asg.ops.sfdc.net-1378218012575-6779d506_watcher_executor]
INFO  (kafka.consumer.ZookeeperConsumerConnector)  -
[mirrormakerProd_ops-mmrs1-1-asg.ops.sfdc.net-1378218012575-6779d506],
Committing all offsets after clearing the fetcher queues
2013-09-03 14:20:38,877
[mirrormakerProd_ops-mmrs1-1-asg.ops.sfdc.net-1378218012575-6779d506_watcher_executor]
INFO  (kafka.consumer.ZookeeperConsumerConnector)  -
[mirrormakerProd_ops-mmrs1-1-asg.ops.sfdc.net-1378218012575-6779d506],
Cleared the data chunks in all the consumer message iterators
2013-09-03 14:20:38,877
[mirrormakerProd_ops-mmrs1-1-asg.ops.sfdc.net-1378218012575-6779d506_watcher_executor]
INFO  (kafka.consumer.ZookeeperConsumerConnector)  -
[mirrormakerProd_ops-mmrs1-1-asg.ops.sfdc.net-1378218012575-6779d506],
Cleared all relevant queues for this fetcher
2013-09-03 14:20:38,877
[mirrormakerProd_ops-mmrs1-1-asg.ops.sfdc.net-1378218012575-6779d506_watcher_executor]
INFO  (kafka.consumer.ConsumerFetcherManager)  -
[ConsumerFetcherManager-1378218012760] All connections stopped
2013-09-03 14:20:38,877
[mirrormakerProd_ops-mmrs1-1-asg.ops.sfdc.net-1378218012575-6779d506_watcher_executor]
INFO  (kafka.consumer.ConsumerFetcherManager)  -
[ConsumerFetcherManager-1378218012760] Stopping all fetchers
2013-09-03 14:20:38,877
[mirrormakerProd_ops-mmrs1-1-asg.ops.sfdc.net-1378218012575-6779d506_watcher_executor]
INFO  (kafka.consumer.ConsumerFetcherManager)  -
[ConsumerFetcherManager-1378218012760] Stopping leader finder thread
2013-09-03 14:20:38,877
[mirrormakerProd_ops-mmrs1-1-asg.ops.sfdc.net-1378218012575-6779d506_watcher_executor]
INFO  (kafka.consumer.ZookeeperConsumerConnector)  -
[mirrormakerProd_ops-mmrs1-1-asg.ops.sfdc.net-1378218012575-6779d506],
Rebalancing attempt failed. Clearing the cache before the next rebalancing
operation is triggered
2013-09-03 14:20:38,876
[mirrormakerProd_ops-mmrs1-1-asg.ops.sfdc.net-1378218012575-6779d506_watcher_executor]
INFO  (kafka.consumer.ZookeeperConsumerConnector)  -
[mirrormakerProd_ops-mmrs1-1-asg.ops.sfdc.net-1378218012575-6779d506], end
rebalancing consumer
mirrormakerProd_ops-mmrs1-1-asg.ops.sfdc.net-1378218012575-6779d506 try #0
2013-09-01 05:59:29,069 [main-SendThread(
mandm-zookeeper-asg.data.sfdc.net:2181)] INFO
 (org.apache.zookeeper.ClientCnxn)  - Socket connection established to
mandm-zookeeper-asg.data.sfdc.net/10.228.48.38:2181, initiating session
2013-09-01 05:59:29,069 [main-SendThread(
mandm-zookeeper-asg.data.sfdc.net:2181)] INFO
 (org.apache.zookeeper.ClientCnxn)  - Opening socket connection to server
mandm-zookeeper-asg.data.sfdc.net/10.228.48.38:2181
2013-09-01 05:59:27,792 [main-EventThread] INFO
 (org.I0Itec.zkclient.ZkClient)  - zookeeper state changed (Disconnected)
2013-09-01 05:59:27,692 [main-SendThread(
mandm-zookeeper-asg.data.sfdc.net:2181)] INFO
 (org.apache.zookeeper.ClientCnxn)  - Client session timed out, have not
heard from server in 4002ms for sessionid 0x140c603da5b0032, closing socket
connection and attempting reconnect


As you can see, no log lines appeared after 2013-09-01 05:59:29. I checked
lag using consumerOffsetChecker and observed that log size and lag is
growing, but offset of mirrormaker remains same. We have two mirrormaker
process running and both of them had same issue during same time frame..
Any hint on what could be problem..? How do we go about trouble shooting
this..?

Thanks in advance..

-- 
Thanks,
Raja.

Reply via email to