We found that mirrormaker stopped consuming and producing over the weekend (09/01). We only see "Client session timed out" messages in the mirrormaker log. I restarted it today (09/03) to resume processing. Here are the log lines, in reverse chronological order (newest first):

2013-09-03 14:20:40,918 [mirrormakerProd_ops-mmrs1-1-asg.ops.sfdc.net-1378218012575-6779d506_watcher_executor] INFO (kafka.utils.VerifiableProperties) - Verifying properties
2013-09-03 14:20:40,877 [mirrormakerProd_ops-mmrs1-1-asg.ops.sfdc.net-1378218012575-6779d506_watcher_executor] INFO (kafka.consumer.ZookeeperConsumerConnector) - [mirrormakerProd_ops-mmrs1-1-asg.ops.sfdc.net-1378218012575-6779d506], begin rebalancing consumer mirrormakerProd_ops-mmrs1-1-asg.ops.sfdc.net-1378218012575-6779d506 try #1
2013-09-03 14:20:38,877 [mirrormakerProd_ops-mmrs1-1-asg.ops.sfdc.net-1378218012575-6779d506_watcher_executor] INFO (kafka.consumer.ZookeeperConsumerConnector) - [mirrormakerProd_ops-mmrs1-1-asg.ops.sfdc.net-1378218012575-6779d506], Committing all offsets after clearing the fetcher queues
2013-09-03 14:20:38,877 [mirrormakerProd_ops-mmrs1-1-asg.ops.sfdc.net-1378218012575-6779d506_watcher_executor] INFO (kafka.consumer.ZookeeperConsumerConnector) - [mirrormakerProd_ops-mmrs1-1-asg.ops.sfdc.net-1378218012575-6779d506], Cleared the data chunks in all the consumer message iterators
2013-09-03 14:20:38,877 [mirrormakerProd_ops-mmrs1-1-asg.ops.sfdc.net-1378218012575-6779d506_watcher_executor] INFO (kafka.consumer.ZookeeperConsumerConnector) - [mirrormakerProd_ops-mmrs1-1-asg.ops.sfdc.net-1378218012575-6779d506], Cleared all relevant queues for this fetcher
2013-09-03 14:20:38,877 [mirrormakerProd_ops-mmrs1-1-asg.ops.sfdc.net-1378218012575-6779d506_watcher_executor] INFO (kafka.consumer.ConsumerFetcherManager) - [ConsumerFetcherManager-1378218012760] All connections stopped
2013-09-03 14:20:38,877 [mirrormakerProd_ops-mmrs1-1-asg.ops.sfdc.net-1378218012575-6779d506_watcher_executor] INFO (kafka.consumer.ConsumerFetcherManager) - [ConsumerFetcherManager-1378218012760] Stopping all fetchers
2013-09-03 14:20:38,877 [mirrormakerProd_ops-mmrs1-1-asg.ops.sfdc.net-1378218012575-6779d506_watcher_executor] INFO (kafka.consumer.ConsumerFetcherManager) - [ConsumerFetcherManager-1378218012760] Stopping leader finder thread
2013-09-03 14:20:38,877 [mirrormakerProd_ops-mmrs1-1-asg.ops.sfdc.net-1378218012575-6779d506_watcher_executor] INFO (kafka.consumer.ZookeeperConsumerConnector) - [mirrormakerProd_ops-mmrs1-1-asg.ops.sfdc.net-1378218012575-6779d506], Rebalancing attempt failed. Clearing the cache before the next rebalancing operation is triggered
2013-09-03 14:20:38,876 [mirrormakerProd_ops-mmrs1-1-asg.ops.sfdc.net-1378218012575-6779d506_watcher_executor] INFO (kafka.consumer.ZookeeperConsumerConnector) - [mirrormakerProd_ops-mmrs1-1-asg.ops.sfdc.net-1378218012575-6779d506], end rebalancing consumer mirrormakerProd_ops-mmrs1-1-asg.ops.sfdc.net-1378218012575-6779d506 try #0
2013-09-01 05:59:29,069 [main-SendThread( mandm-zookeeper-asg.data.sfdc.net:2181)] INFO (org.apache.zookeeper.ClientCnxn) - Socket connection established to mandm-zookeeper-asg.data.sfdc.net/10.228.48.38:2181, initiating session
2013-09-01 05:59:29,069 [main-SendThread( mandm-zookeeper-asg.data.sfdc.net:2181)] INFO (org.apache.zookeeper.ClientCnxn) - Opening socket connection to server mandm-zookeeper-asg.data.sfdc.net/10.228.48.38:2181
2013-09-01 05:59:27,792 [main-EventThread] INFO (org.I0Itec.zkclient.ZkClient) - zookeeper state changed (Disconnected)
2013-09-01 05:59:27,692 [main-SendThread( mandm-zookeeper-asg.data.sfdc.net:2181)] INFO (org.apache.zookeeper.ClientCnxn) - Client session timed out, have not heard from server in 4002ms for sessionid 0x140c603da5b0032, closing socket connection and attempting reconnect

As you can see, no log lines appeared after 2013-09-01 05:59:29 until I restarted the process on 09/03. I checked the lag using ConsumerOffsetChecker and observed that the log size and lag keep growing, but the mirrormaker offset stays the same.

We have two mirrormaker processes running, and both of them had the same issue during the same time frame.

Any hints on what the problem could be? How should we go about troubleshooting this?

Thanks in advance.

--
Thanks,
Raja.