2013-09-01 05:59:27,792 [main-EventThread] INFO
 (org.I0Itec.zkclient.ZkClient)  - zookeeper state changed (Disconnected)
2013-09-01 05:59:27,692 [main-SendThread(
mandm-zookeeper-asg.data.sfdc.net:2181)] INFO
 (org.apache.zookeeper.
ClientCnxn)  - Client session timed out, have not
heard from server in 4002ms for sessionid 0x140c603da5b0032, closing socket
connection and attempting reconnect

This indicates that your mirror maker and/or your zookeeper cluster is
GCing for long periods of time. I have observed that if "client session
timed out" happens too many times, the client tends to lose its zookeeper
watches; this is a potential bug in zookeeper. If that happens, your mirror
maker instance might not rebalance correctly and will start losing data.

You mentioned that consumption/production stopped on your mirror maker;
could you please take a thread dump and point us to it? Meanwhile, you
might want to fix the GC pauses.
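For anyone following along, one way to gather both pieces of evidence is sketched below. This is an operational/config sketch, not something from the thread: the `<pid>` placeholder and log path are illustrative, `jps`/`jstack` ship with the JDK, and `KAFKA_OPTS` is the environment variable that kafka-run-class.sh passes to the JVM.

```shell
# Find the mirror maker JVM, then take a thread dump.
# <pid> is a placeholder; read it off the jps output.
jps -l                                     # lists running JVMs with their main classes
jstack -l <pid> > mirrormaker-threads.txt  # thread dump to attach to the thread

# Turn on GC logging for the mirror maker JVM to confirm long pauses.
# These are standard HotSpot GC-logging flags of that era, shown as an
# example rather than a tuned recommendation; restart the mirror maker
# after setting this so kafka-run-class.sh picks it up.
export KAFKA_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:/var/log/kafka/mirrormaker-gc.log"
```

Pauses in the GC log longer than the zookeeper session timeout (4002ms in the log above) would line up with the "Client session timed out" messages.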

Thanks,
Neha


On Tue, Sep 3, 2013 at 8:59 AM, Rajasekar Elango <rela...@salesforce.com>wrote:

> We found that mirrormaker stopped consuming and producing over the weekend
> (09/01). We are just seeing "Client session timed out" messages in the
> mirrormaker log. I restarted it today (09/03) to resume processing. Here
> are the log lines in reverse order.
>
>
> 2013-09-03 14:20:40,918
>
> [mirrormakerProd_ops-mmrs1-1-asg.ops.sfdc.net-1378218012575-6779d506_watcher_executor]
> INFO  (kafka.utils.VerifiableProperties)  - Verifying properties
> 2013-09-03 14:20:40,877
>
> [mirrormakerProd_ops-mmrs1-1-asg.ops.sfdc.net-1378218012575-6779d506_watcher_executor]
> INFO  (kafka.consumer.ZookeeperConsumerConnector)  -
> [mirrormakerProd_ops-mmrs1-1-asg.ops.sfdc.net-1378218012575-6779d506],
> begin rebalancing consumer
> mirrormakerProd_ops-mmrs1-1-asg.ops.sfdc.net-1378218012575-6779d506 try #1
> 2013-09-03 14:20:38,877
>
> [mirrormakerProd_ops-mmrs1-1-asg.ops.sfdc.net-1378218012575-6779d506_watcher_executor]
> INFO  (kafka.consumer.ZookeeperConsumerConnector)  -
> [mirrormakerProd_ops-mmrs1-1-asg.ops.sfdc.net-1378218012575-6779d506],
> Committing all offsets after clearing the fetcher queues
> 2013-09-03 14:20:38,877
>
> [mirrormakerProd_ops-mmrs1-1-asg.ops.sfdc.net-1378218012575-6779d506_watcher_executor]
> INFO  (kafka.consumer.ZookeeperConsumerConnector)  -
> [mirrormakerProd_ops-mmrs1-1-asg.ops.sfdc.net-1378218012575-6779d506],
> Cleared the data chunks in all the consumer message iterators
> 2013-09-03 14:20:38,877
>
> [mirrormakerProd_ops-mmrs1-1-asg.ops.sfdc.net-1378218012575-6779d506_watcher_executor]
> INFO  (kafka.consumer.ZookeeperConsumerConnector)  -
> [mirrormakerProd_ops-mmrs1-1-asg.ops.sfdc.net-1378218012575-6779d506],
> Cleared all relevant queues for this fetcher
> 2013-09-03 14:20:38,877
>
> [mirrormakerProd_ops-mmrs1-1-asg.ops.sfdc.net-1378218012575-6779d506_watcher_executor]
> INFO  (kafka.consumer.ConsumerFetcherManager)  -
> [ConsumerFetcherManager-1378218012760] All connections stopped
> 2013-09-03 14:20:38,877
>
> [mirrormakerProd_ops-mmrs1-1-asg.ops.sfdc.net-1378218012575-6779d506_watcher_executor]
> INFO  (kafka.consumer.ConsumerFetcherManager)  -
> [ConsumerFetcherManager-1378218012760] Stopping all fetchers
> 2013-09-03 14:20:38,877
>
> [mirrormakerProd_ops-mmrs1-1-asg.ops.sfdc.net-1378218012575-6779d506_watcher_executor]
> INFO  (kafka.consumer.ConsumerFetcherManager)  -
> [ConsumerFetcherManager-1378218012760] Stopping leader finder thread
> 2013-09-03 14:20:38,877
>
> [mirrormakerProd_ops-mmrs1-1-asg.ops.sfdc.net-1378218012575-6779d506_watcher_executor]
> INFO  (kafka.consumer.ZookeeperConsumerConnector)  -
> [mirrormakerProd_ops-mmrs1-1-asg.ops.sfdc.net-1378218012575-6779d506],
> Rebalancing attempt failed. Clearing the cache before the next rebalancing
> operation is triggered
> 2013-09-03 14:20:38,876
>
> [mirrormakerProd_ops-mmrs1-1-asg.ops.sfdc.net-1378218012575-6779d506_watcher_executor]
> INFO  (kafka.consumer.ZookeeperConsumerConnector)  -
> [mirrormakerProd_ops-mmrs1-1-asg.ops.sfdc.net-1378218012575-6779d506], end
> rebalancing consumer
> mirrormakerProd_ops-mmrs1-1-asg.ops.sfdc.net-1378218012575-6779d506 try #0
> 2013-09-01 05:59:29,069 [main-SendThread(
> mandm-zookeeper-asg.data.sfdc.net:2181)] INFO
>  (org.apache.zookeeper.ClientCnxn)  - Socket connection established to
> mandm-zookeeper-asg.data.sfdc.net/10.228.48.38:2181, initiating session
> 2013-09-01 05:59:29,069 [main-SendThread(
> mandm-zookeeper-asg.data.sfdc.net:2181)] INFO
>  (org.apache.zookeeper.ClientCnxn)  - Opening socket connection to server
> mandm-zookeeper-asg.data.sfdc.net/10.228.48.38:2181
> 2013-09-01 05:59:27,792 [main-EventThread] INFO
>  (org.I0Itec.zkclient.ZkClient)  - zookeeper state changed (Disconnected)
> 2013-09-01 05:59:27,692 [main-SendThread(
> mandm-zookeeper-asg.data.sfdc.net:2181)] INFO
>  (org.apache.zookeeper.ClientCnxn)  - Client session timed out, have not
> heard from server in 4002ms for sessionid 0x140c603da5b0032, closing socket
> connection and attempting reconnect
>
>
> As you can see, no log lines appeared after 2013-09-01 05:59:29. I checked
> the lag using ConsumerOffsetChecker and observed that the log size and lag
> are growing, but the offset of the mirrormaker remains the same. We have
> two mirrormaker processes running, and both of them had the same issue
> during the same time frame. Any hint on what the problem could be? How do
> we go about troubleshooting this?
>
> Thanks in advance..
>
> --
> Thanks,
> Raja.
>
