Hi all,
I have been testing QJM HA and it has always worked well. But yesterday I hit
a very long failover with QJM. The test is based on CDH4.3.0.
Attached are the logs of the standby namenode and the journalnode.
The network cable on the active namenode (also a datanode) was pulled out at
about 07:24. In the standby namenode log I found entries like this:
2013-08-28 07:24:51,122 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 1 Total time for transactions(ms): 1Number of transactions batched in Syncs: 0 Number of syncs: 0 SyncTimes(ms): 0 41 42
2013-08-28 07:36:14,028 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 32 Total time for transactions(ms): 3Number of transactions batched in Syncs: 0 Number of syncs: 1 SyncTimes(ms): 9 49 46

The entries themselves look normal. The problem is that nothing was logged
between the two lines, a gap of more than 11 minutes. No long GC pause
happened during that time. It seems the code was blocked somewhere.
Unfortunately, I forgot to capture a jstack dump T_T.
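
For reference, the exact length of the silent period can be worked out from the two timestamps quoted above. This is only a minimal Python sketch; the helper name log_gap is made up for illustration and is not part of Hadoop or CDH.

```python
from datetime import datetime, timedelta

# Timestamp layout used by the two FSEditLog lines quoted above
# (comma before the milliseconds).
FMT = "%Y-%m-%d %H:%M:%S,%f"


def log_gap(earlier: str, later: str) -> timedelta:
    """Return the time elapsed between two log timestamps (hypothetical helper)."""
    return datetime.strptime(later, FMT) - datetime.strptime(earlier, FMT)


# Timestamps copied from the standby namenode log entries.
gap = log_gap("2013-08-28 07:24:51,122", "2013-08-28 07:36:14,028")
print(gap)  # 0:11:22.906000
```

So the standby namenode logged nothing for about 11 minutes and 23 seconds.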

Looking forward to your response.

Best regards,
Mickey
