Hi Devs, Anyone meet NameNode fatal since editlog not sync in time then lead to process unexpectedly exit. I have met several times and try to dig the root cause but no result. This JIRA(HDFS-10943 <https://issues.apache.org/jira/browse/HDFS-10943>) is trace this issue. welcome any suggestions and more discuss.
I would like to offer some more information: Environment: branch-2.7 HDFS HA using QJM FATAL Logļ¼ > 2019-03-08 10:53:17,111 FATAL > org.apache.hadoop.hdfs.server.namenode.FSEditLog: Error: finalize log > segment 117479864532, 117479959500 failed for required journal > (JournalAndStream(mgr=QJM to [journalhosts], stream=QuorumOutputStream > starting at txid 117479864532)) > java.io.IOException: FSEditStream has 141 bytes still to be flushed and > cannot be closed. > at > org.apache.hadoop.hdfs.server.namenode.EditsDoubleBuffer.close(EditsDoubleBuffer.java:66) > at > org.apache.hadoop.hdfs.qjournal.client.QuorumOutputStream.close(QuorumOutputStream.java:65) > at > org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalAndStream.closeStream(JournalSet.java:115) > at > org.apache.hadoop.hdfs.server.namenode.JournalSet$4.apply(JournalSet.java:235) > at > org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:393) > at > org.apache.hadoop.hdfs.server.namenode.JournalSet.finalizeLogSegment(JournalSet.java:231) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLog.endCurrentLogSegment(FSEditLog.java:1274) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLog.rollEditLog(FSEditLog.java:1203) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.rollEditLog(FSImage.java:1331) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.rollEditLog(FSNamesystem.java:6102) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.rollEditLog(NameNodeRpcServer.java:1297) > at > org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolServerSideTranslatorPB.rollEditLog(NamenodeProtocolServerSideTranslatorPB.java:142) > at > org.apache.hadoop.hdfs.protocol.proto.NamenodeProtocolProtos$NamenodeProtocolService$2.callBlockingMethod(NamenodeProtocolProtos.java:12025) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:976) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:847) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:790) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1686) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2458) Best Regards, Hexiaoqiao