---------- Forwarded message ----------
From: Anty <anty....@gmail.com>
Date: Sun, Apr 7, 2013 at 5:48 PM
Subject: Probably a bug in FsEditLog
To: hdfs-dev@hadoop.apache.org, Chance Li <chanc...@gmail.com>
Hi all,

In our cluster, we configure the NameNode to write to both local and NFS-mounted directories. When the NFS-mounted directory becomes inaccessible, the NameNode should keep running without error, but our NameNode crashed with the following stack trace:

    2013-04-02 23:35:21,535 WARN org.apache.hadoop.hdfs.server.common.Storage: Removing storage dir /nfs2-mount/onest3/dfs/name
    2013-04-02 23:35:21,536 FATAL org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Unable to find edits stream with IO error
    java.lang.Exception: Unable to find edits stream with IO error
        at org.apache.hadoop.hdfs.server.namenode.FSEditLog.fatalExit(FSEditLog.java:430)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLog.removeEditsStreamsAndStorageDirs(FSEditLog.java:519)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLog.logSync(FSEditLog.java:1139)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:1641)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.complete(NameNode.java:689)
        at sun.reflect.GeneratedMethodAccessor21.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:557)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1434)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1430)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1177)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1428)
    2013-04-02 23:35:21,539 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:

According to the stack trace, when the NameNode tries to sync the edit log it does detect that the NFS-mounted directory is inaccessible and attempts to remove it from FSEditLog#editStreams. However, it finds that the edit stream corresponding to the NFS mount has already been removed. Under this circumstance, the NameNode simply kills itself and aborts.

After looking through the HDFS source code, I found another code path that removes edit streams from FSEditLog#editStreams, and it can cause the race condition above. In FSEditLog#logEdit:

    if (getNumEditStreams() < 1) {
      throw new AssertionError("No edit streams to log to");
    }
    long start = FSNamesystem.now();
    for (int idx = 0; idx < editStreams.size(); idx++) {
      EditLogOutputStream eStream = editStreams.get(idx);
      try {
        eStream.write(op, writables);
      } catch (IOException ioe) {
        removeEditsAndStorageDir(idx);
        idx--;
      }
    }

The cause of this race condition lies in FSEditLog#logSync, which works in two steps:

1. Do the sync operation; if an edit stream fails with an IO error, put it into an error-stream list (not synchronized).
2. Remove the error streams from FSEditLog#editStreams (synchronized).

Step #1 is not synchronized, so it is possible that between step #1 and step #2 the failed stream has already been removed by another thread invoking FSEditLog#logEdit.

If this is indeed a bug, my proposed fix is: in FSEditLog#logSync, ignore the missing stream or print a warning message instead of aborting the NameNode when the failed edit stream no longer exists in FSEditLog#editStreams. A rough sketch of what I have in mind follows below.

--
Best Regards
Anty Rao
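For concreteness, here is a minimal sketch of the change, assuming the branch-1 names that appear in the stack trace (removeEditsStreamsAndStorageDirs, removeEditsAndStorageDir, editStreams, fatalExit, FSNamesystem.LOG); the exact signature and surrounding code may differ:

    // Sketch only, not a tested patch: in FSEditLog, when a stream reported
    // by logSync is no longer present in editStreams, log a warning and skip
    // it instead of calling fatalExit() and aborting the NameNode.
    synchronized void removeEditsStreamsAndStorageDirs(
        ArrayList<EditLogOutputStream> errorStreams) {
      if (errorStreams == null) {
        return;  // no failed streams, nothing to do
      }
      for (EditLogOutputStream errorStream : errorStreams) {
        int idx = editStreams.indexOf(errorStream);
        if (idx == -1) {
          // Another thread (e.g. one running logEdit) already removed this
          // stream; warn and continue instead of killing the NameNode.
          FSNamesystem.LOG.warn("Unable to find edits stream with IO error; "
              + "it was probably already removed by another thread: "
              + errorStream);
          continue;
        }
        removeEditsAndStorageDir(idx);
      }
    }

With this change, the thread in logSync that notices the failed NFS directory simply skips the stream that logEdit already removed, and the NameNode keeps running on the remaining storage directories.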