Yonghwan Kim created HDFS-4945:
----------------------------------

             Summary: A Distributed and Cooperative NameNode Cluster for a Highly-Available HDFS
                 Key: HDFS-4945
                 URL: https://issues.apache.org/jira/browse/HDFS-4945
             Project: Hadoop HDFS
          Issue Type: New Feature
          Components: auto-failover
    Affects Versions: HA branch (HDFS-1623)
            Reporter: Yonghwan Kim

Recently, Hadoop has attracted much attention from engineers and researchers as an emerging and effective framework for Big Data. HDFS (Hadoop Distributed File System) can manage huge amounts of data while guaranteeing high performance and reliability using only commodity hardware. However, HDFS requires a single master node, called the NameNode, to manage the entire namespace (that is, all the i-nodes) of a file system. This creates a SPOF (Single Point Of Failure) problem, because the file system becomes inaccessible when the NameNode fails (HDFS-2064). It also creates an efficiency bottleneck, since every access request to the file system has to contact the NameNode.

Hadoop 2.0 resolves the SPOF problem by introducing manual failover based on two NameNodes, Active and Standby. However, it still has the efficiency bottleneck, since all access requests have to contact the Active NameNode in ordinary execution. It may also lose the advantage of using commodity hardware, since the two NameNodes have to share highly reliable, sophisticated storage.

We here propose a new HDFS architecture to resolve all of the problems mentioned above. The proposed architecture has the following features and advantages.

1. Multiple NameNodes (not restricted to two) can be utilized to improve availability. The entire namespace of a file system is partitioned into several fragments, and replicas of each fragment are dispersed among the NameNodes. When each fragment has k replicas, the file system can tolerate up to floor(k/2 - 1) faulty NameNodes.

2. Multiple NameNodes can be utilized to improve performance. The performance bottleneck caused by a single NameNode can be circumvented by assigning different NameNodes to different fragments as the primary ones (or the entry points).

3. The highly reliable storage shared by the NameNodes is removed by introducing a message-based consistency mechanism among the NameNodes. The architecture requires only commodity hardware.
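
To make features 1 and 2 concrete, the following is a minimal Python sketch of how a namespace might be split into fragments, how k replicas of each fragment might be spread over the NameNodes, and how a different primary (entry point) might be chosen per fragment. It is only an illustration under stated assumptions, not the proposed implementation: the hash-by-top-level-directory partitioning, the round-robin placement, and the names fragment_of, place_replicas, primary_for, and tolerated_faults are all hypothetical. Only the bound floor(k/2 - 1) is taken from the description above.

```python
import hashlib
import math


def fragment_of(path, num_fragments):
    """Map a path to a fragment id.

    Assumption: the namespace is partitioned by hashing the top-level
    directory, so everything under /user lands in the same fragment.
    """
    top = path.strip("/").split("/")[0]
    digest = hashlib.md5(top.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_fragments


def place_replicas(num_fragments, namenodes, k):
    """Assign k NameNodes to every fragment (hypothetical round-robin policy).

    The first NameNode in each list acts as the primary (entry point);
    shifting the start by one per fragment spreads primaries across nodes.
    """
    n = len(namenodes)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of NameNodes")
    return {
        f: [namenodes[(f + i) % n] for i in range(k)]
        for f in range(num_fragments)
    }


def primary_for(path, placement):
    """Return the NameNode that serves as the entry point for a path."""
    return placement[fragment_of(path, len(placement))][0]


def tolerated_faults(k):
    """Fault bound floor(k/2 - 1) as stated in the description.

    Clamped at zero (an addition of this sketch) so that k = 1 does not
    yield a negative count.
    """
    return max(0, math.floor(k / 2 - 1))


if __name__ == "__main__":
    nodes = ["nn1", "nn2", "nn3", "nn4"]
    placement = place_replicas(num_fragments=4, namenodes=nodes, k=3)
    for frag, replicas in placement.items():
        print(frag, "primary:", replicas[0], "replicas:", replicas)
    print("/user/alice/data ->", primary_for("/user/alice/data", placement))
    print("faults tolerated with k=4:", tolerated_faults(4))
```

With four NameNodes and four fragments, this placement makes each NameNode the primary of exactly one fragment, so requests for different fragments enter the cluster at different nodes rather than at a single Active NameNode.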