Hi Dmitry,

Have you looked into the QJM automatic failover mode using the ZKFailoverController?
https://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html#Automatic_Failover

This is the most commonly used HA mode in production environments. There is also some recent work, which will be in Hadoop 3, that will allow more than one standby NN:
https://issues.apache.org/jira/browse/HDFS-6440

cheers,
esteban.

--
Cloudera, Inc.

On Thu, Jul 2, 2015 at 7:42 AM, Dmitry Salychev <darkness....@gmail.com> wrote:

> Sure, I did. It's actually not what I'm looking for. I don't want to spend
> time bringing a dead NN back to life by hand. There should be a solution
> for the NN-SPOF problem.
>
> On 07/02/2015 04:36 PM, Vinayakumar B wrote:
>
>> Hi..
>> Did you look at the HDFS Namenode high availability?
>>
>> -Vinay
>> On Jul 2, 2015 11:50 AM, "Dmitry Salychev" <darkness....@gmail.com> wrote:
>>
>>> Hello, HDFS Developers.
>>>
>>> I know that the NN is a single point of failure of an entire HDFS cluster.
>>> If it fails, the cluster will be unavailable no matter how many DNs there
>>> are. I know that there is an initiative
>>> <http://www.wandisco.com/system/files/documentation/Meetup-ConsensusReplication.pdf>
>>> which introduces ConsensusNode (as far as I can see, it looks like a
>>> distributed NN) and related issues (HDFS-6469
>>> <https://issues.apache.org/jira/browse/HDFS-6469>, HADOOP-10641
>>> <https://issues.apache.org/jira/browse/HADOOP-10641> and HDFS-7007
>>> <https://issues.apache.org/jira/browse/HDFS-7007>). So, I'd like to ask:
>>>
>>> Has this NN-SPOF problem been solved? If it hasn't, can you point me to an
>>> entry point where I can help solve it?
>>>
>>> Thanks for your time.