kuper created HADOOP-18937:
------------------------------

             Summary: Add journalnode maintenance node list
                 Key: HADOOP-18937
                 URL: https://issues.apache.org/jira/browse/HADOOP-18937
             Project: Hadoop Common
          Issue Type: Improvement
          Components: fs
    Affects Versions: 3.3.6
            Reporter: kuper


* When HDFS is configured with 3 journal nodes and 1 of them fails to start 
due to machine issues, leaving only 2 available, namenode initialization takes 
a long time (around 30-40 minutes, depending on the IPC timeout and retry 
policy configuration). 
* The failed journal node cannot recover immediately, but HDFS can still 
function in this situation. We encountered this issue in our production 
environment and had to reduce the IPC timeout and adjust the retry policy so 
that the namenode could initialize faster and start providing services. 
* I'm wondering if it would be possible to have a journal node maintenance 
list, so that the namenode knows in advance that a journal node cannot provide 
services and can skip it to speed up initialization?
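
To make the idea concrete, here is a minimal sketch of how such a maintenance list might be applied. It is not existing Hadoop code: the class name, method names, and host names are all hypothetical, and the sketch assumes the namenode would simply skip listed journal nodes while still requiring a majority of the full configured set.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical illustration only; none of these names exist in Hadoop. */
public class JournalNodeMaintenanceSketch {

    /** Returns the configured journal nodes that are not in the maintenance list, in order. */
    static List<String> activeJournalNodes(List<String> configured, Set<String> maintenance) {
        List<String> active = new ArrayList<>();
        for (String jn : configured) {
            if (!maintenance.contains(jn)) {
                active.add(jn);
            }
        }
        return active;
    }

    /**
     * Majority is computed against the full configured count (assumption),
     * so skipping nodes never lowers the quorum requirement.
     */
    static boolean hasQuorum(int configuredCount, int activeCount) {
        return activeCount >= configuredCount / 2 + 1;
    }

    public static void main(String[] args) {
        // Hypothetical hosts; one of the three is known to be down.
        List<String> configured = Arrays.asList("jn1:8485", "jn2:8485", "jn3:8485");
        Set<String> maintenance = new HashSet<>(Arrays.asList("jn3:8485"));

        List<String> active = activeJournalNodes(configured, maintenance);
        System.out.println("Contact only: " + active);
        System.out.println("Quorum still reachable: " + hasQuorum(configured.size(), active.size()));
    }
}
```

With 3 configured journal nodes and 1 in maintenance, the namenode would contact only the 2 remaining nodes, which still form a majority, and would not spend time on connection timeouts and retries against the node known to be down.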



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org
