Guo Wei created HDFS-17769:
------------------------------

             Summary: Allows client to actively retry to Active NameNode when 
the Observer NameNode is too far behind client state id.
                 Key: HDFS-17769
                 URL: https://issues.apache.org/jira/browse/HDFS-17769
             Project: Hadoop HDFS
          Issue Type: Improvement
          Components: namenode
    Affects Versions: 3.4.1, 3.3.6, 3.3.4
            Reporter: Guo Wei
             Fix For: 3.4.2
         Attachments: 1.png, 2.png, 3.png

When a Router forwards read requests to the Observer and the cluster is under 
a heavy write workload, the Observer NameNode may fail to keep pace with edit 
log synchronization. This can happen even when the 
dfs.ha.tail-edits.in-progress parameter is configured.
The Observer then returns "RetriableException: Observer Node is too far 
behind" errors. When the client's ipc.client.ping parameter is set to true, 
the client keeps waiting and retrying, so the application cannot obtain the 
data it needs in a timely manner. In this situation we should consider letting 
the Active NameNode handle the request instead.
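
The fallback idea described above can be sketched roughly as follows. This is 
only an illustration of the retry-then-fall-back control flow, not the actual 
patch and not the Hadoop client API: ObserverTooFarBehindException, 
readWithFallback and maxObserverAttempts are hypothetical names, and the 
exception is a stand-in for the RetriableException mentioned above.

```java
import java.util.function.Supplier;

public class ObserverFallbackSketch {

    /** Hypothetical stand-in for the RetriableException raised by a lagging Observer. */
    static class ObserverTooFarBehindException extends RuntimeException {
        ObserverTooFarBehindException(String msg) {
            super(msg);
        }
    }

    /**
     * Try the Observer a bounded number of times. If it keeps reporting that it
     * is too far behind, send the read to the Active instead of retrying forever.
     * maxObserverAttempts is an assumed knob, not an existing Hadoop setting.
     */
    static <T> T readWithFallback(Supplier<T> observerRead,
                                  Supplier<T> activeRead,
                                  int maxObserverAttempts) {
        for (int attempt = 1; attempt <= maxObserverAttempts; attempt++) {
            try {
                return observerRead.get();
            } catch (ObserverTooFarBehindException e) {
                // Observer has not caught up yet; retry until the budget is spent.
            }
        }
        // Retry budget exhausted: let the Active serve the read.
        return activeRead.get();
    }

    public static void main(String[] args) {
        int[] observerCalls = {0};
        String result = readWithFallback(
            () -> {
                observerCalls[0]++;
                throw new ObserverTooFarBehindException("Observer Node is too far behind");
            },
            () -> "served-by-active",
            3);
        System.out.println(result + " after " + observerCalls[0] + " observer attempts");
    }
}
```

Bounding the number of Observer attempts is one possible design choice; a 
time-based limit or a limit on how far the state ID lags would fit the same 
structure.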

Here are some of the errors we observed and the verification of the fix:

The state ID of the Observer is too far behind the Active: 1.png

RetriableException: 2.png

Verification of the fix: 3.png


--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
