Chen Liang created HDFS-14806:
---------------------------------

             Summary: Bootstrap standby may fail if used in-progress tailing
                 Key: HDFS-14806
                 URL: https://issues.apache.org/jira/browse/HDFS-14806
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: namenode
    Affects Versions: 3.3.0
            Reporter: Chen Liang
            Assignee: Chen Liang


One issue we ran across is that if in-progress tailing is enabled, bootstrapping
a standby NameNode can fail.

When in-progress tailing is enabled, bootstrap uses the RPC mechanism to fetch
edits from the JournalNodes (JNs). The config {{dfs.ha.tail-edits.qjm.rpc.max-txns}}
sets an upper bound on how many transactions can be returned in one RPC call;
the default is 5000. This means the bootstrapping NN (say NN1) can pull at most
5000 edits from the JNs. However, as part of bootstrap, NN1 also queries another
NN (say NN2) for NN2's current transaction ID, and NN2 may report a txid that is
more than 5000 transactions ahead of NN1's current image. NN1 can still only
see 5000 more transactions from the JNs, so the txids returned by the JNs end
before the state NN2 reported. NN1 treats this as an error, and bootstrap fails.
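The failure can be modeled in a few lines of Python. This is a minimal sketch, not HDFS code: `fetch_edits_once` and `bootstrap` are hypothetical names, txids are plain integers, and NN1 is assumed to make exactly one bounded fetch before comparing against NN2's txid.

```python
# Hypothetical model of the bootstrap failure; names are illustrative,
# not Hadoop APIs.
DEFAULT_MAX_TXNS = 5000  # default of dfs.ha.tail-edits.qjm.rpc.max-txns


def fetch_edits_once(journal_last_txid, from_txid, max_txns=DEFAULT_MAX_TXNS):
    """Model one in-progress-tailing RPC.

    Returns at most max_txns txids starting at from_txid, and never past
    the last txid the JournalNodes actually hold.
    """
    last = min(journal_last_txid, from_txid + max_txns - 1)
    return list(range(from_txid, last + 1))


def bootstrap(image_txid, nn2_current_txid, journal_last_txid,
              max_txns=DEFAULT_MAX_TXNS):
    """Model NN1 bootstrapping from an image at image_txid.

    NN1 needs edits image_txid+1 .. nn2_current_txid. If the single bounded
    fetch ends before NN2's reported txid, bootstrap fails.
    """
    edits = fetch_edits_once(journal_last_txid, image_txid + 1, max_txns)
    last_seen = edits[-1] if edits else image_txid
    if last_seen < nn2_current_txid:
        raise RuntimeError(
            f"edits end at txid {last_seen}, but NN2 reported {nn2_current_txid}")
    return last_seen
```

With the default cap, an image at txid 1000 and NN2 at txid 5500 succeeds, while NN2 at txid 7000 fails because the one fetch stops at txid 6000. Passing `max_txns=10000` makes the same case pass, which mirrors the workaround below.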

Increasing {{dfs.ha.tail-edits.qjm.rpc.max-txns}} to a very large value allowed
bootstrap to continue, but this is hardly an ideal solution.
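A fixed cap is fragile because the value it must reach equals however far NN1's image lags behind NN2, and that lag has no natural bound. One purely hypothetical alternative, sketched under the same toy model (hypothetical names again, and not necessarily how this issue was resolved), is to keep issuing bounded fetches until NN2's txid is reached:

```python
# Hypothetical sketch: catch up with several bounded fetches instead of one.
# Names are illustrative, not Hadoop APIs.
DEFAULT_MAX_TXNS = 5000  # default of dfs.ha.tail-edits.qjm.rpc.max-txns


def fetch_edits_once(journal_last_txid, from_txid, max_txns=DEFAULT_MAX_TXNS):
    """Model one RPC: at most max_txns txids starting at from_txid."""
    last = min(journal_last_txid, from_txid + max_txns - 1)
    return list(range(from_txid, last + 1))


def catch_up(image_txid, target_txid, journal_last_txid,
             max_txns=DEFAULT_MAX_TXNS):
    """Fetch in bounded batches until at least target_txid has been seen.

    Returns (last_txid_seen, number_of_fetches). Raises if the JournalNodes
    run out of edits before target_txid.
    """
    last_seen = image_txid
    fetches = 0
    while last_seen < target_txid:
        batch = fetch_edits_once(journal_last_txid, last_seen + 1, max_txns)
        fetches += 1
        if not batch:
            raise RuntimeError(
                f"no edits after txid {last_seen}; target was {target_txid}")
        last_seen = batch[-1]
    return last_seen, fetches
```

With the default cap, closing a 6000-transaction gap takes two fetches and a 19000-transaction gap takes four, without changing the config. An error still surfaces if the JNs genuinely lack the needed edits.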



--
This message was sent by Atlassian Jira
(v8.3.2#803003)
