Xiaoqiao He created HDFS-14997:
----------------------------------

             Summary: BPServiceActor process command from NameNode asynchronously
                 Key: HDFS-14997
                 URL: https://issues.apache.org/jira/browse/HDFS-14997
             Project: Hadoop HDFS
          Issue Type: Improvement
          Components: datanode
            Reporter: Xiaoqiao He
            Assignee: Aiphago

The #BPServiceActor main process flow has two core functions: report (#sendHeartbeat, #blockReport, #cacheReport) and #processCommand. If #processCommand takes a long time, it blocks the report flow. #processCommand can take a long time (over 1000s in the worst case I have met) when the IO load of the DataNode is very high. Because some IO operations run under #datasetLock, processing some commands (such as #DNA_INVALIDATE) has to wait a long time to acquire #datasetLock. In that case, #heartbeat is not sent to the NameNode in time, which can trigger other disasters.

I propose making #processCommand asynchronous so that it does not block #BPServiceActor from sending heartbeats back to the NameNode under high IO load.

Notes:
1. Lifeline could be one effective solution, however some old branches do not support this feature.
2. IO operations under #datasetLock are another issue; I think we should solve it in another JIRA.
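
To illustrate the proposal, here is a minimal sketch of one possible shape, not the actual patch: the heartbeat thread hands commands to a queue and a separate worker thread drains it, so a command stuck waiting on a lock cannot delay the next heartbeat. All names here (AsyncCommandSketch, CommandWorker, enqueue, heartbeatsSentWhileCommandBlocked) are hypothetical and do not exist in Hadoop; commands are modelled as plain Runnables, and the heartbeat itself is a counter standing in for the real RPC.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch only; not the BPServiceActor implementation. */
class AsyncCommandSketch {

  /** Worker that runs queued commands on its own daemon thread. */
  static final class CommandWorker extends Thread {
    private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();

    CommandWorker() {
      super("command-worker");
      setDaemon(true);
    }

    /** Called from the heartbeat thread; returns without running the command. */
    void enqueue(Runnable command) {
      queue.add(command);
    }

    @Override
    public void run() {
      try {
        while (true) {
          Runnable command = queue.take(); // blocks until a command arrives
          try {
            command.run();                 // may be slow, e.g. waiting on a lock
          } catch (RuntimeException e) {
            System.err.println("command failed: " + e);
          }
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // exit quietly on shutdown
      }
    }
  }

  /**
   * Enqueues one command that stays blocked until all heartbeats are sent,
   * and returns how many heartbeats the calling thread sent meanwhile.
   */
  static int heartbeatsSentWhileCommandBlocked(int heartbeats) {
    CommandWorker worker = new CommandWorker();
    worker.start();
    CountDownLatch release = new CountDownLatch(1); // stand-in for a held lock
    CountDownLatch done = new CountDownLatch(1);

    worker.enqueue(() -> {
      try {
        release.await();                   // simulates a long wait for the lock
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
      done.countDown();
    });

    int sent = 0;
    for (int i = 0; i < heartbeats; i++) {
      sent++;                              // stand-in for sending a heartbeat
    }

    release.countDown();                   // the "lock" becomes available
    try {
      if (!done.await(5, TimeUnit.SECONDS)) {
        throw new IllegalStateException("command did not finish");
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } finally {
      worker.interrupt();                  // stop the worker thread
    }
    return sent;
  }

  public static void main(String[] args) {
    System.out.println(heartbeatsSentWhileCommandBlocked(3));
  }
}
```

An unbounded queue keeps the heartbeat thread from ever blocking on enqueue; a real change would also need to decide how to handle a growing backlog and whether commands must keep their arrival order, which a single worker thread preserves.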