Daniel Ma created HDFS-16115:
--------------------------------

             Summary: Asynchronously handle BPServiceActor command mechanism may result in BPServiceActor never fails even CommandProcessingThread is closed with fatal error.
                 Key: HDFS-16115
                 URL: https://issues.apache.org/jira/browse/HDFS-16115
             Project: Hadoop HDFS
          Issue Type: Improvement
          Components: datanode
    Affects Versions: 3.3.1
            Reporter: Daniel Ma
             Fix For: 3.3.1

This is an improvement issue. It actually consists of two sub-issues:

1. The BPServiceActor thread handles commands from the NameNode asynchronously (CommandProcessingThread handles the commands). If an exception or error in CommandProcessingThread causes that thread to fail and stop, BPServiceActor is not aware of it and keeps putting commands from the NameNode into the queue for CommandProcessingThread to handle.

2. The second sub-issue builds on the first. If CommandProcessingThread fails owing to a non-fatal error such as "can not create native thread", which is caused by too many threads existing on the node, the problem should be tolerated rather than simply shutting down the thread with no automatic recovery, because a non-fatal error like this may soon clear by itself. Currently, the DataNode BPServiceActor cannot return to normal even after the non-fatal error has been eliminated.
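
To make the two sub-issues concrete, here is a minimal sketch of the behaviour being asked for. It is not the Hadoop code and not a proposed patch: CommandWorkerSketch, enqueue, awaitExit and isRecoverable are hypothetical names, and matching an OutOfMemoryError by the text "native thread" is only an assumption about how a non-fatal error might be recognised. The worker keeps running after a recoverable error, and the producer refuses to queue new commands once the worker has died, so the failure is visible to the caller.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical stand-in for the DataNode command-processing thread; not the Hadoop class. */
class CommandWorkerSketch {
    private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
    private final Thread worker = new Thread(this::processLoop, "command-worker-sketch");
    private volatile Throwable fatalError;

    /** Assumption: thread-exhaustion OOMs are the "non-fatal" errors worth tolerating. */
    static boolean isRecoverable(Throwable t) {
        return t instanceof OutOfMemoryError
            && t.getMessage() != null
            && t.getMessage().contains("native thread");
    }

    void start() {
        worker.start();
    }

    /** Sub-issue 1: the producer checks worker liveness instead of queueing blindly. */
    void enqueue(Runnable command) {
        if (fatalError != null || !worker.isAlive()) {
            throw new IllegalStateException("command worker is not running", fatalError);
        }
        queue.add(command);
    }

    /** Waits up to the given time for the worker to exit; returns true if it has exited. */
    boolean awaitExit(long millis) {
        try {
            worker.join(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !worker.isAlive();
    }

    private void processLoop() {
        while (true) {
            try {
                queue.take().run();
            } catch (InterruptedException e) {
                return; // asked to stop
            } catch (Throwable t) {
                if (!isRecoverable(t)) {
                    fatalError = t; // record the cause so the producer can see it
                    return;
                }
                // Sub-issue 2: a recoverable error drops this command but keeps the thread alive.
            }
        }
    }
}
```

In this sketch a caller in the BPServiceActor role would catch the IllegalStateException from enqueue and then decide whether to fail or to restart the worker. A real fix would probably also want a back-off or retry limit for repeated recoverable errors, which is left out here.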