Haiyang Hu created HDFS-17250:
---------------------------------

Summary: EditLogTailer#triggerActiveLogRoll should handle thread Interrupted
Key: HDFS-17250
URL: https://issues.apache.org/jira/browse/HDFS-17250
Project: Hadoop HDFS
Issue Type: Improvement
Reporter: Haiyang Hu
Assignee: Haiyang Hu

*Issue:*

When the NameNode attempts to trigger a log roll and the cachedActiveProxy points to a NameNode that has been shut down, it cannot establish a network connection. The socket connection phase, which has a timeout of 90 seconds, therefore runs until it times out.

The asynchronous "Triggering log roll" call only waits 60 seconds, so it times out first and issues a "cancel". The executing thread is interrupted and throws a "java.io.InterruptedIOException".

Currently the logic does not handle the interrupt. Because the "getActiveNodeProxy" method has not yet reached its maximum retry limit, execution does not exit and instead goes on to call "rollEditLog" on the next NameNode in the list. When the next socket connection is attempted, it throws a "java.nio.channels.ClosedByInterruptException" because the thread is still in the interrupted state. This cycle repeats until the maximum retry limit (nnCount * maxRetries) is reached, and only then does the call exit.

In the next "Triggering log roll" cycle, the NameNode list is traversed again and the same problem occurs, because the cachedActiveProxy is still the NameNode that has been shut down. As a result, the NameNode is never able to complete the "Triggering log roll" operation successfully.

To fix this, we need to handle the thread being interrupted and exit the execution.
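
The proposed behaviour can be illustrated with a minimal sketch. This is not the actual EditLogTailer code or the patch: the class InterruptAwareLogRoll, the RollAttempt interface and the triggerLogRoll method are hypothetical names, and the loop only assumes the retry limit of nnCount * maxRetries described above. It shows a retry loop that stops as soon as the thread has been interrupted instead of moving on to the next NameNode.

```java
import java.io.IOException;
import java.io.InterruptedIOException;
import java.nio.channels.ClosedByInterruptException;
import java.util.List;

// Hypothetical sketch, not Hadoop code: a retry loop that exits on interrupt.
class InterruptAwareLogRoll {

    /** Hypothetical stand-in for one rollEditLog RPC against a NameNode. */
    interface RollAttempt {
        void rollEditLog(String nameNode) throws IOException;
    }

    /**
     * Tries the NameNodes in round-robin order, up to nnCount * maxRetries
     * attempts. Returns true on the first successful attempt and false if
     * every attempt failed. Throws as soon as the thread is interrupted.
     */
    static boolean triggerLogRoll(List<String> nameNodes, int maxRetries,
                                  RollAttempt attempt) throws IOException {
        int limit = nameNodes.size() * maxRetries;
        for (int i = 0; i < limit; i++) {
            // If a previous cancel left the interrupt flag set, stop here
            // rather than opening another connection that is bound to fail.
            if (Thread.currentThread().isInterrupted()) {
                throw new InterruptedIOException("log roll interrupted");
            }
            String nn = nameNodes.get(i % nameNodes.size());
            try {
                attempt.rollEditLog(nn);
                return true;
            } catch (InterruptedIOException | ClosedByInterruptException e) {
                // Restore the flag for callers and exit instead of retrying.
                Thread.currentThread().interrupt();
                throw e;
            } catch (IOException e) {
                // Ordinary failure (for example, connection refused):
                // fall through and try the next NameNode.
            }
        }
        return false;
    }
}
```

With this shape, a cancel issued after the 60 second wait ends the whole attempt after a single failed call, rather than after nnCount * maxRetries calls that each fail with ClosedByInterruptException.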