Inigo Goiri created HDFS-9910:
---------------------------------

             Summary: Datanode heartbeats can get blocked by disk in {{FsDatasetImpl#checkBlock()}}
                 Key: HDFS-9910
                 URL: https://issues.apache.org/jira/browse/HDFS-9910
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: datanode
    Affects Versions: 2.7.2
            Reporter: Inigo Goiri
            Assignee: Hua Liu

When a datanode needs to transfer a block, it first validates the block in the heartbeat thread by invoking the {{checkBlock()}} method of {{FsDatasetImpl}}, which checks whether the block file exists and reads the block length. If the block is valid, the datanode then spins off a separate thread to do the actual block transfer.

Because the validation touches the disk, the heartbeat thread can stall whenever the disk is slow. We found that during heavy disk I/O the heartbeat thread hangs on {{replicaInfo.getBlockFile().exists()}} for more than 10 minutes.
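
To illustrate the problem described above, here is a minimal sketch of one way a latency-sensitive thread could avoid waiting indefinitely on a disk-backed existence check: the check runs on a worker pool and the caller waits only for a bounded time. This is not the HDFS code and not necessarily the fix adopted for this issue; {{BlockCheckSketch}}, {{existsCheck}}, {{checkWithTimeout}} and the {{Result}} enum are hypothetical names introduced only for this example.

```java
import java.io.File;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical sketch; none of these names come from the HDFS code base. */
class BlockCheckSketch {

    /** Outcome of a bounded block check. */
    enum Result { VALID, MISSING, TIMED_OUT }

    /** Wraps the disk-backed check, which may block for a long time on a saturated disk. */
    static Callable<Boolean> existsCheck(File blockFile) {
        return blockFile::exists;
    }

    /**
     * Submits the check to a worker pool and waits at most timeoutMs for the answer,
     * so the calling thread (for example a heartbeat loop) is never stuck on the disk.
     */
    static Result checkWithTimeout(ExecutorService pool, Callable<Boolean> check, long timeoutMs) {
        Future<Boolean> pending = pool.submit(check);
        try {
            return pending.get(timeoutMs, TimeUnit.MILLISECONDS) ? Result.VALID : Result.MISSING;
        } catch (TimeoutException e) {
            // cancel(true) only interrupts the worker; a thread blocked in file I/O
            // may not react to the interrupt and can stay busy until the disk responds.
            pending.cancel(true);
            return Result.TIMED_OUT;
        } catch (ExecutionException e) {
            // The check itself threw; this sketch treats that as "block not usable".
            return Result.MISSING;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
            pending.cancel(true);
            return Result.TIMED_OUT;
        }
    }
}
```

A caller would pass something like {{BlockCheckSketch.existsCheck(blockFile)}} together with a timeout well below the heartbeat interval, and skip or retry the transfer on {{TIMED_OUT}}. Note that a stuck check still occupies a worker thread, so the pool size bounds how many slow checks can be outstanding at once.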