Kihwal Lee created HDFS-11817:
---------------------------------

             Summary: A faulty node can cause a lease leak and NPE on accessing data
                 Key: HDFS-11817
                 URL: https://issues.apache.org/jira/browse/HDFS-11817
             Project: Hadoop HDFS
          Issue Type: Bug
    Affects Versions: 2.8.0
            Reporter: Kihwal Lee
            Priority: Critical

When the namenode performs a lease recovery for a failed write, {{commitBlockSynchronization()}} fails if none of the new targets has sent a received-IBR (incremental block report). At this point the data is inaccessible, because the namenode throws a {{NullPointerException}} on {{getBlockLocations()}}.

The namenode retries the lease recovery in about an hour. If the nodes are faulty (usually when there is only one new target), they may still not have sent a block report by then. In that case, lease recovery throws an {{AlreadyBeingCreatedException}}, which causes LeaseManager to simply remove the lease without finalizing the inode.

This results in an inconsistent lease state: the inode stays under construction, but no further lease recovery is attempted. A manual lease recovery is also not allowed.
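
The sequence above can be illustrated with a small toy model. This is only a sketch of the reported state transitions, not the namenode code: {{LeaseLeakSketch}}, {{ToyFile}}, {{ToyRecoveryException}} and all method names are hypothetical, and the condition under which the retry throws is an assumption made for illustration (the report does not spell out the exact check).

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of the reported failure sequence. None of these types are Hadoop classes. */
public class LeaseLeakSketch {

    /** Hypothetical stand-in for an inode; only the state relevant to the report is kept. */
    static final class ToyFile {
        boolean underConstruction = true;
        boolean recoveryAttempted = false;
    }

    /** Hypothetical stand-in for the exception thrown on the retried recovery. */
    static final class ToyRecoveryException extends Exception {
        ToyRecoveryException(String msg) { super(msg); }
    }

    private final Map<String, ToyFile> files = new HashMap<>();
    private final Map<String, String> leases = new HashMap<>(); // path -> lease holder

    /** A client opens a file for write and is granted a lease. */
    void create(String path, String holder) {
        files.put(path, new ToyFile());
        leases.put(path, holder);
    }

    /** One recovery attempt. Returns true if the file was finalized. */
    boolean recover(String path, boolean targetSentReceivedIbr) throws ToyRecoveryException {
        ToyFile f = files.get(path);
        if (targetSentReceivedIbr) {
            // Healthy case: the block can be committed, so the file is closed
            // and the lease is released together with it.
            f.underConstruction = false;
            leases.remove(path);
            return true;
        }
        if (f.recoveryAttempted) {
            // Assumption: a retry with still no report fails with an exception.
            throw new ToyRecoveryException("recovery retry failed for " + path);
        }
        // First attempt fails quietly; the file stays under construction.
        f.recoveryAttempted = true;
        return false;
    }

    /** Periodic lease check, modeling the reported behavior on failure. */
    void checkLease(String path, boolean targetSentReceivedIbr) {
        try {
            recover(path, targetSentReceivedIbr);
        } catch (ToyRecoveryException e) {
            // The leak: the lease is dropped but the file is never finalized.
            leases.remove(path);
        }
    }

    /** Under construction with no lease: nothing will ever retry recovery. */
    boolean isLeaked(String path) {
        ToyFile f = files.get(path);
        return f != null && f.underConstruction && !leases.containsKey(path);
    }

    public static void main(String[] args) {
        LeaseLeakSketch nn = new LeaseLeakSketch();
        nn.create("/f", "client-1");
        nn.checkLease("/f", false);              // first recovery, no received-IBR
        System.out.println(nn.isLeaked("/f"));   // prints "false"
        nn.checkLease("/f", false);              // retry, still no report
        System.out.println(nn.isLeaked("/f"));   // prints "true"
    }
}
```

In this model, the inconsistent state comes from the catch block removing the lease unconditionally. Keeping the lease when the file is still under construction would leave the file eligible for a later retry.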