star created HDFS-15299:
---------------------------

             Summary: Add an option to enable reporting&removing flaky disk
                 Key: HDFS-15299
                 URL: https://issues.apache.org/jira/browse/HDFS-15299
             Project: Hadoop HDFS
          Issue Type: Bug
            Reporter: star
            Assignee: star
In our production environment, where disks are more than 8 years old, many DataNodes (DNs) are treated as dead because their disks are partially broken. The NameNode (NN) then rebalances data blocks across the cluster, which introduces high disk load.

To reduce the impact of flaky disks, we'd like to extend the tolerance mechanism to cover partial disk failures. As described in HDFS-10777, the du command can still throw an exception on a heavily loaded disk, so it is brittle to simply remove a flaky disk that may recover later. However, such recovery is rare in our production environment. Could we therefore add an option that enables partial disk failure tolerance, for users whose disks are mostly broken and who care more about cluster stability?

We will replace those old disks eventually, but until then we will have to run the HDFS cluster on those servers for a long time. Comments are appreciated.
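To make the proposal concrete, here is a minimal sketch of what an opt-in partial-failure policy might look like. Everything in it is hypothetical: the class `FlakyVolumePolicy`, the `tolerateFlakyDisks` switch, the consecutive-failure threshold, and the `Action` values are illustrative assumptions, not HDFS APIs or the eventual patch. With the option disabled, any failed check is treated as fatal for the DataNode (a simplified stand-in for strict behavior). With it enabled, a volume is reported and removed only after several consecutive failed checks, so a transient error such as a `du` failure on a busy disk does not take it out, and a volume that recovers starts over with a clean record.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of an opt-in partial disk failure policy.
 * None of these names exist in HDFS; they only illustrate the proposal.
 */
public class FlakyVolumePolicy {

    /** What the DataNode should do after a volume check (illustrative only). */
    public enum Action { KEEP, REMOVE_VOLUME, FAIL_DATANODE }

    private final boolean tolerateFlakyDisks;   // assumed opt-in switch
    private final int maxConsecutiveFailures;   // assumed threshold
    private final Map<String, Integer> consecutiveFailures = new HashMap<>();

    public FlakyVolumePolicy(boolean tolerateFlakyDisks, int maxConsecutiveFailures) {
        if (maxConsecutiveFailures < 1) {
            throw new IllegalArgumentException("threshold must be >= 1");
        }
        this.tolerateFlakyDisks = tolerateFlakyDisks;
        this.maxConsecutiveFailures = maxConsecutiveFailures;
    }

    /** Record the outcome of one health check (e.g. a du call) on a volume. */
    public Action onCheck(String volume, boolean checkSucceeded) {
        if (checkSucceeded) {
            // A volume that recovers starts over with a clean record.
            consecutiveFailures.remove(volume);
            return Action.KEEP;
        }
        if (!tolerateFlakyDisks) {
            // Simplified strict behavior: any failure is fatal for the DataNode.
            return Action.FAIL_DATANODE;
        }
        int failures = consecutiveFailures.merge(volume, 1, Integer::sum);
        if (failures >= maxConsecutiveFailures) {
            // Report and remove only this volume; the DataNode stays in service.
            consecutiveFailures.remove(volume);
            return Action.REMOVE_VOLUME;
        }
        return Action.KEEP;
    }

    public static void main(String[] args) {
        FlakyVolumePolicy policy = new FlakyVolumePolicy(true, 3);
        System.out.println(policy.onCheck("/data/disk1", false)); // KEEP
        System.out.println(policy.onCheck("/data/disk1", true));  // KEEP (counter reset)
        System.out.println(policy.onCheck("/data/disk1", false)); // KEEP
        System.out.println(policy.onCheck("/data/disk1", false)); // KEEP
        System.out.println(policy.onCheck("/data/disk1", false)); // REMOVE_VOLUME
    }
}
```

A consecutive-failure counter is only one possible design; a sliding time window would also fit the proposal and may suit disks that fail intermittently rather than in bursts.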