[ https://issues.apache.org/jira/browse/HDFS-1223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Konstantin Shvachko resolved HDFS-1223.
---------------------------------------
    Resolution: Duplicate

This is fixed as Todd mentions. BTW, sometimes the behavior you describe here is desirable; see HDFS-1158 and HDFS-1161.

> DataNode fails stop due to a bad disk (or storage directory)
> ------------------------------------------------------------
>
>                 Key: HDFS-1223
>                 URL: https://issues.apache.org/jira/browse/HDFS-1223
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: data-node
>    Affects Versions: 0.20.1
>            Reporter: Thanh Do
>
> A datanode can store block files in multiple volumes. If a datanode sees a
> bad volume during startup (i.e., it hits an exception when accessing that
> volume), it simply fail-stops, making all block files stored in the other,
> healthy volumes inaccessible. Consequently, these lost replicas must later
> be regenerated on other datanodes.
> If a datanode could instead mark the bad disk and continue working with the
> healthy ones, it would increase availability and avoid unnecessary
> regeneration. As an extreme example, consider a datanode with two volumes,
> V1 and V2, each containing about 10000 64MB block files. If the datanode
> gets an exception when accessing V1 during startup, it fail-stops, and all
> 20000 block files must be regenerated later. If it instead masked V1 as bad
> and continued working with V2, the number of replicas to regenerate would be
> cut in half.
> This bug was found by our Failure Testing Service framework:
> http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html
> For questions, please email us: Thanh Do (than...@cs.wisc.edu) and
> Haryadi Gunawi (hary...@eecs.berkeley.edu)

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
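The mask-and-continue startup behavior the reporter asks for can be sketched as follows. This is a minimal, hypothetical illustration, not the actual Hadoop DataNode code (the class and method names here are invented for the example): scan the configured volumes, mask any that fail their health check, and fail-stop only if no usable volume remains.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch (not real Hadoop code): tolerate bad volumes at
// startup instead of fail-stopping on the first access exception.
public class VolumeScan {

    // Returns the subset of volumes that pass the health check.
    // Throws only when every volume is bad, i.e. nothing can be served.
    static List<String> usableVolumes(List<String> volumes,
                                      Predicate<String> healthy) {
        List<String> usable = new ArrayList<>();
        for (String v : volumes) {
            if (healthy.test(v)) {
                usable.add(v);  // keep serving blocks from this volume
            } else {
                // mask the bad volume and continue with the rest
                System.err.println("Masking bad volume: " + v);
            }
        }
        if (usable.isEmpty()) {
            throw new IllegalStateException("All volumes failed; cannot start");
        }
        return usable;
    }
}
```

In the example from the report, masking V1 this way would leave the ~10000 block files on V2 available, so only the V1 replicas would need regeneration.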