[ https://issues.apache.org/jira/browse/HDFS-1223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Konstantin Shvachko resolved HDFS-1223.
---------------------------------------
    Resolution: Duplicate

This is fixed as Todd mentions. BTW, sometimes the behavior you describe here is desirable; see HDFS-1158 and HDFS-1161.

> DataNode fails stop due to a bad disk (or storage directory)
> ------------------------------------------------------------
>
>                 Key: HDFS-1223
>                 URL: https://issues.apache.org/jira/browse/HDFS-1223
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: data-node
>    Affects Versions: 0.20.1
>            Reporter: Thanh Do
>
> A datanode can store block files in multiple volumes. If a datanode sees a
> bad volume during startup (i.e., it hits an exception when accessing that
> volume), it simply fail-stops, making all block files stored in the other,
> healthy volumes inaccessible. Consequently, these lost replicas must later
> be regenerated on other datanodes.
> If a datanode could instead mark the bad disk and continue working with the
> healthy ones, it would increase availability and avoid unnecessary
> regeneration. As an extreme example, consider a datanode with two volumes,
> V1 and V2, each containing about 10000 64MB block files. If the datanode
> gets an exception when accessing V1 during startup, it fail-stops, and all
> 20000 block files must be regenerated later. If it instead masked V1 as bad
> and continued working with V2, the number of replicas to regenerate would be
> cut in half.
> This bug was found by our Failure Testing Service framework:
> http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html
> For questions, please email us: Thanh Do (than...@cs.wisc.edu) and
> Haryadi Gunawi (hary...@eecs.berkeley.edu)

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
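The mask-and-continue startup behavior the reporter asks for can be sketched as follows. This is a minimal, hypothetical illustration, not the actual Hadoop DataNode code (the class and method names here are invented for the example): scan the configured volumes, mask any that fail their health check, and fail-stop only if no usable volume remains.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch (not real Hadoop code): tolerate bad volumes at
// startup instead of fail-stopping on the first access exception.
public class VolumeScan {

    // Returns the subset of volumes that pass the health check.
    // Throws only when every volume is bad, i.e. nothing can be served.
    static List<String> usableVolumes(List<String> volumes,
                                      Predicate<String> healthy) {
        List<String> usable = new ArrayList<>();
        for (String v : volumes) {
            if (healthy.test(v)) {
                usable.add(v);  // keep serving blocks from this volume
            } else {
                // mask the bad volume and continue with the rest
                System.err.println("Masking bad volume: " + v);
            }
        }
        if (usable.isEmpty()) {
            throw new IllegalStateException("All volumes failed; cannot start");
        }
        return usable;
    }
}
```

In the example from the report, masking V1 this way would leave the ~10000 block files on V2 available, so only the V1 replicas would need regeneration.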