Wei-Chiu Chuang created HDFS-9923:
-------------------------------------
Summary: Datanode disk failure handling should be improved (consistently?)
Key: HDFS-9923
URL: https://issues.apache.org/jira/browse/HDFS-9923
Project: Hadoop HDFS
Issue Type: Improvement
Components: datanode
Reporter: Wei-Chiu Chuang
Disk failures are hard to handle. This JIRA is meant to discuss how disk
failures can be handled better and more consistently.
For one thing, disks can fail in many different ways: the hardware might be
failing, the disk might be full, a checksum might not match, and so on. For
another, the hardware abstracts away the details, so it's hard for software to
handle them.
There are currently three disk check mechanisms in HDFS, as far as I know:
{{BlockScanner}}, {{BlockPoolSlice#checkDirs}} and {{DU}}. Each handles disk
errors differently.
This JIRA is more focused on {{DU}} error handling. {{DU}} may emit errors like
this:
{noformat}
2016-02-18 02:23:36,224 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Caught exception while scanning /data/8/dfs/dn/current. Will throw later.
ExitCodeException exitCode=1: du: cannot access `/data/8/dfs/dn/current/BP-1018136951-49.4.167.110-1403564146510/current/finalized/subdir228/subdir11/blk_1088686909': Input/output error
du: cannot access `/data/8/dfs/dn/current/BP-1018136951-49.4.167.110-1403564146510/current/finalized/subdir228/subdir11/blk_1088686909_14954023.meta': Input/output error
{noformat}
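To make the discussion concrete, here is a rough sketch of how the {{du}} stderr shown above could be turned into structured errors, so that an I/O error can be told apart from other failures. {{DuErrorParser}} and {{DuError}} are hypothetical names, not existing HDFS classes, and the pattern assumes the GNU {{du}} message format seen in the log.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper, not part of HDFS: extracts the paths du could not read. */
final class DuErrorParser {
  // Assumes messages shaped like the log above:
  //   du: cannot access `<path>': <reason>
  private static final Pattern CANNOT_ACCESS =
      Pattern.compile("du: cannot access [`'](.+?)': (.+)");

  /** One "cannot access" message reported by du. */
  static final class DuError {
    final String path;
    final String reason;

    DuError(String path, String reason) {
      this.path = path;
      this.reason = reason;
    }

    /** True if du reported an I/O error for this path. */
    boolean isIoError() {
      return reason.contains("Input/output error");
    }
  }

  /** Returns every "cannot access" message found in the given stderr text. */
  static List<DuError> parse(String stderr) {
    List<DuError> errors = new ArrayList<>();
    Matcher m = CANNOT_ACCESS.matcher(stderr);
    while (m.find()) {
      errors.add(new DuError(m.group(1), m.group(2).trim()));
    }
    return errors;
  }
}
```

Whether matching on the message text is robust enough (locale, non-GNU {{du}}) is an open question; this only illustrates that the affected block files are recoverable from the exception message.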
While working on HDFS-9908 (Datanode should tolerate disk scan failure during
NN handshake), I found that {{DU}} errors are not handled consistently: the
outcome depends entirely on who catches the exception.
For example,
* if {{DU}} returns an error during NN handshake, the DN will not be able to
join the cluster at all (HDFS-9908);
* however, if the same exception is caught in {{BlockPoolSlice#saveDfsUsed}},
the data node will only log a warning and do nothing else (HDFS-5498);
* in some cases (e.g. the {{BlockReceiver}} constructor), the exception handler
invokes {{BlockPoolSlice#checkDirs}}, but since it only checks three
directories, it is very unlikely to find the files that have the error.
So my ask is: should the error be handled in a consistent manner? Should the
data node report disk failures to the name nodes (this is the {{BlockScanner}}
approach), and should the data node take the volume offline automatically if
{{DU}} returns an error (this is the {{checkDirs}} approach)?
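As a starting point for discussion, here is a minimal sketch of what a single, shared policy could look like if every caller that catches a {{DU}} exception delegated to it. Everything here is hypothetical: {{VolumeFailurePolicy}}, {{FailureReporter}} and the error threshold are illustrative names and assumptions, not existing HDFS classes or settings.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch, not an HDFS class: one policy for all DU error call sites. */
final class VolumeFailurePolicy {
  /** Hypothetical callback standing in for whatever reports failures to the name node. */
  interface FailureReporter {
    void reportVolumeFailure(String volume, IOException cause);
  }

  // Assumed threshold: how many DU errors a volume may see before going offline.
  private final int maxErrorsPerVolume;
  private final FailureReporter reporter;
  private final Map<String, Integer> errorCounts = new HashMap<>();
  private final Set<String> offlineVolumes = new HashSet<>();

  VolumeFailurePolicy(int maxErrorsPerVolume, FailureReporter reporter) {
    this.maxErrorsPerVolume = maxErrorsPerVolume;
    this.reporter = reporter;
  }

  /**
   * Records one DU error for the volume. Once the threshold is reached, the
   * volume is marked offline and reported exactly once.
   *
   * @return true if the volume is offline after this call
   */
  synchronized boolean onDuError(String volume, IOException cause) {
    if (offlineVolumes.contains(volume)) {
      return true; // already offline, nothing more to report
    }
    int count = errorCounts.merge(volume, 1, Integer::sum);
    if (count >= maxErrorsPerVolume) {
      offlineVolumes.add(volume);
      reporter.reportVolumeFailure(volume, cause);
      return true;
    }
    return false;
  }

  synchronized boolean isOffline(String volume) {
    return offlineVolumes.contains(volume);
  }
}
```

With something along these lines, the NN handshake, {{saveDfsUsed}} and the {{BlockReceiver}} constructor would all call {{onDuError}} instead of each deciding on its own what a {{DU}} failure means. Whether a threshold is the right trigger, or a single error should suffice, is part of the question.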