Stephen O'Donnell created HDFS-14333:
----------------------------------------

             Summary: Datanode fails to start if any disk has errors during Namenode registration
                 Key: HDFS-14333
                 URL: https://issues.apache.org/jira/browse/HDFS-14333
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: datanode
    Affects Versions: 3.3.0
            Reporter: Stephen O'Donnell


This is closely related to HDFS-9908, where it was reported that a datanode 
would fail to start if an IO error occurred on a single disk when running du 
during Datanode registration. That Jira was closed due to HADOOP-12973, which 
refactored how du is called and prevents any exception from being thrown. 
However, this problem can still occur if a volume has errors (e.g. permission 
problems or filesystem corruption) when the disk is scanned to load all the 
replicas. The method chain is:

DataNode.initBlockPool -> FsDatasetImpl.addBlockPool -> 
FsVolumeList.getAllVolumesMap -> throws an exception which goes unhandled.
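
Condensed from FsVolumeList (simplified signatures; not the exact Hadoop 
source), the scan collects per-volume IOExceptions and re-throws the first one 
after all scanner threads finish, so one bad disk fails block pool 
initialization for the whole DN:

{code}
// Condensed sketch of FsVolumeList.getAllVolumesMap (simplified
// signatures; not the exact Hadoop source). Each volume is scanned in
// its own thread; per-volume IOExceptions are collected and the first
// one is re-thrown once every thread has finished, which is the
// exception that escapes through addBlockPool and initBlockPool.
void getAllVolumesMap(final String bpid, final ReplicaMap volumeMap)
    throws IOException {
  final List<IOException> exceptions =
      Collections.synchronizedList(new ArrayList<IOException>());
  List<Thread> scanners = new ArrayList<>();
  for (final FsVolumeImpl v : volumes) {
    Thread scanner = new Thread(() -> {
      try {
        v.getVolumeMap(bpid, volumeMap);  // walks the replica subdirs
      } catch (IOException ioe) {
        exceptions.add(ioe);              // "Will throw later."
      }
    });
    scanners.add(scanner);
    scanner.start();
  }
  for (Thread scanner : scanners) {
    try {
      scanner.join();                     // wait for all volumes
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }
  if (!exceptions.isEmpty()) {
    throw exceptions.get(0);              // goes unhandled -> DN aborts
  }
}
{code}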

The DN logs will contain a stack trace for the problem volume, so the 
workaround is to remove that volume from the DN config, after which the DN 
will start. However, the logs are a little confusing, so it's not always 
obvious what the issue is.
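
For example, if /data/18 is the failing disk (as in the logs below), the 
workaround is to drop it from dfs.datanode.data.dir in hdfs-site.xml and 
restart the DN. A hypothetical illustration (only /data/18 comes from this 
report; the remaining directory list is invented):

{code}
<property>
  <name>dfs.datanode.data.dir</name>
  <!-- Hypothetical example: /data/18/dfs/dn has been removed from the
       list until the permission/filesystem problem on that disk is
       repaired; the other paths are illustrative only. -->
  <value>/data/1/dfs/dn,/data/2/dfs/dn,/data/3/dfs/dn</value>
</property>
{code}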

These are cut-down logs from an occurrence of this issue:

{code}
2019-03-01 08:58:24,830 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Scanning block pool BP-240961797-x.x.x.x-1392827522027 on volume /data/18/dfs/dn/current...
...
2019-03-01 08:58:27,029 WARN org.apache.hadoop.fs.CachingGetSpaceUsed: Could not get disk usage information
ExitCodeException exitCode=1: du: cannot read directory `/data/18/dfs/dn/current/BP-240961797-x.x.x.x-1392827522027/current/finalized/subdir149/subdir215': Permission denied
du: cannot read directory `/data/18/dfs/dn/current/BP-240961797-x.x.x.x-1392827522027/current/finalized/subdir149/subdir213': Permission denied
du: cannot read directory `/data/18/dfs/dn/current/BP-240961797-x.x.x.x-1392827522027/current/finalized/subdir97/subdir25': Permission denied

        at org.apache.hadoop.util.Shell.runCommand(Shell.java:601)
        at org.apache.hadoop.util.Shell.run(Shell.java:504)
        at org.apache.hadoop.fs.DU$DUShell.startRefresh(DU.java:61)
        at org.apache.hadoop.fs.DU.refresh(DU.java:53)
        at org.apache.hadoop.fs.CachingGetSpaceUsed.init(CachingGetSpaceUsed.java:84)
        at org.apache.hadoop.fs.GetSpaceUsed$Builder.build(GetSpaceUsed.java:166)
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.<init>(BlockPoolSlice.java:145)
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.addBlockPool(FsVolumeImpl.java:881)
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList$2.run(FsVolumeList.java:412)
...
2019-03-01 08:58:27,043 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Time taken to scan block pool BP-240961797-x.x.x.x-1392827522027 on /data/18/dfs/dn/current: 2202ms
{code}
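
For context, the WARN above is non-fatal because of the HADOOP-12973 refactor: 
the du shell-out is wrapped so that an IOException is logged and swallowed 
rather than re-thrown. A condensed sketch of the relevant part of 
org.apache.hadoop.fs.DU (simplified; not the exact source):

{code}
// Condensed sketch of DU.refresh() after HADOOP-12973 (simplified; not
// the exact Hadoop source). The shell call's IOException is caught and
// logged, never re-thrown, so a failing du no longer aborts
// BlockPoolSlice construction during the block pool scan.
@Override
protected synchronized void refresh() {
  try {
    duShell.startRefresh();  // runs the du command via Shell.runCommand
  } catch (IOException ioe) {
    // Produces the "Could not get disk usage information" WARN above
    LOG.warn("Could not get disk usage information", ioe);
  }
}
{code}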

So we can see a du error occurred and was logged but not re-thrown (due to 
HADOOP-12973), and the block pool scan completed. However, in the 'add 
replicas to map' logic that follows, we get another exception stemming from 
the same problem:

{code}
2019-03-01 08:58:27,564 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Adding replicas to map for block pool BP-240961797-x.x.x.x-1392827522027 on volume /data/18/dfs/dn/current...
...
2019-03-01 08:58:31,155 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Caught exception while adding replicas from /data/18/dfs/dn/current. Will throw later.
java.io.IOException: Invalid directory or I/O error occurred for dir: /data/18/dfs/dn/current/BP-240961797-x.x.x.x-1392827522027/current/finalized/subdir149/subdir215
        at org.apache.hadoop.fs.FileUtil.listFiles(FileUtil.java:1167)
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.addToReplicasMap(BlockPoolSlice.java:445)
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.addToReplicasMap(BlockPoolSlice.java:448)
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.addToReplicasMap(BlockPoolSlice.java:448)
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.getVolumeMap(BlockPoolSlice.java:342)
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.getVolumeMap(FsVolumeImpl.java:861)
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList$1.run(FsVolumeList.java:191)

< The message "2019-03-01 08:59:00,989 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Time to add replicas to map for block pool BP-240961797-x.x.x.x-1392827522027 on volume xxx" did not appear for this volume as it failed >
{code}
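
The 'Invalid directory or I/O error occurred' message comes from 
FileUtil.listFiles: java.io.File.listFiles() returns null when a directory 
cannot be read (e.g. permission denied), and the Hadoop helper converts that 
null into the IOException above. Roughly (condensed from FileUtil):

{code}
// Condensed from org.apache.hadoop.fs.FileUtil (simplified).
// java.io.File.listFiles() returns null when the directory cannot be
// read (for example, permission denied), and the helper converts that
// null into the IOException seen in the stack trace above.
public static File[] listFiles(File dir) throws IOException {
  File[] files = dir.listFiles();
  if (files == null) {
    throw new IOException(
        "Invalid directory or I/O error occurred for dir: " + dir);
  }
  return files;
}
{code}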

The exception is re-thrown, so the DN fails registration and then retries. On 
the retry it finds all volumes already locked and exits with an 'all volumes 
failed' error.

I believe we should handle a volume that fails at this point the same way as a 
runtime volume failure, and only abort the DN if too many volumes have failed.
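
One possible shape for that change, as a hypothetical sketch only 
(handleVolumeFailures and unhealthyVolumes are illustrative names, not 
necessarily what the patch will use): collect the volumes whose scan failed 
instead of re-throwing, and route them through the same path as a runtime disk 
failure, so dfs.datanode.failed.volumes.tolerated decides whether the DN 
aborts.

{code}
// Hypothetical sketch of the proposed handling in the volume scan
// (illustrative names; not the actual patch). After all scanner threads
// finish, rather than re-throwing the first per-volume exception,
// record the failed volumes and treat them like runtime disk failures.
if (!unhealthyVolumes.isEmpty()) {
  // Removes the bad volumes and aborts the DN only if the number of
  // failed volumes exceeds dfs.datanode.failed.volumes.tolerated.
  handleVolumeFailures(unhealthyVolumes);
}
{code}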

I will post a patch for this.


