Rushabh S Shah created HDFS-8224:
------------------------------------

             Summary: Any IOException in DataTransfer#run() will run diskError 
thread even if it is not disk error
                 Key: HDFS-8224
                 URL: https://issues.apache.org/jira/browse/HDFS-8224
             Project: Hadoop HDFS
          Issue Type: Bug
    Affects Versions: 2.6.0
            Reporter: Rushabh S Shah
             Fix For: 2.8.0


This happened in our 2.6 cluster.
One of the block and its metadata file were corrupted.
Namenode tried to copy that block to another datanode but failed with the 
following stack trace:

2015-04-20 01:04:04,421 
[org.apache.hadoop.hdfs.server.datanode.DataNode$DataTransfer@11319bc4] WARN 
datanode.DataNode: DatanodeRegistration(a.b.c.d, 
datanodeUuid=e8c5135c-9b9f-4d05-a59d-e5525518aca7, infoPort=1006, 
infoSecurePort=0, ipcPort=8020, 
storageInfo=lv=-56;cid=CID-e7f736ac-158e-446e-9091-7e66f3cddf3c;nsid=358250775;c=1428471998571):Failed
 to transfer BP-xxx-1351096255769:blk_2697560713_1107108863999 to 
a1.b1.c1.d1:1004 got 
java.io.IOException: Could not create DataChecksum of type 0 with 
bytesPerChecksum 0
        at 
org.apache.hadoop.util.DataChecksum.newDataChecksum(DataChecksum.java:125)
        at 
org.apache.hadoop.hdfs.server.datanode.BlockMetadataHeader.readHeader(BlockMetadataHeader.java:175)
        at 
org.apache.hadoop.hdfs.server.datanode.BlockMetadataHeader.readHeader(BlockMetadataHeader.java:140)
        at 
org.apache.hadoop.hdfs.server.datanode.BlockMetadataHeader.readDataChecksum(BlockMetadataHeader.java:102)
        at 
org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:287)
        at 
org.apache.hadoop.hdfs.server.datanode.DataNode$DataTransfer.run(DataNode.java:1989)
        at java.lang.Thread.run(Thread.java:722)

The catch block in DataTransfer#run method will treat every IOException as disk 
error fault and run disk errror

catch (IOException ie) {
        LOG.warn(bpReg + ":Failed to transfer " + b + " to " +
            targets[0] + " got ", ie);
        // check if there are any disk problem
        checkDiskErrorAsync();
      } 

This block was never scanned by BlockPoolSliceScanner otherwise it would have 
reported as corrupt block.











--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to