[ https://issues.apache.org/jira/browse/HDFS-9955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Colin Patrick McCabe resolved HDFS-9955.
----------------------------------------
    Resolution: Duplicate

> DataNode won't self-heal after some block dirs were manually misplaced
> ----------------------------------------------------------------------
>
>                 Key: HDFS-9955
>                 URL: https://issues.apache.org/jira/browse/HDFS-9955
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>    Affects Versions: 2.6.0
>         Environment: CentOS 6, Cloudera 5.4.4 (patched Hadoop 2.6.0)
>            Reporter: David Watzke
>              Labels: data-integrity
>
> I accidentally ran this tool on top of a DataNode's datadirs (the datanode was shut down at the time): https://github.com/killerwhile/volume-balancer
> The tool makes assumptions about block directory placement that are no longer valid in Hadoop 2.6.0 (see the layout sketch after the log below); it was simply moving block directories between datadirs to balance disk usage. Admittedly it was not a good idea to run it, but my concern is how the datanode handled (or rather failed to handle) the resulting state. I've seen the messages below in the DN log, which means the DN knew about the misplaced blocks but did nothing to fix them (self-heal by copying the other replica) - and that seems like a bug to me. If you need any additional info, please just ask.
> {noformat}
> 2016-03-04 12:40:06,008 WARN org.apache.hadoop.hdfs.server.datanode.VolumeScanner: I/O error while finding block BP-680964103-A.B.C.D-1375882473930:blk_-3159875140074863904_0 on volume /data/18/cdfs/dn
> 2016-03-04 12:40:06,009 WARN org.apache.hadoop.hdfs.server.datanode.VolumeScanner: I/O error while finding block BP-680964103-A.B.C.D-1375882473930:blk_8369468090548520777_0 on volume /data/18/cdfs/dn
> 2016-03-04 12:40:06,011 WARN org.apache.hadoop.hdfs.server.datanode.VolumeScanner: I/O error while finding block BP-680964103-A.B.C.D-1375882473930:blk_1226431637_0 on volume /data/18/cdfs/dn
> 2016-03-04 12:40:06,012 WARN org.apache.hadoop.hdfs.server.datanode.VolumeScanner: I/O error while finding block BP-680964103-A.B.C.D-1375882473930:blk_1169332185_0 on volume /data/18/cdfs/dn
> 2016-03-04 12:40:06,825 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: opReadBlock BP-680964103-A.B.C.D-1375882473930:blk_1226781281_1099829669050 received exception java.io.IOException: BlockId 1226781281 is not valid.
> 2016-03-04 12:40:06,825 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(X.Y.Z.30, datanodeUuid=9da950ca-87ae-44ee-9391-0bca669c796b, infoPort=50075, ipcPort=50020, storageInfo=lv=-56;cid=cluster12;nsid=1625487778;c=1438754073236):Got exception while serving BP-680964103-A.B.C.D-1375882473930:blk_1226781281_1099829669050 to /X.Y.Z.30:48146
> java.io.IOException: BlockId 1226781281 is not valid.
>         at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getBlockFile(FsDatasetImpl.java:650)
>         at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getBlockFile(FsDatasetImpl.java:641)
>         at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getMetaDataInputStream(FsDatasetImpl.java:214)
>         at org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:282)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:529)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:116)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:71)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:243)
>         at java.lang.Thread.run(Thread.java:745)
> 2016-03-04 12:40:06,826 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: prg04-002.xyz.tld:50010:DataXceiver error processing READ_BLOCK operation src: /X.Y.Z.30:48146 dst: /X.Y.Z.30:50010
> java.io.IOException: BlockId 1226781281 is not valid.
>         at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getBlockFile(FsDatasetImpl.java:650)
>         at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getBlockFile(FsDatasetImpl.java:641)
>         at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getMetaDataInputStream(FsDatasetImpl.java:214)
>         at org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:282)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:529)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:116)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:71)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:243)
>         at java.lang.Thread.run(Thread.java:745)
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
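For context on the layout assumption mentioned in the description: since the block-ID-based layout of HDFS-6482 (present in Hadoop 2.6.0), a DataNode derives a finalized block's subdirectory deterministically from bits of its block ID rather than placing it wherever space is free. The following is a minimal illustrative sketch of that mapping, modeled on DatanodeUtil.idToBlockDir() in the 2.6 code line; the exact masks and directory names are from memory, so treat them as assumptions to be checked against the source. It shows why a block directory moved by an external tool becomes invisible: FsDatasetImpl.getBlockFile() looks for the block only at the computed path and otherwise reports "BlockId ... is not valid".

{code:java}
import java.io.File;

// Illustrative sketch, NOT the actual Hadoop source: how a 2.6.0 DataNode
// maps a block ID to its on-disk subdirectory under a volume's
// current/<bpid>/current/finalized tree (HDFS-6482 layout).
public class BlockLayoutSketch {
    // Modeled on DatanodeUtil.idToBlockDir(); the 0xFF masks (256 x 256
    // subdirs) match the 2.6 layout as recalled -- verify against source.
    static File idToBlockDir(File finalizedDir, long blockId) {
        int d1 = (int) ((blockId >> 16) & 0xFF);
        int d2 = (int) ((blockId >> 8) & 0xFF);
        return new File(finalizedDir,
            "subdir" + d1 + File.separator + "subdir" + d2);
    }

    public static void main(String[] args) {
        // Block ID taken from the log above.
        File finalized = new File(
            "/data/18/cdfs/dn/current/BP-680964103-A.B.C.D-1375882473930/current/finalized");
        System.out.println(idToBlockDir(finalized, 1226781281L));
        // If blk_1226781281 lives anywhere else (e.g. moved to a different
        // subdir or volume by the balancer tool), getBlockFile() cannot
        // find it and the reads above fail.
    }
}
{code}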
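Given that mapping, one pragmatic recovery path (with the DataNode stopped) is to scan each volume for block files that are not in the subdirectory the layout expects and move them back. The helper below is hypothetical - FindMisplacedBlocks is not part of Hadoop - and assumes the 256 x 256 scheme sketched above; it only prints misplaced paths, leaving the actual move (of both the blk_* file and its matching .meta file) to the operator.

{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

// Hypothetical operator helper, not part of Hadoop: with the DataNode
// stopped, list block files that are not where the HDFS-6482 layout
// (as sketched above) expects them.
public class FindMisplacedBlocks {
    public static void main(String[] args) throws IOException {
        // args[0]: a volume's finalized dir, e.g.
        // /data/18/cdfs/dn/current/BP-.../current/finalized
        Path finalized = Paths.get(args[0]).toAbsolutePath().normalize();
        try (Stream<Path> files = Files.walk(finalized)) {
            files.filter(p -> p.getFileName().toString().matches("blk_-?\\d+"))
                 .forEach(p -> {
                     long id = Long.parseLong(
                         p.getFileName().toString().substring("blk_".length()));
                     Path expected = finalized
                         .resolve("subdir" + ((id >> 16) & 0xFF))
                         .resolve("subdir" + ((id >> 8) & 0xFF));
                     if (!p.getParent().equals(expected)) {
                         // Print "found -> where it should be".
                         System.out.println(p + " -> " + expected);
                     }
                 });
        }
    }
}
{code}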