Stephen O'Donnell created HDFS-14706:
----------------------------------------

             Summary: Checksums are not checked if block meta file is less than 
7 bytes
                 Key: HDFS-14706
                 URL: https://issues.apache.org/jira/browse/HDFS-14706
             Project: Hadoop HDFS
          Issue Type: Bug
    Affects Versions: 3.3.0
            Reporter: Stephen O'Donnell
            Assignee: Stephen O'Donnell


If a block and its meta file are corrupted in a certain way, the corruption can 
go unnoticed by a client, causing it to return invalid data.

The meta file is expected to always have a header of 7 bytes and then a series 
of checksums depending on the length of the block.

If the meta file is corrupted such that its length is greater than zero but 
less than 7 bytes, the header is incomplete. The logic in BlockSender.java 
checks whether the meta file length is at least the header size; if it is not, 
it does not raise an error, but instead returns a NULL checksum type to the 
client.

https://github.com/apache/hadoop/blob/b77761b0e37703beb2c033029e4c0d5ad1dce794/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockSender.java#L327-L357
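The faulty fallback can be sketched as follows. This is a simplified illustration, not the actual Hadoop source; the header layout (2-byte version, 1-byte checksum type id, 4-byte bytesPerChecksum) and the class and method names are assumptions for the sketch:

{code}
import java.nio.ByteBuffer;

public class MetaHeaderSketch {
    static final int HEADER_LEN = 7; // 2-byte version + 1-byte type + 4-byte bytesPerChecksum

    // Returns the checksum type id from the meta file header, or -1
    // (standing in for CHECKSUM_NULL) when the file is too short to
    // hold a complete header -- the silent fallback described above.
    static int checksumTypeOf(byte[] metaFileBytes) {
        if (metaFileBytes.length < HEADER_LEN) {
            return -1; // no error raised; checksum verification is skipped
        }
        ByteBuffer buf = ByteBuffer.wrap(metaFileBytes);
        buf.getShort();   // skip the 2-byte version field
        return buf.get(); // the checksum type id byte
    }

    public static void main(String[] args) {
        byte[] truncated = new byte[3]; // corrupted meta file, under 7 bytes
        System.out.println(checksumTypeOf(truncated)); // prints -1: NULL checksum
    }
}
{code}

Because the short file is mapped to a NULL checksum rather than an error, the truncation is indistinguishable from a block that was legitimately written without checksums.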


If the client receives a NULL checksum type, it will not validate checksums at 
all, and even corrupted data will be returned to the reader. This means the 
corruption will go unnoticed and HDFS will never repair it. Even the Volume 
Scanner will not notice the corruption, as the checksums are silently ignored.

Additionally, if the meta file does have enough bytes to attempt loading the 
header, but the header is corrupted such that it is not valid, it can cause 
the datanode Volume Scanner to exit with an exception like the following:

{code}
2019-08-06 18:16:39,151 ERROR datanode.VolumeScanner: VolumeScanner(/tmp/hadoop-sodonnell/dfs/data, DS-7f103313-61ba-4d37-b63d-e8cf7d2ed5f7) exiting because of exception 
java.lang.IllegalArgumentException: id=51 out of range [0, 5)
        at org.apache.hadoop.util.DataChecksum$Type.valueOf(DataChecksum.java:76)
        at org.apache.hadoop.util.DataChecksum.newDataChecksum(DataChecksum.java:167)
        at org.apache.hadoop.hdfs.server.datanode.BlockMetadataHeader.readHeader(BlockMetadataHeader.java:173)
        at org.apache.hadoop.hdfs.server.datanode.BlockMetadataHeader.readHeader(BlockMetadataHeader.java:139)
        at org.apache.hadoop.hdfs.server.datanode.BlockMetadataHeader.readHeader(BlockMetadataHeader.java:153)
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.loadLastPartialChunkChecksum(FsVolumeImpl.java:1140)
        at org.apache.hadoop.hdfs.server.datanode.FinalizedReplica.loadLastPartialChunkChecksum(FinalizedReplica.java:157)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.getPartialChunkChecksumForFinalized(BlockSender.java:451)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:266)
        at org.apache.hadoop.hdfs.server.datanode.VolumeScanner.scanBlock(VolumeScanner.java:446)
        at org.apache.hadoop.hdfs.server.datanode.VolumeScanner.runLoop(VolumeScanner.java:558)
        at org.apache.hadoop.hdfs.server.datanode.VolumeScanner.run(VolumeScanner.java:633)
2019-08-06 18:16:39,152 INFO datanode.VolumeScanner: VolumeScanner(/tmp/hadoop-sodonnell/dfs/data, DS-7f103313-61ba-4d37-b63d-e8cf7d2ed5f7) exiting.
{code}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
