Elliot West created HDFS-10338:
----------------------------------

             Summary: DistCp masks potential CRC check failures
                 Key: HDFS-10338
                 URL: https://issues.apache.org/jira/browse/HDFS-10338
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: distcp
    Affects Versions: 2.7.1
            Reporter: Elliot West


There appear to be edge cases in which CRC checks can be bypassed when 
requests for checksums from the source or target file system fail. In that 
case the source and target CRCs could differ and yet the DistCp copy would 
still succeed, even when the 'skip CRC check' option is not in use.

The code in question is contained in the method 
[{{org.apache.hadoop.tools.util.DistCpUtils#checksumsAreEqual(...)}}|https://github.com/apache/hadoop/blob/release-2.7.1/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/util/DistCpUtils.java#L457]

Specifically, this code block shows that if reading either the source or the 
target checksum fails, the method returns {{true}} (i.e. the checksums are 
equal), implying that the check succeeded. In fact, we simply failed to obtain 
a checksum and could not perform the check at all.
{code}
try {
  sourceChecksum = sourceChecksum != null ? sourceChecksum : 
    sourceFS.getFileChecksum(source);
  targetChecksum = targetFS.getFileChecksum(target);
} catch (IOException e) {
  LOG.error("Unable to retrieve checksum for " + source + " or "
    + target, e);
}
return (sourceChecksum == null || targetChecksum == null ||
  sourceChecksum.equals(targetChecksum));
{code}
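To make the masking concrete, here is a minimal self-contained model of the quoted logic. It is a sketch under stated assumptions, not the DistCp code itself: {{ChecksumFetcher}} and {{lenientChecksumsAreEqual}} are hypothetical names standing in for {{FileSystem.getFileChecksum(...)}} and {{checksumsAreEqual(...)}}, and checksums are modeled as {{String}} values rather than {{FileChecksum}} objects.

{code:java}
import java.io.IOException;

class LenientChecksumDemo {

  /** Hypothetical stand-in for FileSystem.getFileChecksum(...); not a Hadoop API. */
  @FunctionalInterface
  interface ChecksumFetcher {
    String fetch(String path) throws IOException;
  }

  /** Models the quoted block: errors are logged and swallowed. */
  static boolean lenientChecksumsAreEqual(ChecksumFetcher sourceFS, String source,
      ChecksumFetcher targetFS, String target) {
    String sourceChecksum = null;
    String targetChecksum = null;
    try {
      sourceChecksum = sourceFS.fetch(source);
      targetChecksum = targetFS.fetch(target);
    } catch (IOException e) {
      // Logged but not re-thrown, as in the quoted block.
      System.err.println("Unable to retrieve checksum for " + source + " or " + target + ": " + e);
    }
    // A missing checksum on either side is reported as a match.
    return sourceChecksum == null || targetChecksum == null
        || sourceChecksum.equals(targetChecksum);
  }

  public static void main(String[] args) {
    ChecksumFetcher healthy = path -> "crc-A";
    ChecksumFetcher different = path -> "crc-B";
    ChecksumFetcher failing = path -> {
      throw new IOException("simulated checksum read failure");
    };
    // A genuine mismatch is detected: prints false.
    System.out.println(lenientChecksumsAreEqual(healthy, "/src/f", different, "/dst/f"));
    // A failed target read is reported as a successful check: prints true.
    System.out.println(lenientChecksumsAreEqual(healthy, "/src/f", failing, "/dst/f"));
  }
}
{code}

The second call returns {{true}} even though no comparison took place. A target file system that returns {{null}} would likewise be reported as a match.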

I believe that, at the very least, the caught {{IOException}} should be 
re-thrown. If this is not deemed desirable, then I believe an option 
({{--strictCrc}}?) should be added to enforce a strict check. This check would 
require that both the source and target CRCs are retrieved and are not null 
before they are compared for equality. If either CRC retrieval fails for any 
reason, an exception should be thrown.
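A minimal sketch of the strict behavior proposed above, written as a self-contained model. It is not how DistCp or the proposed {{--strictCrc}} option is, or would be, implemented. {{ChecksumFetcher}}, {{strictChecksumsAreEqual}} and {{requireChecksum}} are hypothetical names. {{ChecksumFetcher}} stands in for {{FileSystem.getFileChecksum(...)}}, and checksums are modeled as {{String}} values.

{code:java}
import java.io.IOException;

class StrictChecksumDemo {

  /** Hypothetical stand-in for FileSystem.getFileChecksum(...); not a Hadoop API. */
  @FunctionalInterface
  interface ChecksumFetcher {
    String fetch(String path) throws IOException;
  }

  /**
   * Strict comparison: retrieval failures propagate to the caller and a null
   * checksum is an error, so a return value always reflects a real comparison.
   */
  static boolean strictChecksumsAreEqual(ChecksumFetcher sourceFS, String source,
      ChecksumFetcher targetFS, String target) throws IOException {
    String sourceChecksum = requireChecksum(sourceFS, source);
    String targetChecksum = requireChecksum(targetFS, target);
    return sourceChecksum.equals(targetChecksum);
  }

  private static String requireChecksum(ChecksumFetcher fs, String path) throws IOException {
    // Any IOException from fetch() is deliberately left to propagate.
    String checksum = fs.fetch(path);
    if (checksum == null) {
      // e.g. a file system that does not support checksums at all
      throw new IOException("No checksum available for " + path
          + "; strict CRC check cannot be performed");
    }
    return checksum;
  }

  public static void main(String[] args) {
    ChecksumFetcher healthy = path -> "crc-A";
    ChecksumFetcher unsupported = path -> null;
    try {
      boolean equal = strictChecksumsAreEqual(healthy, "/src/f", unsupported, "/dst/f");
      System.out.println(equal ? "Checksums match" : "Checksums differ");
    } catch (IOException e) {
      // The copy is failed rather than silently accepted.
      System.out.println("Copy rejected: " + e.getMessage());
    }
  }
}
{code}

Throwing instead of returning {{false}} keeps "the files differ" distinct from "we could not tell". A caller could then report or retry the two cases differently.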

Clearly some {{FileSystem}} implementations do not support CRCs, and in these 
cases calls to {{FileSystem.getFileChecksum(...)}} return {{null}}. I would 
suggest that such cases should fail a strict CRC check, to prevent users from 
developing a false sense of security in their copy pipeline.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
