[ https://issues.apache.org/jira/browse/HDFS-5798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Laurent Goujon resolved HDFS-5798.
----------------------------------

    Resolution: Duplicate

> DFSClient uses non-valid data when computing file checksum
> ----------------------------------------------------------
>
>                 Key: HDFS-5798
>                 URL: https://issues.apache.org/jira/browse/HDFS-5798
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs-client
>    Affects Versions: 1.1.2, 2.0.5-alpha
>            Reporter: Laurent Goujon
>
> In DFSClient.java, when computing the file checksum, the MD5 checksum of each
> block is fetched and written to a DataOutputStream instance (md5out), and the
> final checksum is then computed as follows:
> {code:title=DFSClient.java}
> final MD5Hash fileMD5 = MD5Hash.digest(md5out.getData());
> {code}
> The problem is that getData() returns the whole backing buffer, of which only
> the first md5out.getLength() bytes are valid. As a result, fileMD5 is the MD5
> of the concatenated per-block MD5s PLUS a run of extra trailing bytes (here
> the buffer is not reused, so they should be zeros) whose number depends on how
> the Java implementation of ByteArrayOutputStream grows its buffer.

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
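The effect described in the report can be reproduced with the JDK alone. The sketch below is not Hadoop code: `ExposedBuffer` is a hypothetical stand-in for a buffer that exposes its backing array through `getData()` and its valid length through `getLength()`, and the helper names and per-block contents are made up for illustration. It writes 16-byte per-block MD5s into the buffer and compares an MD5 over the whole backing array with an MD5 over only the first `getLength()` bytes.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

public class ChecksumBufferDemo {
    // Hypothetical stand-in for a DataOutputBuffer-style class: exposes the raw
    // backing array (getData) and the number of bytes actually written (getLength).
    public static final class ExposedBuffer extends ByteArrayOutputStream {
        public byte[] getData() { return buf; }   // whole array, including unused capacity
        public int getLength() { return count; }  // valid bytes only
    }

    // MD5 over data[off, off + len).
    public static byte[] md5(byte[] data, int off, int len) throws NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        md.update(data, off, len);
        return md.digest();
    }

    // Writes one 16-byte MD5 per (made-up) block into the buffer via a DataOutputStream.
    public static ExposedBuffer collectBlockMd5s(int blocks) throws Exception {
        ExposedBuffer buffer = new ExposedBuffer();
        DataOutputStream md5out = new DataOutputStream(buffer);
        for (int i = 0; i < blocks; i++) {
            byte[] blockMd5 = md5(new byte[] {(byte) i}, 0, 1); // hypothetical block contents
            md5out.write(blockMd5);
        }
        md5out.flush();
        return buffer;
    }

    // Mirrors the reported pattern: digests the entire backing array.
    public static byte[] buggyFileMd5(ExposedBuffer b) throws Exception {
        return md5(b.getData(), 0, b.getData().length);
    }

    // Digests only the bytes that were actually written.
    public static byte[] fixedFileMd5(ExposedBuffer b) throws Exception {
        return md5(b.getData(), 0, b.getLength());
    }

    public static void main(String[] args) throws Exception {
        for (int blocks = 2; blocks <= 3; blocks++) {
            ExposedBuffer b = collectBlockMd5s(blocks);
            System.out.println(blocks + " blocks: valid bytes = " + b.getLength()
                    + ", backing array = " + b.getData().length
                    + ", digests equal = " + Arrays.equals(buggyFileMd5(b), fixedFileMd5(b)));
        }
    }
}
```

ByteArrayOutputStream is documented to start with a 32-byte capacity, so two 16-byte block MD5s fill the array exactly and both digests agree. A third block forces the array to grow beyond the 48 valid bytes, and the trailing zero bytes then change the whole-array digest. The result therefore depends on the buffer's growth policy rather than on the block checksums alone.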