[ https://issues.apache.org/jira/browse/HDFS-5798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Laurent Goujon resolved HDFS-5798.
----------------------------------

    Resolution: Duplicate

> DFSClient uses non-valid data when computing file checksum
> ----------------------------------------------------------
>
>                 Key: HDFS-5798
>                 URL: https://issues.apache.org/jira/browse/HDFS-5798
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs-client
>    Affects Versions: 1.1.2, 2.0.5-alpha
>            Reporter: Laurent Goujon
>
> In DFSClient.java, when computing a file checksum, the MD5 checksums of all 
> blocks are fetched and written to a DataOutputStream instance (md5out), and 
> the final checksum is later computed this way:
> {code:title=DFSClient.java}
> final MD5Hash fileMD5 = MD5Hash.digest(md5out.getData());
> {code}
> The problem is that getData() returns a buffer whose contents are valid only 
> up to md5out.getLength(), so fileMD5 is the MD5 of the per-block MD5s PLUS 
> whatever bytes fill the rest of the buffer (here the buffer is not reused, so 
> they should be 0). The result therefore depends on the Java implementation of 
> ByteArrayOutputStream.
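
The effect described above can be reproduced with the JDK alone. The following is a minimal sketch, not the DFSClient code: ExposedBuffer is a hypothetical stand-in for md5out that exposes its backing array through getData() and its valid length through getLength(), and java.security.MessageDigest is used in place of MD5Hash. The 32-byte capacity and the 16-byte fake block checksum are arbitrary values chosen for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

public class ChecksumBufferDemo {

    /** Hypothetical stand-in for md5out: exposes the raw backing array. */
    static class ExposedBuffer extends ByteArrayOutputStream {
        ExposedBuffer(int capacity) { super(capacity); }
        byte[] getData() { return buf; }    // whole backing array, may exceed the data written
        int getLength() { return count; }   // number of bytes actually written
    }

    /** MD5 over data[off .. off+len). */
    static byte[] md5(byte[] data, int off, int len) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            md.update(data, off, len);
            return md.digest();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Mirrors the reported pattern: digests the entire backing array. */
    static byte[] digestWholeBuffer(ExposedBuffer b) {
        byte[] data = b.getData();
        return md5(data, 0, data.length);
    }

    /** Digests only the bytes that were actually written. */
    static byte[] digestValidBytes(ExposedBuffer b) {
        return md5(b.getData(), 0, b.getLength());
    }

    public static void main(String[] args) {
        ExposedBuffer out = new ExposedBuffer(32);
        byte[] fakeBlockMd5 = new byte[16];       // pretend per-block MD5
        Arrays.fill(fakeBlockMd5, (byte) 0x5a);
        out.write(fakeBlockMd5, 0, fakeBlockMd5.length);

        System.out.println("backing array length: " + out.getData().length);
        System.out.println("valid length:         " + out.getLength());
        System.out.println("digests equal:        "
            + Arrays.equals(digestWholeBuffer(out), digestValidBytes(out)));
    }
}
```

Because the backing array is larger than the data written to it, the two digests differ. Bounding the digest by getLength() removes the dependence on how the buffer happens to be sized.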



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
