This may have been discussed in the past, but I haven't been able to find an earlier thread on it...
It seems that a lot of work has gone into making distcp from 1.0 to 2.0 work with checksums enabled (https://issues.apache.org/jira/browse/HADOOP-8060), and I can see that all of that work has been merged into the 2.0 releases. However, distcp from 1.0 to 2.0 still doesn't seem to work if the CRC check is enabled. Is that a correct understanding?

I took a quick look at the distcp code (mostly around CopyMapper and RetriableFileCopyCommand), and I don't see how the source checksum type gets passed in when the file is created through DFSClient. It also doesn't look like dfs.checksum.type is set upon discovering the source checksum type, which would have been another mechanism. This is consistent with my testing. I can also confirm that the copy works if I pass the command-line option "-Ddfs.checksum.type=CRC32".

If my understanding is accurate, is there a reason this was not done in distcp? Curious...

Thanks,
Sangjin