[ https://issues.apache.org/jira/browse/HDFS-15237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Akira Ajisaka resolved HDFS-15237.
----------------------------------
    Fix Version/s:     (was: 3.2.2)
         Assignee:     (was: zhengchenyu)
       Resolution: Duplicate

The fix seems to be the same as HDFS-15643, so I am closing this as a duplicate. Thanks [~zhengchenyu].

> Get checksum of EC file failed, when some block is missing or corrupt
> ---------------------------------------------------------------------
>
>                 Key: HDFS-15237
>                 URL: https://issues.apache.org/jira/browse/HDFS-15237
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: ec, hdfs
>    Affects Versions: 3.2.1
>            Reporter: zhengchenyu
>            Priority: Blocker
>              Labels: pull-request-available
>         Attachments: HDFS-15237.001.patch
>
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> When we distcp from one EC directory to another, I found errors like this on the client side:
> {code}
> 2020-03-20 20:18:21,366 WARN [main] org.apache.hadoop.hdfs.FileChecksumHelper: src=/EC/6-3/****/000325_0, datanodes[6]=DatanodeInfoWithStorage[10.200.128.40:9866,DS-65ac4407-9d33-4c59-8f72-dd1d80d26d9f,DISK]
> 2020-03-20 20:18:21,366 WARN [main] org.apache.hadoop.hdfs.FileChecksumHelper: src=/EC/6-3/****/000325_0, datanodes[6]=DatanodeInfoWithStorage[10.200.128.40:9866,DS-65ac4407-9d33-4c59-8f72-dd1d80d26d9f,DISK]
> java.io.EOFException: Unexpected EOF while trying to read response from server
>     at org.apache.hadoop.hdfs.protocolPB.PBHelperClient.vintPrefixed(PBHelperClient.java:550)
>     at org.apache.hadoop.hdfs.FileChecksumHelper$StripedFileNonStripedChecksumComputer.tryDatanode(FileChecksumHelper.java:709)
>     at org.apache.hadoop.hdfs.FileChecksumHelper$StripedFileNonStripedChecksumComputer.checksumBlockGroup(FileChecksumHelper.java:664)
>     at org.apache.hadoop.hdfs.FileChecksumHelper$StripedFileNonStripedChecksumComputer.checksumBlocks(FileChecksumHelper.java:638)
>     at org.apache.hadoop.hdfs.FileChecksumHelper$FileChecksumComputer.compute(FileChecksumHelper.java:252)
>     at org.apache.hadoop.hdfs.DFSClient.getFileChecksumInternal(DFSClient.java:1790)
>     at org.apache.hadoop.hdfs.DFSClient.getFileChecksumWithCombineMode(DFSClient.java:1810)
>     at org.apache.hadoop.hdfs.DistributedFileSystem$33.doCall(DistributedFileSystem.java:1691)
>     at org.apache.hadoop.hdfs.DistributedFileSystem$33.doCall(DistributedFileSystem.java:1688)
>     at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>     at org.apache.hadoop.hdfs.DistributedFileSystem.getFileChecksum(DistributedFileSystem.java:1700)
>     at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.doCopy(RetriableFileCopyCommand.java:138)
>     at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.doExecute(RetriableFileCopyCommand.java:115)
>     at org.apache.hadoop.tools.util.RetriableCommand.execute(RetriableCommand.java:87)
>     at org.apache.hadoop.tools.mapred.CopyMapper.copyFileWithRetry(CopyMapper.java:259)
>     at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:220)
>     at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:48)
>     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:799)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:347)
>     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
>     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)
> {code}
> And then I found errors in the datanode log like this:
> {code}
> 2020-03-20 20:54:16,573 INFO org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
> 2020-03-20 20:54:16,577 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: bd-hadoop-128050.zeus.lianjia.com:9866:DataXceiver error processing BLOCK_GROUP_CHECKSUM operation src: /10.201.1.38:33264 dst: /10.200.128.50:9866
> java.lang.UnsupportedOperationException
>     at java.nio.ByteBuffer.array(ByteBuffer.java:994)
>     at org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedBlockChecksumReconstructor.reconstruct(StripedBlockChecksumReconstructor.java:90)
>     at org.apache.hadoop.hdfs.server.datanode.BlockChecksumHelper$BlockGroupNonStripedChecksumComputer.recalculateChecksum(BlockChecksumHelper.java:711)
>     at org.apache.hadoop.hdfs.server.datanode.BlockChecksumHelper$BlockGroupNonStripedChecksumComputer.compute(BlockChecksumHelper.java:489)
>     at org.apache.hadoop.hdfs.server.datanode.DataXceiver.blockGroupChecksum(DataXceiver.java:1047)
>     at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opStripedBlockChecksum(Receiver.java:327)
>     at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:119)
>     at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:292)
>     at java.lang.Thread.run(Thread.java:748)
> {code}
> The reason is that when some block is missing or corrupt, the datanode triggers a call to recalculateChecksum. If StripedBlockChecksumReconstructor.targetBuffer is a DirectByteBuffer, DirectByteBuffer.array() cannot be used, so the exception above is thrown and the checksum cannot be computed.
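For context, distcp verifies each copied file by comparing source and target checksums through FileSystem#getFileChecksum, which is the call visible at the bottom of the client-side stack trace above. A hypothetical, minimal repro sketch follows; the file path is an assumption, and any EC file with a missing or corrupt block in a block group would do:

{code}
// Hypothetical repro sketch: request the checksum of an EC file that has a
// missing or corrupt block. The path passed in args[0] is an assumption.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileChecksum;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class EcChecksumRepro {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);   // DistributedFileSystem when fs.defaultFS points at HDFS
    Path ecFile = new Path(args[0]);        // e.g. a file in an RS-6-3 EC directory
    // On an affected cluster this call fails with the EOFException shown above,
    // because the datanode aborts the BLOCK_GROUP_CHECKSUM operation.
    FileChecksum checksum = fs.getFileChecksum(ecFile);
    System.out.println(ecFile + " -> " + checksum);
  }
}
{code}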
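As for the buffer problem itself, java.nio.ByteBuffer#array() only works for array-backed (heap) buffers; a direct buffer throws UnsupportedOperationException, which matches the datanode stack trace. Below is a minimal, self-contained sketch of the safe pattern; toByteArray is an illustrative helper, not the actual HDFS-15643 change:

{code}
// Sketch of handling a ByteBuffer that may or may not be array-backed.
import java.nio.ByteBuffer;

public class BufferArrayDemo {

  // Return the backing array if the buffer has one; otherwise copy the
  // remaining bytes into a new array. duplicate() keeps the original
  // buffer's position and limit untouched.
  static byte[] toByteArray(ByteBuffer buffer) {
    if (buffer.hasArray()) {
      return buffer.array();
    }
    byte[] copy = new byte[buffer.remaining()];
    buffer.duplicate().get(copy);
    return copy;
  }

  public static void main(String[] args) {
    ByteBuffer heap = ByteBuffer.wrap("heap-backed".getBytes());
    ByteBuffer direct = ByteBuffer.allocateDirect(16);
    direct.put("direct".getBytes()).flip();

    System.out.println(new String(toByteArray(heap)));    // served via array()
    System.out.println(new String(toByteArray(direct)));  // served via copy
    // direct.array() itself would throw java.lang.UnsupportedOperationException,
    // the same exception seen in the datanode log above.
  }
}
{code}

Whether the reconstructor ends up with heap or direct buffers depends on which erasure coder implementation is in use, which would explain why the failure only shows up on some deployments.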