Luominghui created HDFS-16739:
---------------------------------

             Summary: EC: Reconstruction failed when file has specified StoragePolicy
                 Key: HDFS-16739
                 URL: https://issues.apache.org/jira/browse/HDFS-16739
             Project: Hadoop HDFS
          Issue Type: Bug
   Affects Versions: 3.1.3
            Reporter: Luominghui
             Fix For: 3.1.3

We found that BlockReconstructionWork uses the same chooseTarget function as redundancy blocks, so the number of targets returned can be larger than the real additionalReplRequired, because target selection also tries to satisfy the storage policy. This causes various exceptions when the DN performs ECReconstructionWork. One of the exceptions on the DN is as follows:
{code:java}
2022-08-24 03:01:39,534 WARN [Command processor] org.apache.hadoop.hdfs.server.datanode.DataNode: Failed to reconstruct striped block blk_-9223372032283192848_35319673088
java.lang.IllegalArgumentException: Too much missed striped blocks.
    at com.google.common.base.Preconditions.checkArgument(Preconditions.java:141)
    at org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedWriter.<init>(StripedWriter.java:87)
    at org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedBlockReconstructor.<init>(StripedBlockReconstructor.java:45)
    at org.apache.hadoop.hdfs.server.datanode.erasurecode.ErasureCodingWorker.processErasureCodingTasks(ErasureCodingWorker.java:134)
    at org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActive(BPOfferService.java:797)
    at org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActor(BPOfferService.java:680)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor$CommandProcessingThread.processCommand(BPServiceActor.java:1306)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor$CommandProcessingThread.lambda$enqueue$2(BPServiceActor.java:1344)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor$CommandProcessingThread.processQueue(BPServiceActor.java:1280)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor$CommandProcessingThread.run(BPServiceActor.java:1267)
{code}
The file's EC policy is RS-6-3-1024. Below is the internal block info. blk_-9223372032283192845 (index 3) needs to be reconstructed. All storages are DISK, but the file's storage policy is ALL_SSD.
{code:java}
[blk_-9223372032283192848:DatanodeInfoWithStorage[10.x.x.33:50010,DS-e1435341-f43c-42ef-806f-90fsddfsfdcd,DISK],
blk_-9223372032283192847:DatanodeInfoWithStorage[10.x.x.35:50010,DS-a6dsd16a-676a-4fed-8ffe-fsdfscw23445,DISK],
blk_-9223372032283192846:DatanodeInfoWithStorage[10.x.x.34:50010,DS-40cdc124-e2e0-40f6-aa47-4d2bdsf3e8e5,DISK],
blk_-9223372032283192844:DatanodeInfoWithStorage[10.x.x.21:50010,DS-ef9dee4f-dfb2-495c-872a-974dfscds58e,DISK],
blk_-9223372032283192843:DatanodeInfoWithStorage[10.x.x.40:50010,DS-6dsedfa7-8291-46bb-964d-dfsf34567655,DISK],
blk_-9223372032283192842:DatanodeInfoWithStorage[10.x.x.36:50010,DS-2dddc387-c38b-427d-9925-15a664d3472b,DISK],
blk_-9223372032283192841:DatanodeInfoWithStorage[10.x.x.151:50010,DS-fds91a7-89ad-4899-bc44-675dfs32f58e,DISK],
blk_-9223372032283192840:DatanodeInfoWithStorage[10.x.x.27:50010,DS-77dfs4c1-c23c-4b26-baa3-aadsfdff4118,DISK]]
{code}
Below is the BlockECReconstructionInfo. Because none of the internal blocks satisfies the storage policy (ALL_SSD), the target length is 9 rather than 1.
{code:java}
2022-08-24 03:01:39,534 INFO [Command processor] org.apache.hadoop.hdfs.server.datanode.DataNode: processErasureCodingTasks BlockECReconstructionInfo(
  Recovering BP-390041874-10.x.x.x-1550651014658:blk_-9223372032283192848_35319673088
  From: [10.x.x.33:50010, 10.x.x.35:50010, 10.x.x.34:50010, 10.x.x.21:50010, 10.x.x.40:50010, 10.x.x.36:50010, 10.x.x.151:50010, 10.x.x.27:50010]
  To: [[10.x.x.37:50010, 10.x.x.21:50010, 10.x.x.32:50010, 10.x.x.27:50010, 10.x.x.28:50010, 10.x.x.23:50010, 10.x.x.23:50010, 10.x.x.101:50010, 10.x.x.32:50010])
  Block Indices: [0, 1, 2, 4, 5, 6, 7, 8]
{code}
When the StripedWriter is initialized in the DN's StripedBlockReconstructor, it checks that targetIndices.length <= parityBlkNum. Here that means 9 <= 3, which is false, so this striped block can never be reconstructed successfully.
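To make the numbers above concrete, here is a minimal, self-contained sketch. It is not HDFS code: the class and method names (TargetCountSketch, missingCount, checkTargets) are hypothetical and only for illustration. It assumes an RS-6-3 layout (6 data + 3 parity internal blocks), derives how many internal blocks are really missing from the "Block Indices" list in the log, and replays a length check of the same shape as the one that produced the "Too much missed striped blocks." error.

```java
// Hypothetical illustration only; none of these names exist in HDFS.
class TargetCountSketch {
    // Assumed RS-6-3 layout: 6 data blocks + 3 parity blocks per block group.
    static final int DATA_BLK_NUM = 6;
    static final int PARITY_BLK_NUM = 3;

    /** Counts the internal block indices that are absent from liveIndices. */
    static int missingCount(int[] liveIndices) {
        boolean[] live = new boolean[DATA_BLK_NUM + PARITY_BLK_NUM];
        for (int idx : liveIndices) {
            live[idx] = true;
        }
        int missing = 0;
        for (boolean isLive : live) {
            if (!isLive) {
                missing++;
            }
        }
        return missing;
    }

    /** Same shape as the DN-side check: reject more targets than parity blocks. */
    static void checkTargets(int targetCount) {
        if (targetCount > PARITY_BLK_NUM) {
            throw new IllegalArgumentException("Too much missed striped blocks.");
        }
    }

    public static void main(String[] args) {
        // "Block Indices" from the log: index 3 is the only one missing.
        int[] liveIndices = {0, 1, 2, 4, 5, 6, 7, 8};
        int needed = missingCount(liveIndices);
        System.out.println("targets actually needed: " + needed);

        checkTargets(needed); // 1 <= 3, accepted

        try {
            checkTargets(9); // 9 targets were sent in the "To:" list
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

With the values from this report, only one target is needed, and that count passes the check; the nine targets actually sent do not.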
{code:java}
targetIndices = new short[targets.length];
Preconditions.checkArgument(targetIndices.length <= parityBlkNum,
    "Too much missed striped blocks.");
{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org