Hui Fei created HDFS-15875:
------------------------------

             Summary: Check whether file is being truncated before truncate
                 Key: HDFS-15875
                 URL: https://issues.apache.org/jira/browse/HDFS-15875
             Project: Hadoop HDFS
          Issue Type: Bug
    Affects Versions: 3.2.2, 3.1.4, 3.3.0
            Reporter: Hui Fei
            Assignee: Hui Fei
We have encountered the following problem:

* Job A sends a truncate request to the NameNode, and block recovery starts.
* DataNode D times out (after 60 s) while connecting to another DataNode, so block recovery takes more than 60 s.
* Job A fails; job B then starts and sends a truncate request to the NameNode. A new recovery ID is generated while lease recovery is still in progress.
* DataNode D calls commitBlockSynchronization and gets the error "does not match current recovery id".

As a result, the truncate never completes. DataNode D has a replica with the new length, while the two other DataNodes have replicas with the old length, and block recovery on the DataNode fails with the error "Inconsistent size of finalized replicas". The related code is in BlockRecoveryWorker.java:

{code}
for (BlockRecord r : syncList) {
  assert r.rInfo.getNumBytes() > 0 : "zero length replica";
  ReplicaState rState = r.rInfo.getOriginalReplicaState();
  if (rState.getValue() < bestState.getValue()) {
    bestState = rState;
  }
  if (rState == ReplicaState.FINALIZED) {
    if (finalizedLength > 0 && finalizedLength != r.rInfo.getNumBytes()) {
      throw new IOException("Inconsistent size of finalized replicas. " +
          "Replica " + r.rInfo + " expected size: " + finalizedLength);
    }
    finalizedLength = r.rInfo.getNumBytes();
  }
}
{code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
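To make the failure sequence in this issue concrete, here is a toy model of it. This is a sketch under stated assumptions, not HDFS code: `TruncateRaceSketch`, `FileState`, `startTruncate`, `commitBlockSynchronization` (here a simple ID comparison) and `checkFinalizedLengths` are hypothetical names. The `guard` flag only approximates the check suggested by the issue title (reject a truncate while the file is still being truncated); it is not the actual patch. `checkFinalizedLengths` reproduces just the finalized-length comparison from the BlockRecoveryWorker excerpt, applied to plain lengths.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical toy model of the HDFS-15875 race; none of these names are HDFS APIs.
public class TruncateRaceSketch {

    // Hypothetical stand-in for the NameNode's recovery state of a file's last block.
    static final class FileState {
        long recoveryId = 0;            // latest recovery ID issued for the last block
        boolean underRecovery = false;  // true while a truncate's block recovery is in flight

        // Each truncate bumps the recovery ID. With guard=true, a new truncate is
        // rejected while a previous one is still recovering (assumed form of the fix).
        long startTruncate(boolean guard) {
            if (guard && underRecovery) {
                throw new IllegalStateException("file is being truncated");
            }
            underRecovery = true;
            return ++recoveryId;
        }

        // Models the "does not match current recovery id" rejection: a commit
        // carrying a stale recovery ID is refused and recovery stays open.
        boolean commitBlockSynchronization(long id) {
            if (id != recoveryId) {
                return false;
            }
            underRecovery = false;
            return true;
        }
    }

    // Same comparison as the excerpt: all FINALIZED replicas must share one length.
    static long checkFinalizedLengths(List<Long> finalizedLengths) {
        long finalizedLength = -1;
        for (long len : finalizedLengths) {
            if (finalizedLength > 0 && finalizedLength != len) {
                throw new IllegalStateException(
                    "Inconsistent size of finalized replicas. expected size: " + finalizedLength);
            }
            finalizedLength = len;
        }
        return finalizedLength;
    }

    public static void main(String[] args) {
        // Without the guard: job B's truncate issues recovery ID 2 while job A's
        // recovery (ID 1) is still running, so DataNode D's commit with ID 1 fails.
        FileState unguarded = new FileState();
        long idA = unguarded.startTruncate(false);
        unguarded.startTruncate(false);
        System.out.println("A commit accepted: " + unguarded.commitBlockSynchronization(idA)); // prints false

        // D already holds the new length (512); the other two replicas keep the old one (1024).
        try {
            checkFinalizedLengths(Arrays.asList(512L, 1024L, 1024L));
        } catch (IllegalStateException e) {
            System.out.println("Recovery fails: " + e.getMessage());
        }

        // With the guard: the second truncate is refused instead of bumping the ID.
        FileState guarded = new FileState();
        guarded.startTruncate(true);
        try {
            guarded.startTruncate(true);
        } catch (IllegalStateException e) {
            System.out.println("B rejected: " + e.getMessage());
        }
    }
}
```

In this model the guard keeps the first recovery ID valid, so the first recovery can commit and the replicas never diverge; whether the real fix should reject the call or wait for recovery to finish is a design choice this sketch does not settle.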