ZanderXu created HDFS-17497:
-------------------------------

Summary: Logic for committed blocks is mixed when computing file size
Key: HDFS-17497
URL: https://issues.apache.org/jira/browse/HDFS-17497
Project: Hadoop HDFS
Issue Type: Improvement
Reporter: ZanderXu
An HDFS file that is being written may contain multiple committed blocks, as in the following cases (assuming the file contains three blocks):

|| ||Block 1||Block 2||Block 3||
|Case 1|Complete|Commit|UnderConstruction|
|Case 2|Complete|Commit|Commit|
|Case 3|Commit|Commit|Commit|

However, the logic for committed blocks is inconsistent when computing the file size: it ignores the bytes of the last committed block but includes the bytes of the other committed blocks.

{code:java}
public final long computeFileSize(boolean includesLastUcBlock,
    boolean usePreferredBlockSize4LastUcBlock) {
  if (blocks.length == 0) {
    return 0;
  }
  final int last = blocks.length - 1;
  //check if the last block is BlockInfoUnderConstruction
  BlockInfo lastBlk = blocks[last];
  long size = lastBlk.getNumBytes();
  // the last committed block is not complete, so its bytes may be ignored.
  if (!lastBlk.isComplete()) {
    if (!includesLastUcBlock) {
      size = 0;
    } else if (usePreferredBlockSize4LastUcBlock) {
      size = isStriped()?
          getPreferredBlockSize() *
              ((BlockInfoStriped)lastBlk).getDataBlockNum() :
          getPreferredBlockSize();
    }
  }
  // The bytes of the other committed blocks are counted in the file length.
  for (int i = 0; i < last; i++) {
    size += blocks[i].getNumBytes();
  }
  return size;
}
{code}

The number of bytes in a committed block will not change, so the bytes of the last committed block should be counted in the file length too.

The logic for committed blocks is also inconsistent when computing the file length in DFSInputStream. Normally, DFSInputStream does not get the visible length of a committed block, regardless of whether the committed block is the last block or not.

HDFS-10843 noticed a bug that was actually caused by the committed block, but HDFS-10843 fixed that bug in a different way.

Since the number of bytes in a committed block will no longer change, we should update the quota usage when the block is committed, so that the delta quota usage is reduced promptly.
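A minimal sketch of the proposed rule, under stated assumptions: `State`, `Block` and this `computeFileSize` are hypothetical, simplified stand-ins, not the real `INodeFile`/`BlockInfo` API, and striped files are left out. Compared with the method quoted in the issue, the only change is that a committed last block keeps its bytes, because its length is already final; only a last block that is genuinely under construction is affected by the two flags.

```java
/**
 * Hypothetical model of the proposed HDFS-17497 behaviour.
 * None of these types are the real HDFS classes.
 */
public class CommittedBlockSizeSketch {

  /** Hypothetical block states mirroring the cases in the issue. */
  enum State { COMPLETE, COMMITTED, UNDER_CONSTRUCTION }

  /** Hypothetical stand-in for BlockInfo: only a state and a byte count. */
  static final class Block {
    final State state;
    final long numBytes;

    Block(State state, long numBytes) {
      this.state = state;
      this.numBytes = numBytes;
    }

    /** A committed block's length is final, just like a complete block's. */
    boolean hasFinalLength() {
      return state != State.UNDER_CONSTRUCTION;
    }
  }

  /**
   * Every complete or committed block, including the last one, contributes
   * its bytes. Only an under-construction last block is subject to the flags.
   */
  static long computeFileSize(Block[] blocks, boolean includesLastUcBlock,
      boolean usePreferredBlockSize4LastUcBlock, long preferredBlockSize) {
    if (blocks.length == 0) {
      return 0;
    }
    final int last = blocks.length - 1;
    Block lastBlk = blocks[last];
    long size = lastBlk.numBytes;
    if (!lastBlk.hasFinalLength()) {
      if (!includesLastUcBlock) {
        size = 0;
      } else if (usePreferredBlockSize4LastUcBlock) {
        size = preferredBlockSize;
      }
    }
    for (int i = 0; i < last; i++) {
      size += blocks[i].numBytes;
    }
    return size;
  }

  public static void main(String[] args) {
    Block[] case1 = {
        new Block(State.COMPLETE, 128),
        new Block(State.COMMITTED, 128),
        new Block(State.UNDER_CONSTRUCTION, 10)};
    Block[] case2 = {
        new Block(State.COMPLETE, 128),
        new Block(State.COMMITTED, 128),
        new Block(State.COMMITTED, 64)};
    Block[] case3 = {
        new Block(State.COMMITTED, 128),
        new Block(State.COMMITTED, 128),
        new Block(State.COMMITTED, 64)};
    // Excluding the last UC block only drops a truly under-construction block.
    System.out.println(computeFileSize(case1, false, false, 128)); // 256
    // The committed last block is counted even with includesLastUcBlock=false.
    System.out.println(computeFileSize(case2, false, false, 128)); // 320
    System.out.println(computeFileSize(case3, false, false, 128)); // 320
  }
}
```

With the quoted HDFS method, Cases 2 and 3 would drop the 64 bytes of the committed last block when `includesLastUcBlock` is false; in this sketch the result depends only on whether each block's length is final, which is the consistency the issue asks for.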
--
This message was sent by Atlassian Jira
(v8.20.10#820010)