ZanderXu created HDFS-17497:
-------------------------------

             Summary: Logic for committed blocks is mixed when computing file size
                 Key: HDFS-17497
                 URL: https://issues.apache.org/jira/browse/HDFS-17497
             Project: Hadoop HDFS
          Issue Type: Improvement
            Reporter: ZanderXu


An HDFS file that is being written may contain multiple committed blocks. For example, assuming a file with three blocks, the following combinations of block states are possible:

||Case||Block 1||Block 2||Block 3||
|Case 1|Complete|Committed|UnderConstruction|
|Case 2|Complete|Committed|Committed|
|Case 3|Committed|Committed|Committed|

But the logic for committed blocks is inconsistent when computing the file size: the bytes of the last block are ignored when it is committed, while the bytes of every other committed block are counted.

 
{code:java}
public final long computeFileSize(boolean includesLastUcBlock,
    boolean usePreferredBlockSize4LastUcBlock) {
  if (blocks.length == 0) {
    return 0;
  }
  final int last = blocks.length - 1;
  // check if the last block is BlockInfoUnderConstruction
  BlockInfo lastBlk = blocks[last];
  long size = lastBlk.getNumBytes();
  // The last committed block is not complete, so its bytes may be ignored.
  if (!lastBlk.isComplete()) {
    if (!includesLastUcBlock) {
      size = 0;
    } else if (usePreferredBlockSize4LastUcBlock) {
      size = isStriped() ?
          getPreferredBlockSize() *
              ((BlockInfoStriped) lastBlk).getDataBlockNum() :
          getPreferredBlockSize();
    }
  }
  // The bytes of all other committed blocks are counted into the file length.
  for (int i = 0; i < last; i++) {
    size += blocks[i].getNumBytes();
  }
  return size;
} {code}

The length of a committed block can no longer change, so the bytes of the last committed block should be counted into the file length as well.
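
For illustration, one possible shape of the fix is sketched below (a sketch only, not a final patch; the exact check is my assumption, relying on BlockInfo#getBlockUCState() and BlockUCState.COMMITTED from trunk): treat a committed last block like a complete one, so that only a genuinely under-construction last block hits the special cases.

{code:java}
public final long computeFileSize(boolean includesLastUcBlock,
    boolean usePreferredBlockSize4LastUcBlock) {
  if (blocks.length == 0) {
    return 0;
  }
  final int last = blocks.length - 1;
  BlockInfo lastBlk = blocks[last];
  long size = lastBlk.getNumBytes();
  // SKETCH: a committed block's length is already final, so count it
  // unconditionally; only an under-construction block is special-cased.
  if (!lastBlk.isComplete()
      && lastBlk.getBlockUCState() != BlockUCState.COMMITTED) {
    if (!includesLastUcBlock) {
      size = 0;
    } else if (usePreferredBlockSize4LastUcBlock) {
      size = isStriped() ?
          getPreferredBlockSize() *
              ((BlockInfoStriped) lastBlk).getDataBlockNum() :
          getPreferredBlockSize();
    }
  }
  for (int i = 0; i < last; i++) {
    size += blocks[i].getNumBytes();
  }
  return size;
} {code}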

 

The logic for committed blocks is also inconsistent when DFSInputStream computes the file length. Normally, DFSInputStream does not fetch the visible length for a committed block, regardless of whether the committed block is the last block or not.
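
To make the intended rule concrete, here is a small self-contained model (the enum and class names are invented for illustration and are not real HDFS types): the NameNode-reported length is authoritative for complete and committed blocks, and only a genuinely under-construction last block requires asking a DataNode for the replica's visible length.

{code:java}
// Toy model only; names are illustrative, not actual DFSInputStream code.
enum BlockState { COMPLETE, COMMITTED, UNDER_CONSTRUCTION }

class LastBlock {
  final BlockState state;
  final long numBytes; // length known to the NameNode
  LastBlock(BlockState state, long numBytes) {
    this.state = state;
    this.numBytes = numBytes;
  }
}

public class FileLengthModel {
  static long lastBlockLength(LastBlock last, long visibleLengthFromDatanode) {
    switch (last.state) {
      case COMPLETE:
      case COMMITTED:
        // A committed block's length is final, so the NameNode-reported
        // length is authoritative; no DataNode round trip is needed.
        return last.numBytes;
      default:
        // Only an under-construction block needs the replica's visible length.
        return visibleLengthFromDatanode;
    }
  }

  public static void main(String[] args) {
    LastBlock committed = new LastBlock(BlockState.COMMITTED, 1024L);
    // The DataNode-reported value is ignored for a committed block.
    System.out.println(lastBlockLength(committed, 512L)); // prints 1024
  }
}{code}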

 

HDFS-10843 reported a bug that was actually caused by a committed block, but that issue fixed the bug in a different way.

The number of bytes in a committed block will no longer change, so we should update the quota usage as soon as the block is committed, which reduces the outstanding quota delta in a timely manner.
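
A small worked example of the quota effect (illustrative numbers, ignoring the replication factor; it assumes, as discussed above, that a block is charged at the preferred block size while under construction):

{code:java}
public class QuotaAtCommitExample {
  public static void main(String[] args) {
    long preferredBlockSize = 128L * 1024 * 1024; // 128 MiB charged while writing
    long committedBytes = 10L * 1024 * 1024;      // 10 MiB actually written

    // Once the block is committed, its length is final, so the surplus
    // can be released immediately instead of waiting for the file to close.
    long releasedAtCommit = preferredBlockSize - committedBytes;
    System.out.println("quota released at commit: " + releasedAtCommit
        + " bytes"); // 123731968 bytes (118 MiB)
  }
}{code}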


