Li Junjun created HDFS-4318:
-------------------------------

             Summary: validateBlockMetadata reduces the success rate of block recovery
                 Key: HDFS-4318
                 URL: https://issues.apache.org/jira/browse/HDFS-4318
             Project: Hadoop HDFS
          Issue Type: Wish
          Components: datanode
    Affects Versions: 1.0.1
            Reporter: Li Junjun
            Priority: Minor
During block recovery we see log lines like:

    java.io.IOException: Block blk_3272028001529756059_11883841 length is 20480 does not match block file length 21376

When the datanode performs startBlockRecovery, it calls validateBlockMetadata (in FSDataset.startBlockRecovery), which checks that the on-disk file length matches the block's numBytes.

Consider how the block's numBytes is updated on the datanode. When writing a block, BlockReceiver.receivePacket does write -> flush -> setVisibleLength. That means it is normal and expected for the file length to be greater than the block's numBytes if write or flush throws an exception before setVisibleLength runs. In startBlockRecovery (and possibly other situations, to be checked) we only need to guarantee that file length < the block's numBytes never happens.

I suggest changing validateBlockMetadata, because the strict length check reduces the success rate of block recovery. With a pipeline a -> b -> c, when a gets a network error and b gets an error in write -> flush, we can only count on c!

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
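To make the argument concrete, here is a minimal sketch (not the actual FSDataset code; the method names validateForRecovery and strictValidate are hypothetical) contrasting the strict equality check with the relaxed check proposed above, using the lengths from the reported log line:

```java
public class BlockLengthCheck {
    /**
     * Relaxed check for recovery. Because receivePacket does
     * write -> flush -> setVisibleLength, a failure between flush and
     * setVisibleLength legitimately leaves the on-disk file LONGER than
     * the block's visible numBytes. Only fileLength < numBytes indicates
     * a truly truncated/corrupt replica.
     */
    public static boolean validateForRecovery(long fileLength, long numBytes) {
        return fileLength >= numBytes;
    }

    /** The strict equality check that validateBlockMetadata effectively applies. */
    public static boolean strictValidate(long fileLength, long numBytes) {
        return fileLength == numBytes;
    }

    public static void main(String[] args) {
        // Lengths from the log: block file is 21376 bytes, numBytes is 20480.
        long fileLength = 21376L;
        long numBytes = 20480L;
        // Strict check rejects this replica and aborts recovery.
        System.out.println("strict:  " + strictValidate(fileLength, numBytes));
        // Relaxed check accepts it; the extra bytes past numBytes can be ignored.
        System.out.println("relaxed: " + validateForRecovery(fileLength, numBytes));
    }
}
```

Under this relaxed rule the replica on b in the a -> b -> c pipeline example would still be usable for recovery, instead of being rejected for having a longer file than numBytes.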