zhouyingchao created HDFS-10989:
-----------------------------------

             Summary: Cannot get last block length after namenode failover
                 Key: HDFS-10989
                 URL: https://issues.apache.org/jira/browse/HDFS-10989
             Project: Hadoop HDFS
          Issue Type: Bug
            Reporter: zhouyingchao
On a 2.4 cluster, access to a file failed because the length of its last block could not be determined. The fsck output for the file at the moment of failure looked like this:

/user/XXXXXXXXX 483600487 bytes, 2 block(s), OPENFORWRITE: MISSING 1 blocks of total size 215165031 B
0. BP-219149063-10.108.84.25-1446859315800:blk_2102504098_1035525341 len=268435456 repl=3 [10.112.17.43:11402, 10.118.22.46:11402, 10.118.22.49:11402]
1. BP-219149063-10.108.84.25-1446859315800:blk_2103114087_1036219054{blockUCState=UNDER_RECOVERY, primaryNodeIndex=2, replicas=[ReplicaUnderConstruction[[DISK]DS-60be75ad-e4a7-4b1e-b3aa-327c85331d42:NORMAL|RBW], ReplicaUnderConstruction[[DISK]DS-184a1ce9-655a-4e67-b0cc-29ab9984bd0a:NORMAL|RBW], ReplicaUnderConstruction[[DISK]DS-6d037ac8-4bcc-4cdc-a803-55b1817e0200:NORMAL|RBW]]} len=215165031 MISSING!
Recorded locations [10.114.10.14:11402, 10.118.29.3:11402, 10.118.22.42:11402]

On those three datanodes we found IOExceptions related to the block, as well as pipeline re-creation events.
We figured out that a namenode failover had occurred before the issue appeared, and that the earlier active namenode had received several updatePipeline calls:

2016-09-27,15:04:36,437 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: updatePipeline(block=BP-219149063-10.108.84.25-1446859315800:blk_2103114087_1036137092, newGenerationStamp=1036170430, newLength=2624000, newNodes=[10.118.22.42:11402, 10.118.22.49:11402, 10.118.24.3:11402], clientName=DFSClient_NONMAPREDUCE_-442153643_1)
2016-09-27,15:04:36,438 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: updatePipeline(BP-219149063-10.108.84.25-1446859315800:blk_2103114087_1036137092) successfully to BP-219149063-10.108.84.25-1446859315800:blk_2103114087_1036170430
2016-09-27,15:10:10,596 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: updatePipeline(block=BP-219149063-10.108.84.25-1446859315800:blk_2103114087_1036170430, newGenerationStamp=1036219054, newLength=17138265, newNodes=[10.118.22.49:11402, 10.118.24.3:11402, 10.114.6.45:11402], clientName=DFSClient_NONMAPREDUCE_-442153643_1)
2016-09-27,15:10:10,601 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: updatePipeline(BP-219149063-10.108.84.25-1446859315800:blk_2103114087_1036170430) successfully to BP-219149063-10.108.84.25-1446859315800:blk_2103114087_1036219054

However, the new datanodes named in those calls do not show up in the fsck output. It looks like when a datanode recovers the pipeline (PIPELINE_SETUP_STREAMING_RECOVERY), the new datanodes do not call notifyNamenodeReceivingBlock for the transferred block, so the namenode has no record of where the under-construction replicas actually live.

From code review, the issue also exists in more recent branches.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
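The failure mode described above can be sketched with a small self-contained Java toy. All class, method, and node names below are illustrative stand-ins (not the real HDFS implementation): it only models the idea that a namenode's recorded replica locations come from receiving-block notifications, so if the nodes added during streaming recovery never notify, every recorded location goes stale:

```java
import java.util.*;

// Toy model of the reported gap: a namenode only knows which datanodes hold an
// RBW (replica-being-written) replica if those datanodes report it. If the
// nodes swapped in during PIPELINE_SETUP_STREAMING_RECOVERY skip that report,
// the recorded locations stay pointed at the old pipeline and last-block-length
// recovery has no live replica to ask.
class ToyNamenode {
    // blockId -> datanodes recorded as receiving (RBW) replicas
    final Map<Long, Set<String>> recordedLocations = new HashMap<>();

    void notifyNamenodeReceivingBlock(long blockId, String datanode) {
        recordedLocations.computeIfAbsent(blockId, k -> new HashSet<>()).add(datanode);
    }

    Set<String> locations(long blockId) {
        return recordedLocations.getOrDefault(blockId, Set.of());
    }
}

public class PipelineRecoveryGap {
    public static void main(String[] args) {
        long blk = 2103114087L;
        ToyNamenode nn = new ToyNamenode();

        // Original pipeline: every datanode notifies the namenode, so the
        // recorded locations match the actual replica holders.
        for (String dn : List.of("dn1", "dn2", "dn3")) {
            nn.notifyNamenodeReceivingBlock(blk, dn);
        }

        // Streaming recovery rebuilds the pipeline on a fresh set of nodes,
        // but (per the bug) the new nodes never send the notification.
        Set<String> actualHolders = Set.of("dn4", "dn5", "dn6");

        // Every recorded location is now stale: fsck reports the block as
        // MISSING even though live replicas exist on the new pipeline nodes.
        System.out.println(Collections.disjoint(nn.locations(blk), actualHolders)); // true
    }
}
```

In this toy, making the replacement nodes call notifyNamenodeReceivingBlock during recovery would keep the recorded locations in sync, which matches the gap this report points at.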