zhouyingchao created HDFS-10989:
-----------------------------------
Summary: Cannot get last block length after namenode failover
Key: HDFS-10989
URL: https://issues.apache.org/jira/browse/HDFS-10989
Project: Hadoop HDFS
Issue Type: Bug
Reporter: zhouyingchao
On a 2.4 cluster, access to a file failed because the last block length could not be obtained. The fsck output for the file at the moment of the failure was:
/user/XXXXXXXXX 483600487 bytes, 2 block(s), OPENFORWRITE: MISSING 1 blocks of
total size 215165031 B
0. BP-219149063-10.108.84.25-1446859315800:blk_2102504098_1035525341
len=268435456 repl=3 [10.112.17.43:11402, 10.118.22.46:11402,
10.118.22.49:11402]
1.
BP-219149063-10.108.84.25-1446859315800:blk_2103114087_1036219054{blockUCState=UNDER_RECOVERY,
primaryNodeIndex=2,
replicas=[ReplicaUnderConstruction[[DISK]DS-60be75ad-e4a7-4b1e-b3aa-327c85331d42:NORMAL|RBW],
ReplicaUnderConstruction[[DISK]DS-184a1ce9-655a-4e67-b0cc-29ab9984bd0a:NORMAL|RBW],
ReplicaUnderConstruction[[DISK]DS-6d037ac8-4bcc-4cdc-a803-55b1817e0200:NORMAL|RBW]]}
len=215165031 MISSING! Recorded locations [10.114.10.14:11402,
10.118.29.3:11402, 10.118.22.42:11402]
From the logs of those three datanodes, we found IOExceptions related to the block, as well as pipeline re-creation events.
We figured out that a namenode failover had happened before the issue occurred, and that some updatePipeline calls had gone to the previously active namenode:
2016-09-27,15:04:36,437 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
updatePipeline(block=BP-219149063-10.108.84.25-1446859315800:blk_2103114087_1036137092,
newGenerationStamp=1036170430, newLength=2624000,
newNodes=[10.118.22.42:11402, 10.118.22.49:11402, 10.118.24.3:11402],
clientName=DFSClient_NONMAPREDUCE_-442153643_1)
2016-09-27,15:04:36,438 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
updatePipeline(BP-219149063-10.108.84.25-1446859315800:blk_2103114087_1036137092)
successfully to
BP-219149063-10.108.84.25-1446859315800:blk_2103114087_1036170430
2016-09-27,15:10:10,596 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
updatePipeline(block=BP-219149063-10.108.84.25-1446859315800:blk_2103114087_1036170430,
newGenerationStamp=1036219054, newLength=17138265,
newNodes=[10.118.22.49:11402, 10.118.24.3:11402, 10.114.6.45:11402],
clientName=DFSClient_NONMAPREDUCE_-442153643_1)
2016-09-27,15:10:10,601 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
updatePipeline(BP-219149063-10.108.84.25-1446859315800:blk_2103114087_1036170430)
successfully to
BP-219149063-10.108.84.25-1446859315800:blk_2103114087_1036219054
However, these new datanodes did not show up in the fsck output. It looks like
when a datanode recovers the pipeline (PIPELINE_SETUP_STREAMING_RECOVERY), the
new datanodes do not call notifyNamenodeReceivingBlock for the transferred
block.
From code review, the issue also exists in more recent branches.
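The failure mode described above can be sketched as a toy model (plain Java, not actual HDFS code; the class and all state are simplified stand-ins, and only the method name notifyNamenodeReceivingBlock comes from the report): if the datanodes that receive the transferred replica during streaming recovery never announce it, the namenode's recorded locations keep pointing at nodes that no longer hold the replica, so after a failover the block looks missing.

```java
import java.util.*;

// Toy model of the reported failure mode; simplified stand-ins, not HDFS code.
public class PipelineRecoverySketch {

    // Namenode's view: datanodes it believes hold the under-construction block.
    static final Set<String> recordedLocations = new HashSet<>();
    // Ground truth: datanodes that actually hold the latest replica.
    static final Set<String> actualHolders = new HashSet<>();

    // What a datanode is supposed to do when it starts receiving an RBW replica.
    static void notifyNamenodeReceivingBlock(String dn) {
        recordedLocations.add(dn);
    }

    // Normal write pipeline: every datanode announces the replica.
    static void setupPipeline(List<String> dns) {
        for (String dn : dns) {
            actualHolders.add(dn);
            notifyNamenodeReceivingBlock(dn);
        }
    }

    // Streaming recovery replaces failed datanodes; the replica is transferred
    // to the new nodes, but -- per this report -- the new nodes never call
    // notifyNamenodeReceivingBlock, so the namenode is not updated.
    static void streamingRecovery(List<String> failedDns, List<String> newDns) {
        actualHolders.removeAll(failedDns);
        actualHolders.addAll(newDns);
        // Missing step: newDns.forEach(PipelineRecoverySketch::notifyNamenodeReceivingBlock);
    }

    public static void main(String[] args) {
        setupPipeline(List.of("dn1", "dn2", "dn3"));
        streamingRecovery(List.of("dn1", "dn2"), List.of("dn4", "dn5"));

        // Recorded locations still include datanodes that no longer hold the
        // replica, and the datanodes that do hold it were never recorded --
        // matching the fsck output above, where the recorded locations are
        // reported as MISSING after the failover.
        System.out.println("recorded: " + recordedLocations);
        System.out.println("actual:   " + actualHolders);
    }
}
```

In the real cluster the mismatch surfaces only after the failover, because the previously active namenode had at least processed the updatePipeline calls, while the newly active namenode is left with only the stale locations.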
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)