Roman Shaposhnik created HDFS-5225:
--------------------------------------

             Summary: datanode keeps logging the same 'is no longer in the 
dataset' message over and over again
                 Key: HDFS-5225
                 URL: https://issues.apache.org/jira/browse/HDFS-5225
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: datanode
    Affects Versions: 2.1.1-beta
            Reporter: Roman Shaposhnik


I was running the usual Bigtop testing on 2.1.1-beta RC1 with the following 
configuration: a 4-node fully distributed cluster with security on.

All of a sudden my DN ate up all of the space in /var/log logging the following 
message repeatedly:
{noformat}
2013-09-18 20:51:12,046 INFO 
org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner: 
BP-1884637155-10.224.158.152-1379524544853:blk_1073742189_1369 is no longer in 
the dataset
{noformat}

The DN wouldn't respond to jstack, and jstack -F ended up being useless.

Here's what I was able to find in the NameNode logs regarding this block ID:

{noformat}
fgrep -rI 'blk_1073742189' hadoop-hdfs-namenode-ip-10-224-158-152.log
2013-09-18 18:03:16,972 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* 
allocateBlock: 
/user/jenkins/testAppendInputWedSep18180222UTC2013/test4.fileWedSep18180222UTC2013._COPYING_.
 BP-1884637155-10.224.158.152-1379524544853 
blk_1073742189_1369{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, 
replicas=[ReplicaUnderConstruction[10.83.107.80:1004|RBW], 
ReplicaUnderConstruction[10.34.74.206:1004|RBW], 
ReplicaUnderConstruction[10.224.158.152:1004|RBW]]}
2013-09-18 18:03:17,222 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap 
updated: 10.224.158.152:1004 is added to 
blk_1073742189_1369{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, 
replicas=[ReplicaUnderConstruction[10.83.107.80:1004|RBW], 
ReplicaUnderConstruction[10.34.74.206:1004|RBW], 
ReplicaUnderConstruction[10.224.158.152:1004|RBW]]} size 0
2013-09-18 18:03:17,222 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap 
updated: 10.34.74.206:1004 is added to 
blk_1073742189_1369{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, 
replicas=[ReplicaUnderConstruction[10.83.107.80:1004|RBW], 
ReplicaUnderConstruction[10.34.74.206:1004|RBW], 
ReplicaUnderConstruction[10.224.158.152:1004|RBW]]} size 0
2013-09-18 18:03:17,224 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap 
updated: 10.83.107.80:1004 is added to 
blk_1073742189_1369{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, 
replicas=[ReplicaUnderConstruction[10.83.107.80:1004|RBW], 
ReplicaUnderConstruction[10.34.74.206:1004|RBW], 
ReplicaUnderConstruction[10.224.158.152:1004|RBW]]} size 0
2013-09-18 18:03:17,899 INFO 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: 
updatePipeline(block=BP-1884637155-10.224.158.152-1379524544853:blk_1073742189_1369,
 newGenerationStamp=1370, newLength=1048576, newNodes=[10.83.107.80:1004, 
10.34.74.206:1004, 10.224.158.152:1004], 
clientName=DFSClient_NONMAPREDUCE_-450304083_1)
2013-09-18 18:03:17,904 INFO 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: 
updatePipeline(BP-1884637155-10.224.158.152-1379524544853:blk_1073742189_1369) 
successfully to BP-1884637155-10.224.158.152-1379524544853:blk_1073742189_1370
2013-09-18 18:03:18,540 INFO 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: 
updatePipeline(block=BP-1884637155-10.224.158.152-1379524544853:blk_1073742189_1370,
 newGenerationStamp=1371, newLength=2097152, newNodes=[10.83.107.80:1004, 
10.34.74.206:1004, 10.224.158.152:1004], 
clientName=DFSClient_NONMAPREDUCE_-450304083_1)
2013-09-18 18:03:18,548 INFO 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: 
updatePipeline(BP-1884637155-10.224.158.152-1379524544853:blk_1073742189_1370) 
successfully to BP-1884637155-10.224.158.152-1379524544853:blk_1073742189_1371
2013-09-18 18:03:26,150 INFO BlockStateChange: BLOCK* addToInvalidates: 
blk_1073742189_1371 10.83.107.80:1004 10.34.74.206:1004 10.224.158.152:1004 
2013-09-18 18:03:26,847 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* 
InvalidateBlocks: ask 10.34.74.206:1004 to delete [blk_1073742178_1359, 
blk_1073742183_1362, blk_1073742184_1363, blk_1073742186_1366, 
blk_1073742188_1368, blk_1073742189_1371]
2013-09-18 18:03:29,848 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* 
InvalidateBlocks: ask 10.224.158.152:1004 to delete [blk_1073742177_1353, 
blk_1073742178_1359, blk_1073742179_1355, blk_1073742180_1356, 
blk_1073742181_1358, blk_1073742182_1361, blk_1073742185_1364, 
blk_1073742187_1367, blk_1073742188_1368, blk_1073742189_1371]
2013-09-18 18:03:29,848 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* 
InvalidateBlocks: ask 10.83.107.80:1004 to delete [blk_1073742177_1353, 
blk_1073742178_1359, blk_1073742179_1355, blk_1073742180_1356, 
blk_1073742181_1358, blk_1073742182_1361, blk_1073742183_1362, 
blk_1073742184_1363, blk_1073742185_1364, blk_1073742186_1366, 
blk_1073742187_1367, blk_1073742188_1368, blk_1073742189_1371]
{noformat}

This seems to suggest that the block was successfully deleted, but the DN then 
got into a death spiral inside the block scanner.
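To make the suspected death spiral concrete, here is a minimal, hypothetical sketch (this is NOT the actual BlockPoolSliceScanner source; the class, method, and variable names below are illustrative only) of how a scanner can log the same message forever: if the block was invalidated and removed from the dataset, but the scanner re-queues it instead of dropping it from its own rotation, every scan pass re-selects the same missing block and logs again:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

// Hypothetical reduction of the suspected failure mode: the scanner's
// scan queue and the dataset disagree about which blocks exist.
public class ScannerSpiralSketch {

    // Returns how many times the "no longer in the dataset" message
    // would be emitted over the given number of scan passes.
    static int runPasses(int passes) {
        Set<Long> dataset = new HashSet<>();        // blocks the DN actually holds
        Deque<Long> scanQueue = new ArrayDeque<>(); // scanner's block rotation

        long blockId = 1073742189L;
        scanQueue.add(blockId);  // queued for scanning...
        // ...but the block was already invalidated and deleted, so it is
        // never present in the dataset.

        int logged = 0;
        for (int pass = 0; pass < passes; pass++) {
            Long next = scanQueue.poll();
            if (next != null && !dataset.contains(next)) {
                System.out.println("blk_" + next + " is no longer in the dataset");
                logged++;
                // BUG being illustrated: the block is re-queued instead of
                // dropped, so the same message repeats on every pass.
                scanQueue.add(next);
            }
        }
        return logged;
    }

    public static void main(String[] args) {
        System.out.println("logged " + runPasses(3) + " times");
    }
}
```

Under this (assumed) failure mode the log volume is bounded only by the scan rate, which would explain /var/log filling up.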

I can keep the cluster running for a few days if anybody is willing to take a 
look. Ask me for creds via a personal email.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
