Frode Halvorsen created HDFS-7815:
-------------------------------------

             Summary: Loop on 'blocks does not belong to any file'
                 Key: HDFS-7815
                 URL: https://issues.apache.org/jira/browse/HDFS-7815
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: datanode, namenode
    Affects Versions: 2.6.0
         Environment: Small cluster on Red Hat. 2 namenodes (HA), 6 datanodes
with 19TB disk for HDFS.
            Reporter: Frode Halvorsen


I am currently experiencing a looping situation:
The namenode takes approximately 1:50 (min:sec) to log a massive number of lines
stating that some blocks don't belong to any file. During this time, it is
unresponsive to any requests from datanodes, and if ZooKeeper had been running,
it would have taken the namenode down (ssh-fencing: kill).
When it has finished the 'round', it starts to do some normal work, including
telling the datanode to delete the blocks. But before the datanode has finished
deleting the blocks and can report back to the namenode, the namenode has
started on the next round of reporting the same blocks that don't belong to any
file. Thus, the datanode gets a timeout when reporting block updates for the
deleted blocks, and this, of course, repeats itself over and over again...

There are actually two issues, I think:
1 - the namenode becomes totally unresponsive while reporting the blocks (could
this be a DEBUG line instead of an INFO line?)
2 - the namenode seems to 'forget' that it has already reported those blocks
just 2-3 minutes ago...
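
To illustrate what issue 1 is asking for, here is a minimal sketch of the idea in Python. It is not NameNode code: the logger name and the function `report_orphan_blocks` are hypothetical, and it only shows the general pattern of moving the per-block message to DEBUG while keeping a single INFO summary per round.

```python
import logging

# Hypothetical logger name, chosen only for this illustration.
log = logging.getLogger("blockreport.sketch")


def report_orphan_blocks(block_ids, logger=log):
    """Log each orphan block at DEBUG and one summary line at INFO.

    Returns the number of blocks that were reported.
    """
    count = 0
    for block_id in block_ids:
        # Lazy %-style arguments: the message string is only built
        # when DEBUG is actually enabled for this logger.
        logger.debug("block %s does not belong to any file", block_id)
        count += 1
    # One INFO line per round instead of one line per block.
    logger.info(
        "%d blocks do not belong to any file; scheduled for deletion", count
    )
    return count
```

With the logger at INFO, a round over many blocks produces one log line; the per-block detail is still available by switching the logger to DEBUG. Whether this alone would remove the pause described above is an assumption, since the time may also be spent elsewhere in block report processing.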



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
