[ https://issues.apache.org/jira/browse/HDFS-10220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ravi Prakash resolved HDFS-10220. --------------------------------- Resolution: Duplicate Fix Version/s: 2.6.1 Thanks Nicolas! I am going to resolve this as a duplicate to https://issues.apache.org/jira/browse/HDFS-4882 in that case. It was committed into Hadoop-2.6.1 . Please feel free to re-open this JIRA if I somehow misunderstood your problem. Thanks! > Namenode failover due to too long loking in LeaseManager.Monitor > ---------------------------------------------------------------- > > Key: HDFS-10220 > URL: https://issues.apache.org/jira/browse/HDFS-10220 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode > Reporter: Nicolas Fraison > Priority: Minor > Fix For: 2.6.1 > > Attachments: HADOOP-10220.001.patch, threaddump_zkfc.txt > > > I have faced a namenode failover due to unresponsive namenode detected by the > zkfc with lot's of WARN messages (5 millions) like this one: > _org.apache.hadoop.hdfs.StateChange: BLOCK* internalReleaseLease: All > existing blocks are COMPLETE, lease removed, file closed._ > On the threaddump taken by the zkfc there are lots of thread blocked due to a > lock. > Looking at the code, there are a lock taken by the LeaseManager.Monitor when > some lease must be released. Due to the really big number of lease to be > released the namenode has taken too many times to release them blocking all > other tasks and making the zkfc thinking that the namenode was not > available/stuck. > The idea of this patch is to limit the number of leased released each time we > check for lease so the lock won't be taken for a too long time period. -- This message was sent by Atlassian JIRA (v6.3.4#6332)