[jira] Created: (HDFS-599) Improve Namenode robustness by prioritizing datanode heartbeats over client requests

dhruba borthakur (JIRA) Sat, 05 Sep 2009 00:48:24 -0700

Improve Namenode robustness by prioritizing datanode heartbeats over client 
requests
------------------------------------------------------------------------------------


                 Key: HDFS-599
                 URL: https://issues.apache.org/jira/browse/HDFS-599
             Project: Hadoop HDFS
          Issue Type: Improvement
          Components: name-node
            Reporter: dhruba borthakur
            Assignee: dhruba borthakur


The namenode processes RPC requests from clients that are reading/writing to 
files as well as heartbeats/block reports from datanodes.

Sometimes, for various reasons (Java GC runs, inconsistent performance of the 
NFS filer that stores the HDFS transaction logs, etc.), the namenode encounters 
transient slowness. For example, if the device that stores the HDFS transaction 
logs becomes sluggish, the Namenode's ability to process RPCs slows down to a 
certain extent. During this time, the RPCs from clients as well as the RPCs 
from datanodes suffer in similar fashion. If the underlying problem becomes 
worse, the NN's ability to process a heartbeat from a DN is severely impacted, 
thus causing the NN to declare that the DN is dead. Then the NN starts 
replicating blocks that used to reside on the now-declared-dead datanode. This 
adds extra load to the NN. Then the now-declared-dead datanode finally 
re-establishes contact with the NN, and sends a block report. The block report 
processing on the NN is another heavyweight activity, thus causing more load on 
the already overloaded namenode. 

My proposal is that the NN should try its best to continue processing RPCs from 
datanodes and give lower priority to serving client requests. The Datanode 
RPCs are integral to the consistency and performance of the Hadoop file system, 
and it is better to protect them at all costs. This will ensure that the NN 
recovers from the hiccup much faster than it does now.
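
To make the idea concrete, here is a minimal sketch of one way a call queue 
could favor datanode RPCs over client RPCs. It is only an illustration under 
stated assumptions, not the Namenode's actual RPC server code: the class 
PrioritizedCallQueue, the Source enum, and the submit/poll methods are all 
hypothetical names made up for this example.

```java
import java.util.concurrent.PriorityBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: a call queue that always hands out datanode RPCs
// before client RPCs. None of these names come from the Hadoop code base.
public class PrioritizedCallQueue {

    /** Who sent the call; a lower ordinal is served first. */
    public enum Source { DATANODE, CLIENT }

    /** A queued call; seq preserves arrival order within one source. */
    private static final class Call implements Comparable<Call> {
        final Source source;
        final String name;
        final long seq;

        Call(Source source, String name, long seq) {
            this.source = source;
            this.name = name;
            this.seq = seq;
        }

        @Override
        public int compareTo(Call other) {
            // Order by source first, then by arrival order (FIFO per source).
            int bySource = source.compareTo(other.source);
            return bySource != 0 ? bySource : Long.compare(seq, other.seq);
        }
    }

    private final PriorityBlockingQueue<Call> queue = new PriorityBlockingQueue<>();
    private final AtomicLong nextSeq = new AtomicLong();

    /** Enqueue a call; 'name' stands in for the real RPC payload. */
    public void submit(Source source, String name) {
        queue.put(new Call(source, name, nextSeq.getAndIncrement()));
    }

    /** Returns the name of the next call to process, or null if none is queued. */
    public String poll() {
        Call call = queue.poll();
        return call == null ? null : call.name;
    }

    public static void main(String[] args) {
        PrioritizedCallQueue q = new PrioritizedCallQueue();
        q.submit(Source.CLIENT, "open /a");
        q.submit(Source.DATANODE, "heartbeat dn1");
        q.submit(Source.CLIENT, "create /b");
        q.submit(Source.DATANODE, "blockReport dn2");

        // Datanode calls are drained first, each group in arrival order.
        String name;
        while ((name = q.poll()) != null) {
            System.out.println(name);
        }
    }
}
```

Note that strict priority like this can starve client requests for as long as 
datanode calls keep arriving, so a real design would likely need some bound on 
that, or separate handler threads for the two kinds of traffic.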

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
