On 16/10/11 02:53, Bharath Ravi wrote:
> Hi all,
> I have a question about how HDFS load-balances requests for files/blocks:
> HDFS currently distributes data blocks randomly, for balance.
> However, if certain files/blocks are more popular than others, some nodes
> might get an "unfair" number of requests.
> Adding more replicas for these popular files might not help, unless HDFS
> explicitly distributes requests fairly among the replicas.
Have a look at the ReplicationTargetChooser class; it does take datanode
load into account, though its concern is distribution for data
availability, not read performance.
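For what it's worth, that load check is switchable; in the versions I've
looked at the flag is dfs.replication.considerLoad in hdfs-default.xml
(treat the exact property name as an assumption for your release; newer
ones rename it dfs.namenode.replication.considerLoad). A minimal sketch:

  import org.apache.hadoop.conf.Configuration;

  public class ConsiderLoadFlag {
    public static void main(String[] args) {
      Configuration conf = new Configuration();
      // Assumption: property name as in 0.20/1.x-era hdfs-default.xml.
      conf.setBoolean("dfs.replication.considerLoad", true);
      System.out.println("considerLoad = "
          + conf.getBoolean("dfs.replication.considerLoad", false));
    }
  }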
The standard technique for popular files (including MR job JAR files) is
to over-replicate them. One problem: how to determine what is popular
without adding more load to the namenode.
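In case it helps, here's a minimal sketch of over-replicating a hot file
through the FileSystem API; the path and replica count below are
placeholders, not recommendations:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class OverReplicate {
    public static void main(String[] args) throws Exception {
      FileSystem fs = FileSystem.get(new Configuration());
      // Placeholder: any file that is getting hammered with reads.
      Path hot = new Path("/jobs/popular-job.jar");
      // Raise the replication factor well above the default (3) so
      // reads can spread across more datanodes.
      boolean scheduled = fs.setReplication(hot, (short) 10);
      System.out.println("re-replication scheduled: " + scheduled);
      fs.close();
    }
  }

The shell equivalent is "hadoop fs -setrep 10 /path/to/file"; either way
the namenode schedules the extra copies asynchronously.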