On 16/10/11 02:53, Bharath Ravi wrote:
Hi all,

I have a question about how HDFS load balances requests for files/blocks:

HDFS currently distributes data blocks randomly, for balance.
However, if certain files/blocks are more popular than others, some nodes
might get an "unfair" number of requests.
Adding more replicas for these popular files might not help, unless HDFS
explicitly distributes requests fairly among the replicas.
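To make that point concrete, here is a minimal sketch (not HDFS code; the node counts and request counts are made up) simulating reads of one hot block. With a single replica, one datanode absorbs every request; with more replicas and requests spread uniformly among them, the peak per-node load drops roughly in proportion:

```python
import random

def max_load(num_nodes, replicas, requests, seed=0):
    """Place `replicas` copies of one hot block on distinct nodes,
    send each of `requests` reads to a uniformly chosen replica,
    and return the peak per-node request count."""
    rng = random.Random(seed)
    holders = rng.sample(range(num_nodes), replicas)
    counts = {node: 0 for node in holders}
    for _ in range(requests):
        counts[rng.choice(holders)] += 1
    return max(counts.values())

# One replica: a single node serves all 9000 reads.
print(max_load(30, 1, 9000))
# Nine replicas, uniform selection: peak load falls to ~1000 per node.
print(max_load(30, 9, 9000))
```

The second call only helps because requests are distributed uniformly among the replicas, which is exactly the caveat above: extra replicas buy nothing if clients all pick the same one.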

Have a look at the ReplicationTargetChooser class; it does take datanode load into account, though its concern is distribution for data availability, not performance.

The standard technique for popular files (including MR job JAR files) is to over-replicate. One problem: how to determine what is popular without adding more load on the namenode.
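For reference, over-replicating a known-hot file can be done from the shell with the standard setrep command (the path and replication factor here are just examples):

```shell
# Raise the replication factor of a hot job JAR to 10;
# -w waits until the replication actually completes.
hadoop fs -setrep -w 10 /user/jobs/popular-job.jar
```

MapReduce itself does a variant of this for job files via the mapred.submit.replication setting, which defaults higher than the normal dfs.replication for exactly this reason.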
