Moving discussion to hdfs-dev.

DataNodes periodically report their disk usage (space) and current transfer thread counts (load) to the NameNode. The NameNode uses this information when it chooses the pipeline DataNodes for your client request. I believe the class is called something like ReplicationTargetChooser (off the top of my head), and you can look at the logic it uses to accept or reject a candidate node (isGoodTarget or something similar). Then work your way downwards from there to see how the information flows.
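
To make the idea concrete, here is a rough sketch of the kind of check involved. It is not the actual Hadoop source: the class TargetCheckSketch, the NodeReport type, and both thresholds (at least five blocks' worth of free space, and load no more than twice the cluster average) are assumptions made up for illustration. Please read isGoodTarget in your release for the real rules.

```java
// Illustrative sketch only. All names and thresholds here are assumptions,
// not Hadoop's actual implementation.
final class TargetCheckSketch {

    /** Hypothetical snapshot of what a DataNode last reported to the NameNode. */
    static final class NodeReport {
        final long remainingBytes;  // free disk space reported by the DataNode
        final int activeTransfers;  // current transfer thread count (load)

        NodeReport(long remainingBytes, int activeTransfers) {
            this.remainingBytes = remainingBytes;
            this.activeTransfers = activeTransfers;
        }
    }

    // Assumed policy knobs, chosen only for this example.
    static final int MIN_BLOCKS_FREE = 5;
    static final double MAX_LOAD_FACTOR = 2.0;

    /** Returns true if the node is neither "too full" nor "too busy". */
    static boolean isGoodTarget(NodeReport node, long blockSize, double avgLoad) {
        // "Too full": not enough room left for a handful of blocks.
        if (node.remainingBytes < blockSize * MIN_BLOCKS_FREE) {
            return false;
        }
        // "Too busy": load well above the average across the cluster.
        if (node.activeTransfers > MAX_LOAD_FACTOR * avgLoad) {
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        long blockSize = 64L * 1024 * 1024;  // 64 MB, used here as an example block size
        NodeReport roomy = new NodeReport(10 * blockSize, 3);
        NodeReport nearlyFull = new NodeReport(2 * blockSize, 3);
        System.out.println("roomy accepted: " + isGoodTarget(roomy, blockSize, 4.0));
        System.out.println("nearlyFull accepted: " + isGoodTarget(nearlyFull, blockSize, 4.0));
    }
}
```

The point is only that "full" and "busy" are judged from the numbers each DataNode last reported, compared against a block-size-based floor and a cluster-wide average respectively.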

On 07-Dec-2011, at 8:36 AM, 郭冲 wrote:

> Hadoop: The Definitive Guide says that when the client is outside the cluster,
> Hadoop will select the storage location of a block randomly, but it will not
> select a DataNode which is too busy or too full.
>
> So I want to know: how does Hadoop judge or measure whether a DataNode is full or not?