Hi Muthu!
Hitesh is correct. The behavior is application-specific in the sense that
it's the application's ApplicationMaster (AM) that asks for containers. See
https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMContainerAllocator.java#L1153
for MapReduce's behavior.
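To make that concrete, here is a rough sketch (not MapReduce's actual code,
just the generic AMRMClient API; the host names, rack name, sizes, and class
name are all made up) of how an AM expresses node-level locality preferences
when it asks for a container:

import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class LocalityRequestSketch {
  public static void main(String[] args) throws Exception {
    AMRMClient<ContainerRequest> amRmClient = AMRMClient.createAMRMClient();
    amRmClient.init(new YarnConfiguration());
    amRmClient.start();

    // The AM, not YARN, picks the preferred hosts, typically the hosts
    // holding the HDFS block replicas for the split it wants to process.
    String[] preferredNodes = {"datanode1.example.com", "datanode2.example.com"};
    String[] preferredRacks = {"/default-rack"};

    ContainerRequest request = new ContainerRequest(
        Resource.newInstance(1024, 1),  // 1 GB, 1 vcore (example values)
        preferredNodes,                 // node-level locality preference
        preferredRacks,                 // rack-level fallback
        Priority.newInstance(1));       // relaxLocality defaults to true

    amRmClient.addContainerRequest(request);
    // allocate() heartbeats and container handling omitted for brevity.
  }
}

The scheduler only ever sees those host and rack names as plain resource
requests; it has no idea which replica or storage type is behind them.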
The YARN ResourceManager's scheduler (e.g. the Capacity or Fair Scheduler)
then decides where to place containers based on those resource requests.
Here's the relevant code if you want to read it:
https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java#L1261
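And as Hitesh explains below, it is the application that has to look up where
the blocks live before building those requests. A minimal sketch of that
lookup using the plain FileSystem API (the path and class name are made up,
error handling omitted):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import java.util.Arrays;

public class BlockLocationLookupSketch {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path input = new Path("/data/input.txt");   // made-up path
    FileStatus status = fs.getFileStatus(input);

    // One BlockLocation per block, listing the hosts holding a replica.
    BlockLocation[] blocks =
        fs.getFileBlockLocations(status, 0, status.getLen());

    for (BlockLocation block : blocks) {
      String[] hosts = block.getHosts();              // all replica hosts
      String[] cachedHosts = block.getCachedHosts();  // in-memory (cached) replicas
      System.out.printf("offset=%d hosts=%s cached=%s%n",
          block.getOffset(), Arrays.toString(hosts),
          Arrays.toString(cachedHosts));
    }
  }
}

The AM can prefer the cached (memory-backed) hosts, which is the MapReduce
behavior Hitesh mentions, but whatever it picks ends up as host names in the
resource request; the scheduler itself never sees SSD vs. DISK.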
On 06/19/15 09:02, Hitesh Shah wrote:
Moving conversation to yarn-dev. BCC’ed hdfs-dev.
YARN actually does not do anything except give back containers based on what an
application requested. It is up to each application to first figure out where
the data is located and then make optimal choices about which nodes to prefer
when scheduling. I believe MapReduce has some changes to prefer memory-based
block locations over disk-based ones, but I don't believe there is any
significant work in any YARN application that makes cost-based decisions based
on the storage types on which blocks are available.
thanks
— Hitesh
On Jun 19, 2015, at 12:33 AM, Muthu Ganesh <mutg...@gmail.com> wrote:
Hi,
How does YARN decide which replica to use when scheduling a task, or is it
random?
Does the YARN scheduler give priority to SSD storage types over DISK storage
types for the HOT_STORAGE_POLICY when scheduling data-local tasks?
Please let me know if this should be posted to the YARN developers mailing
list instead.
Thanks.
Muthu