I found the fix. I also had to add a rack-local request for the reducer, and set relaxLocality to false for both the ANY and the rack requests, in order to force data locality.
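For anyone hitting the same problem, here is a rough sketch of the set of requests the fix above amounts to. It is a simplified stand-alone model, not the actual RMContainerAllocator patch: the Request class, the strictHostRequests helper, and the rack name "/default-rack" are made up for illustration. It also assumes the host-level request keeps relaxLocality at its default of true, which the fix above does not spell out.

```java
import java.util.ArrayList;
import java.util.List;

public class StrictLocalitySketch {

    // Stand-in for YARN's ResourceRequest.ANY, which is the string "*".
    static final String ANY = "*";

    /** Simplified, hypothetical model of a resource request (not the YARN class). */
    static final class Request {
        final int priority;
        final String resourceName;   // host name, rack name, or ANY
        final int memoryMb;
        final boolean relaxLocality;

        Request(int priority, String resourceName, int memoryMb, boolean relaxLocality) {
            this.priority = priority;
            this.resourceName = resourceName;
            this.memoryMb = memoryMb;
            this.relaxLocality = relaxLocality;
        }

        @Override
        public String toString() {
            return "Request{priority=" + priority + ", resourceName=" + resourceName
                    + ", memoryMb=" + memoryMb + ", relaxLocality=" + relaxLocality + "}";
        }
    }

    /**
     * Builds the three requests that together pin a container to one host:
     * host level, rack level, and ANY, all at the same priority and capability.
     */
    static List<Request> strictHostRequests(int priority, String host, String rack, int memoryMb) {
        List<Request> asks = new ArrayList<>();
        // Host-level request: the node we actually want (assumed default relaxLocality).
        asks.add(new Request(priority, host, memoryMb, true));
        // Rack-level request: must exist, but must not allow falling back to the rack.
        asks.add(new Request(priority, rack, memoryMb, false));
        // ANY request: must exist, but must not allow falling back to any node.
        asks.add(new Request(priority, ANY, memoryMb, false));
        return asks;
    }

    public static void main(String[] args) {
        // Priority and memory values are illustrative only.
        for (Request r : strictHostRequests(10, "host_X_name", "/default-rack", 1024)) {
            System.out.println(r);
        }
    }
}
```

In the real code the equivalent would be three addResourceRequest calls at the same priority and capability, with relaxLocality switched off on the rack and ANY entries.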
robert

On Sunday, November 24, 2013 6:32 PM, Grandl Robert <rgra...@yahoo.com> wrote:

Hi guys,

I am trying to run an experiment where I want the reducers to be forced to execute on a specific host. The simplest way I found is to modify RMContainerAllocator so that whenever there is an addContainerReq(ContainerRequest) for a reducer, I simply do:

    addResourceRequest(req.priority, host_X_name, req.capability)

and return, instead of setting the resourceName to ResourceRequest.ANY.

Unfortunately, when I pass such a ResourceRequest for reducers through makeRemoteRequest, the RM crashes with an ugly exception like:

FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type NODE_UPDATE to the scheduler
java.lang.NullPointerException
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:841)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:640)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:554)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.nodeUpdate(CapacityScheduler.java:695)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:739)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:86)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:549)
    at java.lang.Thread.run(Thread.java:722)

Do you have any other suggestions for how I can enforce reducer execution on a certain host? Or do you know what I am doing wrong here?

Thanks,
robert