How does Spark honor data locality when allocating computing resources for an application

[email protected] Fri, 13 Mar 2015 19:43:45 -0700

Hi, sparkers,
While reading the code in the Master#schedule method that allocates computing
resources to a newly submitted application, I came up with a question about
data locality:


// Pack each app into as few nodes as possible until we've assigned all its cores
for (worker <- workers if worker.coresFree > 0 && worker.state == WorkerState.ALIVE) {
  for (app <- waitingApps if app.coresLeft > 0) {
    if (canUse(app, worker)) {
      val coresToUse = math.min(worker.coresFree, app.coresLeft)
      if (coresToUse > 0) {
        val exec = app.addExecutor(worker, coresToUse)
        launchExecutor(worker, exec)
        app.state = ApplicationState.RUNNING
      }
    }
  }
}

It looks like the allocation policy here is for the Master to use as few
workers as possible, as long as those workers have enough free cores for the
application.
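To make the effect of that loop concrete, here is a minimal, self-contained
Scala sketch that replays it outside the Master. `Worker`, `App` and
`PackingSketch.pack` are hypothetical simplifications, not Spark's
`WorkerInfo`/`ApplicationInfo` API: the `canUse` check is left out, and grants
are recorded in a list instead of launching executors.

```scala
// Hypothetical stand-ins for Spark's WorkerInfo / ApplicationInfo (simplified).
final class Worker(val id: String, var coresFree: Int, val alive: Boolean = true)
final class App(val name: String, var coresLeft: Int)

object PackingSketch {
  /** Replays the "pack each app into as few nodes as possible" loop.
    * Workers are visited in order; each waiting app takes as many cores as
    * the current worker still has free. The canUse check (memory, one
    * executor per worker) is omitted in this sketch.
    * Returns the grants as (appName, workerId, cores) tuples. */
  def pack(workers: Seq[Worker], apps: Seq[App]): List[(String, String, Int)] = {
    val grants = scala.collection.mutable.ListBuffer.empty[(String, String, Int)]
    for (worker <- workers if worker.coresFree > 0 && worker.alive) {
      for (app <- apps if app.coresLeft > 0) {
        val coresToUse = math.min(worker.coresFree, app.coresLeft)
        if (coresToUse > 0) {
          grants += ((app.name, worker.id, coresToUse))
          // In Spark, addExecutor/launchExecutor update these counters.
          worker.coresFree -= coresToUse
          app.coresLeft -= coresToUse
        }
      }
    }
    grants.toList
  }
}
```

With four idle 8-core workers and an application asking for 8 cores, this
sketch grants all 8 cores on the first worker and leaves the other three
untouched; asking for 12 cores yields 8 on the first worker and 4 on the second.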
My question is: assume the data the application will process is spread across
all the worker nodes. Wouldn't data locality be lost under the above policy?
I'm not sure whether I have understood this correctly or have missed something.
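The concern can be put in numbers with an equally rough sketch: if the input
blocks are spread evenly over the workers but the application's executors sit
on only some of them, only the blocks on those hosts can be read node-locally.
`LocalitySketch.nodeLocalFraction` is a hypothetical helper for illustration,
not a Spark API.

```scala
object LocalitySketch {
  /** Hypothetical helper: the fraction of input blocks whose host also runs
    * one of the application's executors, i.e. blocks that could be processed
    * without reading them from another node. */
  def nodeLocalFraction(blockHosts: Seq[String], executorHosts: Set[String]): Double =
    if (blockHosts.isEmpty) 0.0
    else blockHosts.count(executorHosts.contains).toDouble / blockHosts.size
}
```

For example, one block on each of four workers with the application packed
onto the first worker gives 0.25: tasks for the other three blocks would still
run, but would have to fetch their data over the network.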



