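Hi Arun,

You can achieve this by setting spark.scheduler.minRegisteredResourcesRatio to 1.0 and spark.scheduler.maxRegisteredResourcesWaitingTime to a very large value. The ratio makes the scheduler wait until all requested executors have registered before it starts scheduling tasks. The waiting time caps how long it will wait for that ratio to be reached (the default is 30 seconds), so raising it keeps the cap from cutting the wait short.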
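For reference, here is a minimal sketch of passing those two properties on a spark-submit command line, written as a small Python helper that builds the argument list. `build_submit_args`, the application name `my_job.py`, the executor count, and the `1h` waiting time are all illustrative assumptions; only the two `spark.scheduler.*` property names come from the advice above, and `--master yarn` / `--num-executors` assume a YARN deployment with a recent spark-submit.

```python
import shlex


# Hypothetical helper (not part of Spark): builds a spark-submit command line
# that makes the scheduler hold off scheduling tasks until every requested
# executor has registered, or until max_wait elapses.
def build_submit_args(app, num_executors, max_wait="1h"):
    conf = {
        # Wait for 100% of the requested executors (the YARN default is 0.8).
        "spark.scheduler.minRegisteredResourcesRatio": "1.0",
        # Upper bound on that wait (default 30s); "1h" is an arbitrary large value.
        "spark.scheduler.maxRegisteredResourcesWaitingTime": max_wait,
    }
    args = ["spark-submit", "--master", "yarn",
            "--num-executors", str(num_executors)]
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    args.append(app)
    return args


if __name__ == "__main__":
    # "my_job.py" is a placeholder application.
    print(shlex.join(build_submit_args("my_job.py", 20)))
```

The same two properties could equally go in spark-defaults.conf; building the argument list in code just makes the exact `--conf key=value` form explicit.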
-Sandy

On Wed, Jun 24, 2015 at 2:21 AM, Steve Loughran wrote:

> On 24 Jun 2015, at 05:55, canan chen wrote:
>
> > Why do you want it to wait until all the resources are ready? Making it
> > start as early as possible should make it complete earlier and increase
> > the utilization of resources.
> >
> > On Tue, Jun 23, 2015 at 10:34 PM, Arun Luthra wrote:
> >
> > > Sometimes, if my Hortonworks YARN-enabled cluster is fairly busy, Spark
> > > (via spark-submit) will begin its processing even though it apparently
> > > did not get all of the requested resources, and it then runs very slowly.
> > >
> > > Is there a way to force Spark/YARN to begin only when it has the full
> > > set of resources that I request?
> > >
> > > Thanks,
> > > Arun
>
> The "wait until there's space" launch policy is known as Gang Scheduling;
> https://issues.apache.org/jira/browse/YARN-624 covers what would be needed
> there.
>
> 1. It's not in YARN.
>
> 2. For analytics workloads, it's not clear you benefit. You would wait a
> very long time (*) for the requirements to be satisfied. The current YARN
> scheduling and placement algorithms assume that you'd prefer "timely
> container launch" to "extended wait for containers in the right place",
> and expect algorithms to work in a degraded form with a reduced number of
> workers.
>
> 3. Where it really matters is long-lived applications where you need
> some quorum of container-hosted processes, or where performance collapses
> utterly below a threshold. HBase on YARN is an example, but Spark
> Streaming could be another.
>
> In the absence of YARN support, it can be implemented in the application
> by having the YARN-hosted application (here: Spark) get the containers,
> start up a process on each one, but not actually start accepting or
> performing work until a threshold of containers is reached or some
> timeout has occurred.
>
> If you wanted to do that in Spark, you could raise the idea on the Spark
> dev lists and see what people think.
>
> -Steve
>
> (*) i.e. forever
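Steve's application-level workaround (start a process in each container, but hold off real work until enough containers have registered or a timeout fires) can be sketched as below. This is a minimal, single-process illustration using Python's `threading.Condition`; `ContainerQuorum`, `register`, and `wait_for_quorum` are hypothetical names, not a Spark or YARN API. In a real deployment, registrations would arrive over RPC from remote containers rather than from local threads.

```python
import threading


class ContainerQuorum:
    """Hypothetical gate (not a Spark/YARN API): container-hosted workers
    register as they come up; the coordinator blocks until a threshold is
    reached or a timeout expires."""

    def __init__(self, threshold):
        self.threshold = threshold
        self._registered = 0
        self._cond = threading.Condition()

    def register(self):
        # Called once per container-hosted process after it has started.
        with self._cond:
            self._registered += 1
            self._cond.notify_all()

    def wait_for_quorum(self, timeout):
        # Returns True if the threshold was reached, False if the timeout
        # expired first. Condition.wait_for re-checks the predicate on wakeup.
        with self._cond:
            return self._cond.wait_for(
                lambda: self._registered >= self.threshold, timeout=timeout)

    @property
    def registered(self):
        with self._cond:
            return self._registered


if __name__ == "__main__":
    quorum = ContainerQuorum(threshold=4)
    # Simulate four containers starting at staggered times.
    for i in range(4):
        threading.Timer(0.05 * (i + 1), quorum.register).start()
    if quorum.wait_for_quorum(timeout=2.0):
        print(f"quorum reached with {quorum.registered} workers; start work")
    else:
        print(f"timed out with {quorum.registered} workers")
```

What to do on timeout (run in degraded form, or give up) is left to the application; the gate only reports which of the two conditions ended the wait.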
