I'm curious to see the feedback others will provide.  My impression is that
the only way to get Spark to give up resources while it is busy would be to
use the preemption feature of the scheduler you're running under YARN.  When
another user comes along, the scheduler would preempt one or more Spark
executors to free the resources that user is entitled to.  The question then
becomes how much inefficiency preemption creates, since work lost with a
killed executor has to be redone by the Spark job.  I'm not sure how to
generalize about how big a deal that would be; I imagine it depends on
several factors, like task length and how much cached or shuffle data sits
on the preempted executors.
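
For what it's worth, enabling preemption with the YARN Fair Scheduler looks
roughly like the following.  This is a minimal sketch; the queue name and
the 60-second timeout are just illustrative values.

  <!-- yarn-site.xml: turn on Fair Scheduler preemption -->
  <property>
    <name>yarn.scheduler.fair.preemption</name>
    <value>true</value>
  </property>

  <!-- fair-scheduler.xml: allow containers to be preempted from other
       queues once this queue has been below its fair share for 60s -->
  <allocations>
    <queue name="default">
      <fairSharePreemptionTimeout>60</fairSharePreemptionTimeout>
    </queue>
  </allocations>

If your cluster runs the Capacity Scheduler instead, it has its own
preemption switch (yarn.resourcemanager.scheduler.monitor.enable), so check
which scheduler you're on first.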

On Tue, Dec 15, 2015 at 9:31 AM David Fox <dafox7777...@gmail.com> wrote:

> Hello Spark experts,
>
> We are currently evaluating Spark on our cluster that already supports
> MRv2 over YARN.
>
> We have noticed a problem with running jobs concurrently, in particular
> that a running Spark job will not release its resources until the job is
> finished. Ideally, if two people run any combination of MRv2 and Spark
> jobs, the resources should be fairly distributed.
>
> I have noticed a feature called "dynamic resource allocation" in Spark
> 1.2, but it does not seem to solve the problem, because it releases
> resources only when Spark is IDLE, not while it's BUSY. What I am looking
> for is an approach similar to MapReduce, where a new user obtains a fair
> share of resources.
>
> I haven't been able to locate any further information on this matter. On
> the other hand, I feel this must be a pretty common issue for a lot of
> users.
>
> So,
>
>    1. What is your experience with a multitenant (multiple-user) Spark
>    cluster on YARN?
>    2. Is Spark architecturally suited to releasing resources while it's
>    busy? Is this a planned feature, or is it something that conflicts with
>    the idea of Spark executors?
>
> Thanks
>
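
For reference, the dynamic allocation David mentions is enabled with
settings along these lines; a minimal sketch, where the executor counts and
idle timeout are illustrative, and the external shuffle service must also
be registered with each NodeManager:

  # spark-defaults.conf
  spark.dynamicAllocation.enabled              true
  # Required on YARN so shuffle files survive executor removal
  spark.shuffle.service.enabled                true
  spark.dynamicAllocation.minExecutors         1
  spark.dynamicAllocation.maxExecutors         20
  # Release an executor after it has been idle this many seconds
  spark.dynamicAllocation.executorIdleTimeout  60

As noted above, though, this only releases executors that have gone idle;
it will not shrink a job that is actively using everything it holds, which
is why scheduler preemption is the relevant knob for fairness under
contention.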
