We run large multi-tenant clusters with Spark and Hadoop workloads, and we use YARN's preemption together with Spark's dynamic allocation to achieve multitenancy.
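For the dynamic allocation side, the key point is that the external shuffle service must be enabled so executors can be released without losing their shuffle output. Roughly what this looks like in spark-defaults.conf (the executor counts and timeout below are illustrative, not recommendations):

    spark.dynamicAllocation.enabled              true
    # release an executor after it has been idle this many seconds
    spark.dynamicAllocation.executorIdleTimeout  60
    spark.dynamicAllocation.minExecutors         1
    spark.dynamicAllocation.maxExecutors         50
    # required so shuffle files outlive removed executors
    spark.shuffle.service.enabled                true

You also need the Spark shuffle service running as a YARN aux service on each NodeManager (add spark_shuffle to yarn.nodemanager.aux-services and set yarn.nodemanager.aux-services.spark_shuffle.class to org.apache.spark.network.yarn.YarnShuffleService).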
See the following link for how to enable and configure preemption with the Fair Scheduler:
http://hadoop.apache.org/docs/r2.4.1/hadoop-yarn/hadoop-yarn-site/FairScheduler.html
A sketch of the relevant settings follows the quoted thread below.

On Tue, Dec 15, 2015 at 9:37 AM, Ben Roling <ben.rol...@gmail.com> wrote:

> Oops - I meant while it is *busy* when I said while it is *idle*.
>
> On Tue, Dec 15, 2015 at 11:35 AM Ben Roling <ben.rol...@gmail.com> wrote:
>
>> I'm curious to see the feedback others will provide. My impression is
>> that the only way to get Spark to give up resources while it is idle
>> would be to use the preemption feature of the scheduler you're using in
>> YARN. When another user comes along, the scheduler would preempt one or
>> more Spark executors to free the resources that user is entitled to. The
>> question becomes how much inefficiency the preemption creates due to
>> lost work that has to be redone by the Spark job. I'm not sure of the
>> best way to generalize how big a deal that would be; I imagine it
>> depends on several factors.
>>
>> On Tue, Dec 15, 2015 at 9:31 AM David Fox <dafox7777...@gmail.com> wrote:
>>
>>> Hello Spark experts,
>>>
>>> We are currently evaluating Spark on our cluster, which already
>>> supports MRv2 over YARN.
>>>
>>> We have noticed a problem with running jobs concurrently, in particular
>>> that a running Spark job will not release its resources until the job
>>> is finished. Ideally, if two people run any combination of MRv2 and
>>> Spark jobs, the resources should be fairly distributed.
>>>
>>> I have noticed a feature called "dynamic resource allocation" in Spark
>>> 1.2, but this does not seem to solve the problem, because it releases
>>> resources only when Spark is IDLE, not while it's BUSY. What I am
>>> looking for is an approach similar to MapReduce, where a new user
>>> obtains a fair share of resources.
>>>
>>> I haven't been able to locate any further information on this matter.
>>> On the other hand, I feel this must be a pretty common issue for a lot
>>> of users.
>>>
>>> So:
>>>
>>> 1. What is your experience when dealing with a multitenant (multiple
>>> users) Spark cluster with YARN?
>>> 2. Is Spark architecturally equipped to release resources while it's
>>> busy? Is this a planned feature, or is it something that conflicts with
>>> the idea of Spark executors?
>>>
>>> Thanks
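As promised above, a rough sketch of the preemption settings (the queue name and timeout values are made up for illustration; see the linked docs for the full set). In yarn-site.xml:

    <property>
      <name>yarn.resourcemanager.scheduler.class</name>
      <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
    </property>
    <property>
      <!-- master switch: allow the Fair Scheduler to kill containers
           from apps running over their share -->
      <name>yarn.scheduler.fair.preemption</name>
      <value>true</value>
    </property>

And in the allocation file (fair-scheduler.xml):

    <?xml version="1.0"?>
    <allocations>
      <queue name="analytics">
        <weight>1.0</weight>
        <!-- preempt if this queue sits below its min share for 60s -->
        <minSharePreemptionTimeout>60</minSharePreemptionTimeout>
      </queue>
      <!-- preempt if a queue sits below half its fair share for 300s -->
      <fairSharePreemptionTimeout>300</fairSharePreemptionTimeout>
    </allocations>

With this in place, a long-running Spark job holding more than its fair share will have executors killed (and their in-progress work redone) once another queue has been starved past the timeout, which is exactly the inefficiency trade-off Ben describes above.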