Can anyone answer my question? I am curious to know: if there are multiple reducer tasks in one executor, how is memory allocated between these reducer tasks, since each shuffle can consume a lot of memory?
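For reference, here is a back-of-envelope sketch (in Scala) of how the shuffle memory inside one executor ends up shared by concurrently running tasks. The config names are the standard Spark 1.x settings, but the concrete numbers are assumptions picked for illustration, and the 1/(2N)-to-1/N split reflects the 1.x shuffle memory manager's behaviour as generally understood, so treat this as a sketch rather than a guarantee.

```scala
// Rough arithmetic for shuffle memory per task inside one executor.
// All values are assumptions; the memoryFraction / safetyFraction knobs are the 1.x-era settings.
object ShuffleMemorySketch extends App {
  val executorHeapMb  = 4096.0 // spark.executor.memory = 4g
  val shuffleFraction = 0.2    // spark.shuffle.memoryFraction (1.x default)
  val safetyFraction  = 0.8    // spark.shuffle.safetyFraction (1.x default)
  val concurrentTasks = 4      // roughly spark.executor.cores

  // Portion of the executor heap usable for shuffle aggregation buffers
  val shufflePoolMb = executorHeapMb * shuffleFraction * safetyFraction

  // The pool is shared dynamically across the N running tasks: each task can
  // claim roughly between 1/(2N) and 1/N of it before it has to spill to disk.
  val perTaskLowMb  = shufflePoolMb / (2 * concurrentTasks)
  val perTaskHighMb = shufflePoolMb / concurrentTasks

  println(f"shuffle pool ≈ $shufflePoolMb%.0f MB, per task ≈ $perTaskLowMb%.0f to $perTaskHighMb%.0f MB before spilling")
}
```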
On Tue, May 26, 2015 at 7:27 PM, Evo Eftimov <evo.efti...@isecc.com> wrote:

> the link you sent says multiple executors per node
>
> Worker is just a daemon process launching Executors / JVMs so it can execute tasks - it does that by cooperating with the master and the driver
>
> There is a one to one mapping between Executor and JVM
>
> Sent from Samsung Mobile
>
> -------- Original message --------
> From: Arush Kharbanda
> Date: 2015/05/26 10:55 (GMT+00:00)
> To: canan chen
> Cc: Evo Eftimov, user@spark.apache.org
> Subject: Re: How does spark manage the memory of executor with multiple tasks
>
> Hi Evo,
>
> Worker is the JVM and an executor runs on the JVM. And after Spark 1.4 you would be able to run multiple executors on the same JVM/worker.
>
> https://issues.apache.org/jira/browse/SPARK-1706
>
> Thanks
> Arush
>
> On Tue, May 26, 2015 at 2:54 PM, canan chen <ccn...@gmail.com> wrote:
>
>> I think the concept of task in spark should be on the same level as task in MR. Usually in MR, we need to specify the memory for each mapper/reducer task. And I believe executor is not a user-facing concept, it's a spark internal concept. For spark users they don't need to know the concept of executor, but need to know the concept of task.
>>
>> On Tue, May 26, 2015 at 5:09 PM, Evo Eftimov <evo.efti...@isecc.com> wrote:
>>
>>> This is the first time I hear that “one can specify the RAM per task” – the RAM is granted per Executor (JVM). On the other hand each Task operates on ONE RDD Partition – so you can say that this is “the RAM allocated to the Task to process” – but it is still within the boundaries allocated to the Executor (JVM) within which the Task is running. Also while running, any Task, like any JVM Thread, can request as much additional RAM, e.g. for new Object instances, as there is available in the Executor aka JVM Heap
>>>
>>> *From:* canan chen [mailto:ccn...@gmail.com]
>>> *Sent:* Tuesday, May 26, 2015 9:30 AM
>>> *To:* Evo Eftimov
>>> *Cc:* user@spark.apache.org
>>> *Subject:* Re: How does spark manage the memory of executor with multiple tasks
>>>
>>> Yes, I know that one task represents a JVM thread. This is what confused me. Usually users want to specify the memory at task level, so how can I do that if a task is at thread level and multiple tasks run in the same executor? I don't even know how many threads there will be. Besides that, if one task causes an OOM, it would cause other tasks in the same executor to fail too. There's no isolation between tasks.
>>>
>>> On Tue, May 26, 2015 at 4:15 PM, Evo Eftimov <evo.efti...@isecc.com> wrote:
>>>
>>> An Executor is a JVM instance spawned and running on a Cluster Node (Server machine). A Task is essentially a JVM Thread – you can have as many Threads as you want per JVM.
>>> You will also hear about “Executor Slots” – these are essentially the CPU Cores available on the machine and granted for use to the Executor
>>>
>>> Ps: what creates ongoing confusion here is that the Spark folks have “invented” their own terms to describe the design of what is essentially a Distributed OO Framework facilitating Parallel Programming and Data Management in a Distributed Environment, BUT have not provided a clear dictionary/explanation linking these “inventions” with standard concepts familiar to every Java, Scala etc developer
>>>
>>> *From:* canan chen [mailto:ccn...@gmail.com]
>>> *Sent:* Tuesday, May 26, 2015 9:02 AM
>>> *To:* user@spark.apache.org
>>> *Subject:* How does spark manage the memory of executor with multiple tasks
>>>
>>> Since spark can run multiple tasks in one executor, I am curious to know how spark manages memory across these tasks. Say one executor takes 1GB memory; if this executor can run 10 tasks simultaneously, then each task can consume 100MB on average. Do I understand it correctly? Otherwise it doesn't make sense to me that spark runs multiple tasks in one executor.
>
> --
>
> *Arush Kharbanda* || Technical Teamlead
>
> ar...@sigmoidanalytics.com || www.sigmoidanalytics.com
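To make the 1GB / 10 tasks example from the original question concrete, here is a minimal Scala sketch of that setup. The configuration values and the toy job are assumptions chosen for illustration; the point is that tasks are threads sharing one executor heap, so "100MB per task" is only an average, not a quota that Spark enforces.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Sketch of the scenario in the original question (all values are illustrative).
// One executor JVM with a 1g heap and 10 cores ("slots") can run up to 10
// tasks concurrently; those tasks are threads sharing that single heap.
object ExecutorTaskMemorySketch extends App {
  val conf = new SparkConf()
    .setAppName("executor-task-memory-sketch")
    .set("spark.executor.memory", "1g") // heap of the executor JVM
    .set("spark.executor.cores", "10")  // up to 10 concurrent tasks per executor

  val sc = new SparkContext(conf)

  // Each task processes one partition. With 10 tasks running at once,
  // 1g / 10 ≈ 100m per task on average, but there is no per-task limit:
  // any task, like any JVM thread, can allocate whatever heap is still free,
  // and one task hitting OOM takes down the whole executor and its sibling tasks.
  val counts = sc.parallelize(1 to 1000000, numSlices = 10)
    .map(x => (x % 100, 1))
    .reduceByKey(_ + _) // the shuffle buffers here live in the shared executor heap
    .collect()

  println(counts.length)
  sc.stop()
}
```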