Can anyone answer my question? I am curious to know: if there are multiple reducer tasks in one executor, how is memory allocated between these reducer tasks, since each shuffle can consume a lot of memory?
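For reference, here is a back-of-envelope sketch (in Scala) of how the shuffle memory inside one executor ends up shared by concurrently running tasks. The config names are the standard Spark 1.x settings, but the concrete numbers are assumptions picked for illustration, and the 1/(2N)-to-1/N split reflects the 1.x shuffle memory manager's behaviour as generally understood, so treat this as a sketch rather than a guarantee.

```scala
// Rough arithmetic for shuffle memory per task inside one executor.
// All values are assumptions; the memoryFraction / safetyFraction knobs are the 1.x-era settings.
object ShuffleMemorySketch extends App {
  val executorHeapMb  = 4096.0 // spark.executor.memory = 4g
  val shuffleFraction = 0.2    // spark.shuffle.memoryFraction (1.x default)
  val safetyFraction  = 0.8    // spark.shuffle.safetyFraction (1.x default)
  val concurrentTasks = 4      // roughly spark.executor.cores

  // Portion of the executor heap usable for shuffle aggregation buffers
  val shufflePoolMb = executorHeapMb * shuffleFraction * safetyFraction

  // The pool is shared dynamically across the N running tasks: each task can
  // claim roughly between 1/(2N) and 1/N of it before it has to spill to disk.
  val perTaskLowMb  = shufflePoolMb / (2 * concurrentTasks)
  val perTaskHighMb = shufflePoolMb / concurrentTasks

  println(f"shuffle pool ≈ $shufflePoolMb%.0f MB, per task ≈ $perTaskLowMb%.0f to $perTaskHighMb%.0f MB before spilling")
}
```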
On Tue, May 26, 2015 at 7:27 PM, Evo Eftimov <evo.efti...@isecc.com> wrote:

> the link you sent says multiple executors per node
>
> Worker is just a daemon process launching Executors / JVMs so it can execute tasks - it does that by cooperating with the master and the driver
>
> There is a one to one mapping between Executor and JVM
>
> Sent from Samsung Mobile
>
> -------- Original message --------
> From: Arush Kharbanda
> Date: 2015/05/26 10:55 (GMT+00:00)
> To: canan chen
> Cc: Evo Eftimov, user@spark.apache.org
> Subject: Re: How does spark manage the memory of executor with multiple tasks
>
> Hi Evo,
>
> Worker is the JVM and an executor runs on the JVM. And after Spark 1.4 you would be able to run multiple executors on the same JVM/worker.
>
> https://issues.apache.org/jira/browse/SPARK-1706
>
> Thanks
> Arush
>
> On Tue, May 26, 2015 at 2:54 PM, canan chen <ccn...@gmail.com> wrote:
>
>> I think the concept of task in spark should be on the same level as task in MR. Usually in MR, we need to specify the memory for each mapper/reducer task. And I believe executor is not a user-facing concept, it's a spark internal concept. For spark users they don't need to know the concept of executor, but need to know the concept of task.
>>
>> On Tue, May 26, 2015 at 5:09 PM, Evo Eftimov <evo.efti...@isecc.com> wrote:
>>
>>> This is the first time I hear that “one can specify the RAM per task” – the RAM is granted per Executor (JVM). On the other hand each Task operates on ONE RDD Partition – so you can say that this is “the RAM allocated to the Task to process” – but it is still within the boundaries allocated to the Executor (JVM) within which the Task is running. Also while running, any Task, like any JVM Thread, can request as much additional RAM, e.g. for new Object instances, as there is available in the Executor aka JVM Heap
>>>
>>> *From:* canan chen [mailto:ccn...@gmail.com]
>>> *Sent:* Tuesday, May 26, 2015 9:30 AM
>>> *To:* Evo Eftimov
>>> *Cc:* user@spark.apache.org
>>> *Subject:* Re: How does spark manage the memory of executor with multiple tasks
>>>
>>> Yes, I know that one task represents a JVM thread. This is what confused me. Usually users want to specify the memory at task level, so how can I do that if a task is at thread level and multiple tasks run in the same executor? I don't even know how many threads there will be. Besides that, if one task causes an OOM, it would cause other tasks in the same executor to fail too. There's no isolation between tasks.
>>>
>>> On Tue, May 26, 2015 at 4:15 PM, Evo Eftimov <evo.efti...@isecc.com> wrote:
>>>
>>> An Executor is a JVM instance spawned and running on a Cluster Node (Server machine). A Task is essentially a JVM Thread – you can have as many Threads as you want per JVM.
>>> You will also hear about “Executor Slots” – these are essentially the CPU Cores available on the machine and granted for use to the Executor
>>>
>>> Ps: what creates ongoing confusion here is that the Spark folks have “invented” their own terms to describe the design of what is essentially a Distributed OO Framework facilitating Parallel Programming and Data Management in a Distributed Environment, BUT have not provided a clear dictionary/explanation linking these “inventions” with standard concepts familiar to every Java, Scala etc developer
>>>
>>> *From:* canan chen [mailto:ccn...@gmail.com]
>>> *Sent:* Tuesday, May 26, 2015 9:02 AM
>>> *To:* user@spark.apache.org
>>> *Subject:* How does spark manage the memory of executor with multiple tasks
>>>
>>> Since spark can run multiple tasks in one executor, I am curious to know how spark manages memory across these tasks. Say one executor takes 1GB memory; if this executor can run 10 tasks simultaneously, then each task can consume 100MB on average. Do I understand it correctly? Otherwise it doesn't make sense to me that spark runs multiple tasks in one executor.
>
> --
>
> *Arush Kharbanda* || Technical Teamlead
>
> ar...@sigmoidanalytics.com || www.sigmoidanalytics.com
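To make the 1GB / 10 tasks example from the original question concrete, here is a minimal Scala sketch of that setup. The configuration values and the toy job are assumptions chosen for illustration; the point is that tasks are threads sharing one executor heap, so "100MB per task" is only an average, not a quota that Spark enforces.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Sketch of the scenario in the original question (all values are illustrative).
// One executor JVM with a 1g heap and 10 cores ("slots") can run up to 10
// tasks concurrently; those tasks are threads sharing that single heap.
object ExecutorTaskMemorySketch extends App {
  val conf = new SparkConf()
    .setAppName("executor-task-memory-sketch")
    .set("spark.executor.memory", "1g") // heap of the executor JVM
    .set("spark.executor.cores", "10")  // up to 10 concurrent tasks per executor

  val sc = new SparkContext(conf)

  // Each task processes one partition. With 10 tasks running at once,
  // 1g / 10 ≈ 100m per task on average, but there is no per-task limit:
  // any task, like any JVM thread, can allocate whatever heap is still free,
  // and one task hitting OOM takes down the whole executor and its sibling tasks.
  val counts = sc.parallelize(1 to 1000000, numSlices = 10)
    .map(x => (x % 100, 1))
    .reduceByKey(_ + _) // the shuffle buffers here live in the shared executor heap
    .collect()

  println(counts.length)
  sc.stop()
}
```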