Tim,

We will try to run the application in coarse-grained mode and share the findings with you.
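For reference, below is a rough sketch of the configuration we plan to test, based on the Spark 1.6 dynamic allocation docs (the master URL, app name, and idle timeout are placeholders, and if I read the docs correctly, dynamic allocation on Mesos also requires the external shuffle service to be running on each agent):

    // Sketch only: coarse-grained mode + dynamic allocation.
    // Assumes the Mesos external shuffle service is already running on
    // every agent; master URL and timeout values are placeholders.
    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("coarse-grain-test")
      .setMaster("mesos://zk://host1:2181/mesos")
      .set("spark.mesos.coarse", "true")               // coarse-grained mode
      .set("spark.dynamicAllocation.enabled", "true")  // release idle executors
      .set("spark.shuffle.service.enabled", "true")    // required by dynamic allocation
      .set("spark.dynamicAllocation.executorIdleTimeout", "60s")  // example timeout
    val sc = new SparkContext(conf)

If I understand Michael's explanation correctly, the 78 CPUs we saw in fine-grained mode would be 30 executors at 1 CPU each (spark.mesos.mesosExecutor.cores) plus 48 tasks at 1 CPU each (spark.task.cpus).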
Regards
Sumit Chawla

On Mon, Dec 19, 2016 at 3:11 PM, Timothy Chen <tnac...@gmail.com> wrote:
> Dynamic allocation works with coarse-grained mode only; we weren't
> aware of a need for fine-grained mode after we enabled dynamic
> allocation support in coarse-grained mode.
>
> What's the reason you're running fine-grained mode instead of
> coarse-grained + dynamic allocation?
>
> Tim
>
> On Mon, Dec 19, 2016 at 2:45 PM, Mehdi Meziane
> <mehdi.mezi...@ldmobile.net> wrote:
> > We would be interested in the results if you give dynamic
> > allocation with Mesos a try!
> >
> >
> > ----- Original Message -----
> > From: "Michael Gummelt" <mgumm...@mesosphere.io>
> > To: "Sumit Chawla" <sumitkcha...@gmail.com>
> > Cc: u...@mesos.apache.org, d...@mesos.apache.org, "User"
> > <u...@spark.apache.org>, dev@spark.apache.org
> > Sent: Monday, December 19, 2016, 22:42:55 GMT +01:00 Amsterdam /
> > Berlin / Berne / Rome / Stockholm / Vienna
> > Subject: Re: Mesos Spark Fine Grained Execution - CPU count
> >
> >
> >> Is this problem of idle executors sticking around solved in Dynamic
> >> Resource Allocation? Is there some timeout after which idle
> >> executors can just shut down and clean up their resources?
> >
> > Yes, that's exactly what dynamic allocation does. But again, I have
> > no idea what the state of dynamic allocation + Mesos is.
> >
> > On Mon, Dec 19, 2016 at 1:32 PM, Chawla,Sumit <sumitkcha...@gmail.com>
> > wrote:
> >>
> >> Great. Makes much better sense now. What would be the reason to have
> >> spark.mesos.mesosExecutor.cores greater than 1, as this number
> >> doesn't include the number of cores for tasks?
> >>
> >> So in my case it seems like 30 CPUs are allocated to executors. And
> >> there are 48 tasks, so 48 + 30 = 78 CPUs. And I am noticing that
> >> this gap of 30 is maintained until the last task exits. This
> >> explains the gap. Thanks everyone. I am still not sure how this
> >> number 30 is calculated. (Is it dynamic based on current resources,
> >> or is it some configuration? I have 32 nodes in my cluster.)
> >>
> >> Is this problem of idle executors sticking around solved in Dynamic
> >> Resource Allocation? Is there some timeout after which idle
> >> executors can just shut down and clean up their resources?
> >>
> >>
> >> Regards
> >> Sumit Chawla
> >>
> >>
> >> On Mon, Dec 19, 2016 at 12:45 PM, Michael Gummelt
> >> <mgumm...@mesosphere.io> wrote:
> >>>
> >>> > I should presume that the number of executors should be less
> >>> > than the number of tasks.
> >>>
> >>> No. Each executor runs 0 or more tasks.
> >>>
> >>> Each executor consumes 1 CPU, and each task running on that
> >>> executor consumes another CPU. You can customize this via
> >>> spark.mesos.mesosExecutor.cores
> >>> (https://github.com/apache/spark/blob/v1.6.3/docs/running-on-mesos.md)
> >>> and spark.task.cpus
> >>> (https://github.com/apache/spark/blob/v1.6.3/docs/configuration.md).
> >>>
> >>> On Mon, Dec 19, 2016 at 12:09 PM, Chawla,Sumit <sumitkcha...@gmail.com>
> >>> wrote:
> >>>>
> >>>> Ah, thanks. Looks like I skipped reading this: "Neither will
> >>>> executors terminate when they're idle."
> >>>>
> >>>> So in my job scenario, I should presume that the number of
> >>>> executors should be less than the number of tasks. Ideally one
> >>>> executor should execute 1 or more tasks. But I am observing
> >>>> something strange instead. I start my job with 48 partitions for
> >>>> a Spark job. In the Mesos UI I see that the number of tasks is
> >>>> 48, but the number of CPUs is 78, which is way more than 48.
> >>>> Here I am assuming that 1 CPU is 1 executor. I am not specifying
> >>>> any configuration to set the number of cores per executor.
> >>>>
> >>>> Regards
> >>>> Sumit Chawla
> >>>>
> >>>>
> >>>> On Mon, Dec 19, 2016 at 11:35 AM, Joris Van Remoortere
> >>>> <jo...@mesosphere.io> wrote:
> >>>>>
> >>>>> That makes sense. From the documentation it looks like the
> >>>>> executors are not supposed to terminate:
> >>>>>
> >>>>> http://spark.apache.org/docs/latest/running-on-mesos.html#fine-grained-deprecated
> >>>>>>
> >>>>>> Note that while Spark tasks in fine-grained will relinquish
> >>>>>> cores as they terminate, they will not relinquish memory, as
> >>>>>> the JVM does not give memory back to the Operating System.
> >>>>>> Neither will executors terminate when they’re idle.
> >>>>>
> >>>>>
> >>>>> I suppose your task-to-executor CPU ratio is low enough that it
> >>>>> looks like most of the resources are not being reclaimed. If
> >>>>> your tasks were using significantly more CPU, the amortized
> >>>>> cost of the idle executors would not be such a big deal.
> >>>>>
> >>>>>
> >>>>> —
> >>>>> Joris Van Remoortere
> >>>>> Mesosphere
> >>>>>
> >>>>> On Mon, Dec 19, 2016 at 11:26 AM, Timothy Chen <tnac...@gmail.com>
> >>>>> wrote:
> >>>>>>
> >>>>>> Hi Chawla,
> >>>>>>
> >>>>>> One possible reason is that Mesos fine-grained mode also takes
> >>>>>> up cores to run the executor on each host, so if you have 20
> >>>>>> agents running fine-grained executors, they will take up 20
> >>>>>> cores while they are still running.
> >>>>>>
> >>>>>> Tim
> >>>>>>
> >>>>>> On Fri, Dec 16, 2016 at 8:41 AM, Chawla,Sumit
> >>>>>> <sumitkcha...@gmail.com> wrote:
> >>>>>> > Hi
> >>>>>> >
> >>>>>> > I am using Spark 1.6. I have one query about the fine-grained
> >>>>>> > model in Spark. I have a simple Spark application which
> >>>>>> > transforms A -> B. It's a single-stage application. The
> >>>>>> > program starts with 48 partitions. When the program starts
> >>>>>> > running, the Mesos UI shows 48 tasks and 48 CPUs allocated
> >>>>>> > to the job. As the tasks get done, the number of active
> >>>>>> > tasks starts decreasing. However, the number of CPUs does
> >>>>>> > not decrease proportionally. When the job was about to
> >>>>>> > finish, there was a single remaining task, but the CPU count
> >>>>>> > was still 20.
> >>>>>> >
> >>>>>> > My question is: why is there no one-to-one mapping between
> >>>>>> > tasks and CPUs in fine-grained mode? How can these CPUs be
> >>>>>> > released when the job is done, so that other jobs can start?
> >>>>>> >
> >>>>>> >
> >>>>>> > Regards
> >>>>>> > Sumit Chawla
> >>>>>
> >>>>>
> >>>>
> >>>
> >>>
> >>>
> >>> --
> >>> Michael Gummelt
> >>> Software Engineer
> >>> Mesosphere
> >>
> >>
> >
> >
> >
> > --
> > Michael Gummelt
> > Software Engineer
> > Mesosphere
>