I've used fine-grained mode on our Mesos Spark clusters until this week, mostly because it was the default. I started trying coarse-grained mode because of the recent chatter on the mailing list about wanting to move the Mesos execution path to coarse-grained only. The odd thing is, coarse-grained vs. fine-grained seems to yield drastically different cluster utilization metrics for any of our jobs that I've tried out this week.
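For anyone who wants to reproduce the comparison, here is a minimal sketch of the switch as a spark-defaults.conf fragment. It assumes the `spark.mesos.coarse` property of the Spark 1.x Mesos backend, where false (the 1.x default) selects fine-grained mode and true selects coarse-grained mode. The commented `spark.cores.max` value is purely illustrative and is not taken from these jobs.

```properties
# Assumption: Spark 1.5.x / 1.6.x on Mesos, where fine-grained is the default.
# Fine-grained mode (the default in 1.x):
# spark.mesos.coarse   false

# Coarse-grained mode (the single change being compared):
spark.mesos.coarse     true

# Optional, illustrative value only: cap the cores a coarse-grained job
# acquires, since it otherwise accepts all cores offered to it.
# spark.cores.max      64
```

The same property can also be passed per job with `--conf spark.mesos.coarse=true` on `spark-submit`.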
If this is best as a new thread, please let me know, and I'll try not to derail this conversation. Otherwise, details below:

We monitor our Spark clusters with Ganglia, and historically we have maintained at least 90% CPU utilization across the cluster. Making a single configuration change to use coarse-grained execution instead of fine-grained consistently yields a CPU utilization pattern that starts around 90% at the beginning of the job, then slowly decreases over the next 1-1.5 hours to level out around 65% CPU utilization on the cluster.

Does anyone have a clue why I'd be seeing such a negative effect of switching to coarse-grained mode? GC activity is comparable in both cases. I've tried 1.5.2, as well as the 1.6.0 preview tag that's on GitHub.

Thanks,
-Adam

On Fri, Nov 20, 2015 at 9:53 AM, Iulian Dragoș <iulian.dra...@typesafe.com> wrote:

> This is a good point. We should probably document this better in the
> migration notes. In the meantime:
>
> http://spark.apache.org/docs/latest/running-on-mesos.html#dynamic-resource-allocation-with-mesos
>
> Roughly, dynamic allocation lets Spark add and kill executors based on the
> scheduling delay. The min and max number of executors can be configured.
> Would this fit your use-case?
>
> iulian
>
>
> On Fri, Nov 20, 2015 at 1:55 AM, Jo Voordeckers <jo.voordeck...@gmail.com>
> wrote:
>
>> As a recent fine-grained mode adopter I'm now confused after reading this
>> and other resources from spark-summit, the docs, ... so can someone please
>> advise me for our use-case?
>>
>> We'll have 1 or 2 streaming jobs and will run scheduled batch jobs
>> which should take resources away from the streaming jobs and give 'em back
>> upon completion.
>>
>> Can someone point me at the docs or a guide to set this up?
>>
>> Thanks!
>>
>> - Jo Voordeckers
>>
>>
>> On Thu, Nov 19, 2015 at 5:52 AM, Heller, Chris <chel...@akamai.com>
>> wrote:
>>
>>> I was one that argued for fine-grain mode, and there is something I
>>> still appreciate about how fine-grain mode operates in terms of the way one
>>> would define a Mesos framework. That said, with dyn-allocation and Mesos
>>> support for both resource reservation, oversubscription and revocation, I
>>> think the direction is clear that the coarse mode is the proper way
>>> forward, and having the two code paths is just noise.
>>>
>>> -Chris
>>>
>>> From: Iulian Dragoș <iulian.dra...@typesafe.com>
>>> Date: Thursday, November 19, 2015 at 6:42 AM
>>> To: "dev@spark.apache.org" <dev@spark.apache.org>
>>> Subject: Removing the Mesos fine-grained mode
>>>
>>> Hi all,
>>>
>>> Mesos is the only cluster manager that has a fine-grained mode, but it's
>>> more often than not problematic, and it's a maintenance burden. I'd like to
>>> suggest removing it in the 2.0 release.
>>>
>>> A few reasons:
>>>
>>> - code/maintenance complexity. The two modes duplicate a lot of
>>> functionality (and sometimes code) that leads to subtle differences or
>>> bugs.
>>> See SPARK-10444
>>> <https://urldefense.proofpoint.com/v2/url?u=https-3A__issues.apache.org_jira_browse_SPARK-2D10444&d=CwMFaQ&c=96ZbZZcaMF4w0F4jpN6LZg&r=ylcFa5bBSUyTQqbx1Aqz47ec5BJJc7uk0YQ4EQKh-DY&m=36NeiiniCnBgPZ3AKAvvSJYBLQNxvpOcLoAi-VwXbtc&s=4_2dJBDiLqTcfXfX1LZluOo1U6tRKR2wKGGzfwiKdVY&e=>
>>> and also this thread
>>> <https://urldefense.proofpoint.com/v2/url?u=https-3A__mail-2Darchives.apache.org_mod-5Fmbox_spark-2Duser_201510.mbox_-253CCALxMP-2DA-2BaygNwSiyTM8ff20-2DMGWHykbhct94a2hwZTh1jWHp-5Fg-40mail.gmail.com-253E&d=CwMFaQ&c=96ZbZZcaMF4w0F4jpN6LZg&r=ylcFa5bBSUyTQqbx1Aqz47ec5BJJc7uk0YQ4EQKh-DY&m=36NeiiniCnBgPZ3AKAvvSJYBLQNxvpOcLoAi-VwXbtc&s=SNFPzodGw7sgp3km9NKYM46gZHLguvxVNzCIeUlJzOw&e=>
>>> and MESOS-3202
>>> <https://urldefense.proofpoint.com/v2/url?u=https-3A__issues.apache.org_jira_browse_MESOS-2D3202&d=CwMFaQ&c=96ZbZZcaMF4w0F4jpN6LZg&r=ylcFa5bBSUyTQqbx1Aqz47ec5BJJc7uk0YQ4EQKh-DY&m=36NeiiniCnBgPZ3AKAvvSJYBLQNxvpOcLoAi-VwXbtc&s=d-U4CohYsiZc0Zmj4KETn2dT_2ZFe5s3_IIbMm2tjJo&e=>
>>> - it's not widely used (Reynold's previous thread
>>> <https://urldefense.proofpoint.com/v2/url?u=http-3A__apache-2Dspark-2Ddevelopers-2Dlist.1001551.n3.nabble.com_Please-2Dreply-2Dif-2Dyou-2Duse-2DMesos-2Dfine-2Dgrained-2Dmode-2Dtd14930.html&d=CwMFaQ&c=96ZbZZcaMF4w0F4jpN6LZg&r=ylcFa5bBSUyTQqbx1Aqz47ec5BJJc7uk0YQ4EQKh-DY&m=36NeiiniCnBgPZ3AKAvvSJYBLQNxvpOcLoAi-VwXbtc&s=HGMiKyzxFDhpbomduKVIIRHWk9RDGDCk7tneJVQqTwo&e=>
>>> got very few responses from people relying on it)
>>> - similar functionality can be achieved with dynamic allocation +
>>> coarse-grained mode
>>>
>>> I suggest that Spark 1.6 already issues a warning if it detects
>>> fine-grained use, with removal in the 2.0 release.
>>>
>>> Thoughts?
>>>
>>> iulian
>>>
>>
>
> --
>
> --
> Iulian Dragos
>
> ------
> Reactive Apps on the JVM
> www.typesafe.com