I've used fine-grained mode on our Mesos Spark clusters until this week,
mostly because it was the default. I started trying coarse-grained because
of the recent chatter on the mailing list about wanting to move the Mesos
execution path to coarse-grained only. The odd thing is, coarse-grained vs
fine-grained seems to yield drastically different cluster utilization
metrics for any of our jobs that I've tried out this week.

If this is best as a new thread, please let me know, and I'll try not to
derail this conversation. Otherwise, details below:

We monitor our Spark clusters with Ganglia, and historically we have
maintained at least 90% CPU utilization across the cluster. Making a single
configuration change to use coarse-grained execution instead of
fine-grained consistently yields a CPU utilization pattern that starts
around 90% at the beginning of the job and then slowly decreases over
the next 1-1.5 hours, leveling out around 65% CPU utilization on the
cluster. Does anyone have a clue why I'd be seeing such a negative effect
from switching to coarse-grained mode? GC activity is comparable in both
cases. I've tried 1.5.2, as well as the 1.6.0 preview tag that's on GitHub.
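
For concreteness, the single configuration change is presumably the
spark.mesos.coarse property. The post does not name the setting, so treat
that as an assumption. Below is a minimal sketch of how the two runs would
differ on the spark-submit command line. The helper names are hypothetical,
and spark.cores.max is only an optional cap that the post does not mention.

```python
from typing import Dict, List, Optional


def mesos_mode_conf(coarse: bool, cores_max: Optional[int] = None) -> Dict[str, str]:
    """Return Spark properties selecting the Mesos run mode (hypothetical helper)."""
    # spark.mesos.coarse: "true" selects coarse-grained, "false" fine-grained.
    conf = {"spark.mesos.coarse": "true" if coarse else "false"}
    # Assumption: an optional core cap; not part of the original report.
    if cores_max is not None:
        conf["spark.cores.max"] = str(cores_max)
    return conf


def to_submit_flags(conf: Dict[str, str]) -> List[str]:
    """Render properties as spark-submit --conf arguments, sorted by key."""
    flags: List[str] = []
    for key in sorted(conf):
        flags.extend(["--conf", f"{key}={conf[key]}"])
    return flags


if __name__ == "__main__":
    # Print the flags for each mode so the one-property difference is visible.
    for coarse in (False, True):
        print(" ".join(to_submit_flags(mesos_mode_conf(coarse))))
```

Everything else about the job (code, input, executor memory) stays the same
between the two runs, so the utilization difference can be attributed to
the mode alone.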

Thanks,
-Adam

On Fri, Nov 20, 2015 at 9:53 AM, Iulian Dragoș <iulian.dra...@typesafe.com>
wrote:

> This is a good point. We should probably document this better in the
> migration notes. In the meantime:
>
>
> http://spark.apache.org/docs/latest/running-on-mesos.html#dynamic-resource-allocation-with-mesos
>
> Roughly, dynamic allocation lets Spark add and kill executors based on the
> scheduling delay. The min and max number of executors can be configured.
> Would this fit your use-case?
>
> iulian
>
>
> On Fri, Nov 20, 2015 at 1:55 AM, Jo Voordeckers <jo.voordeck...@gmail.com>
> wrote:
>
>> As a recent fine-grained mode adopter, I'm now confused after reading this
>> and other resources from Spark Summit, the docs, ... so can someone please
>> advise me on our use case?
>>
>> We'll have 1 or 2 streaming jobs and will run scheduled batch jobs,
>> which should take resources away from the streaming jobs and give 'em back
>> upon completion.
>>
>> Can someone point me at the docs or a guide to set this up?
>>
>> Thanks!
>>
>> - Jo Voordeckers
>>
>>
>> On Thu, Nov 19, 2015 at 5:52 AM, Heller, Chris <chel...@akamai.com>
>> wrote:
>>
>>> I was one that argued for fine-grain mode, and there is something I
>>> still appreciate about how fine-grain mode operates in terms of the way one
>>> would define a Mesos framework. That said, with dynamic allocation and Mesos
>>> support for resource reservation, oversubscription, and revocation, I
>>> think the direction is clear that coarse mode is the proper way
>>> forward, and having the two code paths is just noise.
>>>
>>> -Chris
>>>
>>> From: Iulian Dragoș <iulian.dra...@typesafe.com>
>>> Date: Thursday, November 19, 2015 at 6:42 AM
>>> To: "dev@spark.apache.org" <dev@spark.apache.org>
>>> Subject: Removing the Mesos fine-grained mode
>>>
>>> Hi all,
>>>
>>> Mesos is the only cluster manager that has a fine-grained mode, but it's
>>> more often than not problematic, and it's a maintenance burden. I'd like to
>>> suggest removing it in the 2.0 release.
>>>
>>> A few reasons:
>>>
>>> - code/maintenance complexity. The two modes duplicate a lot of
>>> functionality (and sometimes code) that leads to subtle differences or
>>> bugs. See SPARK-10444
>>> <https://issues.apache.org/jira/browse/SPARK-10444>
>>> and also this thread
>>> <https://mail-archives.apache.org/mod_mbox/spark-user/201510.mbox/%3CCALxMP-A+aygNwSiyTM8ff20-MGWHykbhct94a2hwZTh1jWHp_g@mail.gmail.com%3E>
>>> and MESOS-3202
>>> <https://issues.apache.org/jira/browse/MESOS-3202>
>>> - it's not widely used (Reynold's previous thread
>>> <http://apache-spark-developers-list.1001551.n3.nabble.com/Please-reply-if-you-use-Mesos-fine-grained-mode-td14930.html>
>>> got very few responses from people relying on it)
>>> - similar functionality can be achieved with dynamic allocation +
>>> coarse-grained mode
>>>
>>> I suggest that Spark 1.6 already issues a warning if it detects
>>> fine-grained use, with removal in the 2.0 release.
>>>
>>> Thoughts?
>>>
>>> iulian
>>>
>>>
>>
>
>
> --
>
> --
> Iulian Dragos
>
> ------
> Reactive Apps on the JVM
> www.typesafe.com
>
>
