Great! Thank you!

On Tue, Feb 9, 2016 at 4:02 PM, Jonathan Kelly <jonathaka...@gmail.com> wrote:
> You can set custom per-instance-group configurations (e.g.,
> [{"classification":"yarn-site","properties":{"yarn.nodemanager.labels":"SPARKAM"}}])
> using the Configurations parameter of
> http://docs.aws.amazon.com/ElasticMapReduce/latest/API/API_InstanceGroupConfig.html.
> Unfortunately, it's not currently possible to specify per-instance-group
> configurations via the CLI, though; only cluster-wide configurations are
> supported.
>
> ~ Jonathan
>
> On Tue, Feb 9, 2016 at 12:36 PM Alexander Pivovarov <apivova...@gmail.com>
> wrote:
>
>> Thanks Jonathan
>>
>> Actually I'd like to use maximizeResourceAllocation.
>>
>> Ideally for me it would be to add a new instance group having a single
>> small box labelled as AM.
>> I'm not sure "aws emr create-cluster" supports setting custom labels;
>> the only settings available are:
>>
>> InstanceCount=1,BidPrice=0.5,Name=sparkAM,InstanceGroupType=TASK,InstanceType=m3.xlarge
>>
>> How can I specify the YARN label AM for that box?
>>
>> On Tue, Feb 9, 2016 at 12:16 PM, Jonathan Kelly <jonathaka...@gmail.com>
>> wrote:
>>
>>> Interesting, I was not aware of spark.yarn.am.nodeLabelExpression.
>>>
>>> We do use YARN labels on EMR; each node is automatically labeled with
>>> its type (MASTER, CORE, or TASK). And we do set
>>> yarn.app.mapreduce.am.labels=CORE in yarn-site.xml, but we do not set
>>> spark.yarn.am.nodeLabelExpression.
>>>
>>> Does Spark somehow not actually honor this? It seems weird that Spark
>>> would have its own similar-sounding property
>>> (spark.yarn.am.nodeLabelExpression). If spark.yarn.am.nodeLabelExpression
>>> is used and yarn.app.mapreduce.am.labels is ignored, I could be wrong
>>> about Spark AMs only running on CORE instances in EMR.
>>>
>>> I'm guessing, though, that spark.yarn.am.nodeLabelExpression would simply
>>> override yarn.app.mapreduce.am.labels, so yarn.app.mapreduce.am.labels
>>> would be treated as a default when it is set and
>>> spark.yarn.am.nodeLabelExpression is not. Is that correct?
>>>
>>> In short, Alex, you should not need to set any of the label-related
>>> properties yourself if you do what I suggested regarding using small CORE
>>> instances and large TASK instances. But if you want to do something
>>> different, it would also be possible to add a TASK instance group with
>>> small nodes, configured with some new label. Then you could set
>>> spark.yarn.am.nodeLabelExpression to that label.
>>>
>>> Thanks, Marcelo, for pointing out spark.yarn.am.nodeLabelExpression!
>>>
>>> ~ Jonathan
>>>
>>> On Tue, Feb 9, 2016 at 9:54 AM Marcelo Vanzin <van...@cloudera.com>
>>> wrote:
>>>
>>>> You should be able to use spark.yarn.am.nodeLabelExpression if your
>>>> version of YARN supports node labels (and you've added a label to the
>>>> node where you want the AM to run).
>>>>
>>>> On Tue, Feb 9, 2016 at 9:51 AM, Alexander Pivovarov
>>>> <apivova...@gmail.com> wrote:
>>>> > The AM container starts first, and YARN selects a random computer to
>>>> > run it.
>>>> >
>>>> > Is it possible to configure YARN so that it selects a small computer
>>>> > for the AM container?
>>>> >
>>>> > On Feb 9, 2016 12:40 AM, "Sean Owen" <so...@cloudera.com> wrote:
>>>> >>
>>>> >> If it's too small to run an executor, I'd think it would be chosen
>>>> >> for the AM as the only way to satisfy the request.
>>>> >>
>>>> >> On Tue, Feb 9, 2016 at 8:35 AM, Alexander Pivovarov
>>>> >> <apivova...@gmail.com> wrote:
>>>> >> > If I add an additional small box to the cluster, can I configure
>>>> >> > YARN to select the small box to run the AM container?
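[As a concrete illustration of the label-based approach discussed above (a small TASK group carrying its own label, plus spark.yarn.am.nodeLabelExpression), here is a minimal, untested sketch. The SPARKAM label and the yarn.nodemanager.labels key are copied from Jonathan's example, and the instance type, bid price, and file/app names are placeholders; per Jonathan, the per-group Configurations would have to go through the EMR API (e.g. RunJobFlow or AddInstanceGroups via an SDK), since the CLI currently only accepts cluster-wide configurations.]

# Sketch only; names taken from the thread, sizes are placeholders.
cat > small-am-group.json <<'EOF'
[
  {
    "Name": "sparkAM",
    "InstanceRole": "TASK",
    "InstanceType": "m3.xlarge",
    "InstanceCount": 1,
    "Market": "SPOT",
    "BidPrice": "0.5",
    "Configurations": [
      {
        "Classification": "yarn-site",
        "Properties": { "yarn.nodemanager.labels": "SPARKAM" }
      }
    ]
  }
]
EOF

# With that label on the small group, steer the AM to it, as Marcelo suggests:
spark-submit \
  --master yarn \
  --conf spark.yarn.am.nodeLabelExpression=SPARKAM \
  your-application.jar   # placeholder application
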
>>>> >> > On Mon, Feb 8, 2016 at 10:53 PM, Sean Owen <so...@cloudera.com>
>>>> >> > wrote:
>>>> >> >>
>>>> >> >> Typically YARN is there because you're mediating resource requests
>>>> >> >> from things besides Spark, so, yeah, using every bit of the cluster
>>>> >> >> is a little bit of a corner case. There's not a good answer if all
>>>> >> >> your nodes are the same size.
>>>> >> >>
>>>> >> >> I think you can let YARN over-commit RAM, though, and allocate more
>>>> >> >> memory than it actually has. It may be beneficial to let them all
>>>> >> >> think they have an extra GB, and let the one node running the AM
>>>> >> >> technically be overcommitted, a state which won't hurt at all
>>>> >> >> unless you're really, really tight on memory, in which case
>>>> >> >> something might get killed.
>>>> >> >>
>>>> >> >> On Tue, Feb 9, 2016 at 6:49 AM, Jonathan Kelly
>>>> >> >> <jonathaka...@gmail.com> wrote:
>>>> >> >> > Alex,
>>>> >> >> >
>>>> >> >> > That's a very good question that I've been trying to answer
>>>> >> >> > myself recently too. Since you've mentioned before that you're
>>>> >> >> > using EMR, I assume you're asking this because you've noticed
>>>> >> >> > this behavior on emr-4.3.0.
>>>> >> >> >
>>>> >> >> > In this release, we made some changes to the
>>>> >> >> > maximizeResourceAllocation setting (which you may or may not be
>>>> >> >> > using, but either way this issue is present), including the
>>>> >> >> > accidental introduction of a bug that makes it not reserve any
>>>> >> >> > space for the AM, which ultimately results in one of the nodes
>>>> >> >> > being utilized only by the AM and not an executor.
>>>> >> >> >
>>>> >> >> > However, as you point out, the only viable fix seems to be to
>>>> >> >> > reserve enough memory for the AM on *every single node*, which
>>>> >> >> > in some cases might actually be worse than wasting a lot of
>>>> >> >> > memory on a single node.
>>>> >> >> >
>>>> >> >> > So yeah, I also don't like either option. Is this just the price
>>>> >> >> > you pay for running on YARN?
>>>> >> >> >
>>>> >> >> > ~ Jonathan
>>>> >> >> >
>>>> >> >> > On Mon, Feb 8, 2016 at 9:03 PM Alexander Pivovarov
>>>> >> >> > <apivova...@gmail.com> wrote:
>>>> >> >> >>
>>>> >> >> >> Let's say that YARN has 53 GB of memory available on each slave.
>>>> >> >> >>
>>>> >> >> >> The Spark AM container needs 896 MB (512 + 384).
>>>> >> >> >>
>>>> >> >> >> I see two options to configure Spark:
>>>> >> >> >>
>>>> >> >> >> 1. Configure Spark executors to use 52 GB and leave 1 GB on
>>>> >> >> >> each box. So some box will also run the AM container, and 1 GB
>>>> >> >> >> of memory will go unused on every slave but one.
>>>> >> >> >>
>>>> >> >> >> 2. Configure Spark to use all 53 GB and add an additional 53 GB
>>>> >> >> >> box which will run only the AM container. So 52 GB on this
>>>> >> >> >> additional box will do nothing.
>>>> >> >> >>
>>>> >> >> >> I do not like either option. Is there a better way to configure
>>>> >> >> >> YARN/Spark?
>>>> >> >> >>
>>>> >> >> >> Alex
>>>>
>>>> --
>>>> Marcelo
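
[To put rough numbers on option 1 from the original question above: a sketch of sizing the executor request so that about 1 GB of headroom is left on every node for the 896 MB AM. The 47g figure and the 10% overhead rule are assumptions based on Spark 1.x defaults and the 53 GB figure stated in the thread; adjust for your own release and instance size.]

# Assumptions: ~53 GB of YARN memory per slave; AM request of
# 512 MB + 384 MB overhead = 896 MB (as in the thread); Spark 1.x default
# executor overhead of max(384 MB, 10% of executor memory).
#
#   47g executor + ~4.7g overhead = ~51.7 GB  ->  leaves ~1.3 GB per node,
#   enough for the 896 MB AM on whichever node YARN happens to place it.

spark-submit \
  --master yarn \
  --executor-memory 47g \
  --conf spark.yarn.am.memory=512m \
  your-application.jar   # placeholder application

[The alternative Sean describes is to leave the executors at full size and instead let YARN think each node has roughly an extra GB, e.g. by raising yarn.nodemanager.resource.memory-mb, so only the one node that ends up hosting the AM is technically overcommitted.]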