Great! Thank you!

On Tue, Feb 9, 2016 at 4:02 PM, Jonathan Kelly <jonathaka...@gmail.com> wrote:
> You can set custom per-instance-group configurations (e.g.,
> [{"classification":"yarn-site","properties":{"yarn.nodemanager.labels":"SPARKAM"}}])
> using the Configurations parameter of
> http://docs.aws.amazon.com/ElasticMapReduce/latest/API/API_InstanceGroupConfig.html.
> Unfortunately, it's not currently possible to specify per-instance-group
> configurations via the CLI, though; only cluster-wide configurations are
> supported.
>
> ~ Jonathan
>
> On Tue, Feb 9, 2016 at 12:36 PM Alexander Pivovarov <apivova...@gmail.com>
> wrote:
>
>> Thanks Jonathan
>>
>> Actually I'd like to use maximizeResourceAllocation.
>>
>> Ideally for me it would be to add a new instance group having a single
>> small box labelled as AM.
>> I'm not sure "aws emr create-cluster" supports setting custom labels;
>> the only settings available are:
>>
>> InstanceCount=1,BidPrice=0.5,Name=sparkAM,InstanceGroupType=TASK,InstanceType=m3.xlarge
>>
>> How can I specify the YARN label AM for that box?
>>
>> On Tue, Feb 9, 2016 at 12:16 PM, Jonathan Kelly <jonathaka...@gmail.com>
>> wrote:
>>
>>> Interesting, I was not aware of spark.yarn.am.nodeLabelExpression.
>>>
>>> We do use YARN labels on EMR; each node is automatically labeled with
>>> its type (MASTER, CORE, or TASK). And we do set
>>> yarn.app.mapreduce.am.labels=CORE in yarn-site.xml, but we do not set
>>> spark.yarn.am.nodeLabelExpression.
>>>
>>> Does Spark somehow not actually honor this? It seems weird that Spark
>>> would have its own similar-sounding property
>>> (spark.yarn.am.nodeLabelExpression). If spark.yarn.am.nodeLabelExpression
>>> is used and yarn.app.mapreduce.am.labels is ignored, I could be wrong
>>> about Spark AMs only running on CORE instances in EMR.
>>>
>>> I'm guessing, though, that spark.yarn.am.nodeLabelExpression would simply
>>> override yarn.app.mapreduce.am.labels, so yarn.app.mapreduce.am.labels
>>> would be treated as a default when it is set and
>>> spark.yarn.am.nodeLabelExpression is not. Is that correct?
>>>
>>> In short, Alex, you should not need to set any of the label-related
>>> properties yourself if you do what I suggested regarding using small CORE
>>> instances and large TASK instances. But if you want to do something
>>> different, it would also be possible to add a TASK instance group with
>>> small nodes, configured with some new label. Then you could set
>>> spark.yarn.am.nodeLabelExpression to that label.
>>>
>>> Thanks, Marcelo, for pointing out spark.yarn.am.nodeLabelExpression!
>>>
>>> ~ Jonathan
>>>
>>> On Tue, Feb 9, 2016 at 9:54 AM Marcelo Vanzin <van...@cloudera.com>
>>> wrote:
>>>
>>>> You should be able to use spark.yarn.am.nodeLabelExpression if your
>>>> version of YARN supports node labels (and you've added a label to the
>>>> node where you want the AM to run).
>>>>
>>>> On Tue, Feb 9, 2016 at 9:51 AM, Alexander Pivovarov
>>>> <apivova...@gmail.com> wrote:
>>>> > The AM container starts first, and YARN selects a random computer to
>>>> > run it.
>>>> >
>>>> > Is it possible to configure YARN so that it selects a small computer
>>>> > for the AM container?
>>>> >
>>>> > On Feb 9, 2016 12:40 AM, "Sean Owen" <so...@cloudera.com> wrote:
>>>> >>
>>>> >> If it's too small to run an executor, I'd think it would be chosen
>>>> >> for the AM as the only way to satisfy the request.
>>>> >>
>>>> >> On Tue, Feb 9, 2016 at 8:35 AM, Alexander Pivovarov
>>>> >> <apivova...@gmail.com> wrote:
>>>> >> > If I add an additional small box to the cluster, can I configure
>>>> >> > YARN to select the small box to run the AM container?
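[As a concrete illustration of the label-based approach discussed above (a small TASK group carrying its own label, plus spark.yarn.am.nodeLabelExpression), here is a minimal, untested sketch. The SPARKAM label and the yarn.nodemanager.labels key are copied from Jonathan's example, and the instance type, bid price, and file/app names are placeholders; per Jonathan, the per-group Configurations would have to go through the EMR API (e.g. RunJobFlow or AddInstanceGroups via an SDK), since the CLI currently only accepts cluster-wide configurations.]

# Sketch only; names taken from the thread, sizes are placeholders.
cat > small-am-group.json <<'EOF'
[
  {
    "Name": "sparkAM",
    "InstanceRole": "TASK",
    "InstanceType": "m3.xlarge",
    "InstanceCount": 1,
    "Market": "SPOT",
    "BidPrice": "0.5",
    "Configurations": [
      {
        "Classification": "yarn-site",
        "Properties": { "yarn.nodemanager.labels": "SPARKAM" }
      }
    ]
  }
]
EOF

# With that label on the small group, steer the AM to it, as Marcelo suggests:
spark-submit \
  --master yarn \
  --conf spark.yarn.am.nodeLabelExpression=SPARKAM \
  your-application.jar   # placeholder application
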
>>>> >> > On Mon, Feb 8, 2016 at 10:53 PM, Sean Owen <so...@cloudera.com>
>>>> >> > wrote:
>>>> >> >>
>>>> >> >> Typically YARN is there because you're mediating resource requests
>>>> >> >> from things besides Spark, so, yeah, using every bit of the cluster
>>>> >> >> is a little bit of a corner case. There's not a good answer if all
>>>> >> >> your nodes are the same size.
>>>> >> >>
>>>> >> >> I think you can let YARN over-commit RAM, though, and allocate more
>>>> >> >> memory than it actually has. It may be beneficial to let them all
>>>> >> >> think they have an extra GB, and let the one node running the AM
>>>> >> >> technically be overcommitted, a state which won't hurt at all
>>>> >> >> unless you're really, really tight on memory, in which case
>>>> >> >> something might get killed.
>>>> >> >>
>>>> >> >> On Tue, Feb 9, 2016 at 6:49 AM, Jonathan Kelly
>>>> >> >> <jonathaka...@gmail.com> wrote:
>>>> >> >> > Alex,
>>>> >> >> >
>>>> >> >> > That's a very good question that I've been trying to answer
>>>> >> >> > myself recently too. Since you've mentioned before that you're
>>>> >> >> > using EMR, I assume you're asking this because you've noticed
>>>> >> >> > this behavior on emr-4.3.0.
>>>> >> >> >
>>>> >> >> > In this release, we made some changes to the
>>>> >> >> > maximizeResourceAllocation setting (which you may or may not be
>>>> >> >> > using, but either way this issue is present), including the
>>>> >> >> > accidental introduction of a bug that makes it not reserve any
>>>> >> >> > space for the AM, which ultimately results in one of the nodes
>>>> >> >> > being utilized only by the AM and not an executor.
>>>> >> >> >
>>>> >> >> > However, as you point out, the only viable fix seems to be to
>>>> >> >> > reserve enough memory for the AM on *every single node*, which
>>>> >> >> > in some cases might actually be worse than wasting a lot of
>>>> >> >> > memory on a single node.
>>>> >> >> >
>>>> >> >> > So yeah, I also don't like either option. Is this just the price
>>>> >> >> > you pay for running on YARN?
>>>> >> >> >
>>>> >> >> > ~ Jonathan
>>>> >> >> >
>>>> >> >> > On Mon, Feb 8, 2016 at 9:03 PM Alexander Pivovarov
>>>> >> >> > <apivova...@gmail.com> wrote:
>>>> >> >> >>
>>>> >> >> >> Let's say that YARN has 53 GB of memory available on each slave.
>>>> >> >> >>
>>>> >> >> >> The Spark AM container needs 896 MB (512 + 384).
>>>> >> >> >>
>>>> >> >> >> I see two options to configure Spark:
>>>> >> >> >>
>>>> >> >> >> 1. Configure Spark executors to use 52 GB and leave 1 GB on
>>>> >> >> >> each box. So some box will also run the AM container, and 1 GB
>>>> >> >> >> of memory will go unused on every slave but one.
>>>> >> >> >>
>>>> >> >> >> 2. Configure Spark to use all 53 GB and add an additional 53 GB
>>>> >> >> >> box which will run only the AM container. So 52 GB on this
>>>> >> >> >> additional box will do nothing.
>>>> >> >> >>
>>>> >> >> >> I do not like either option. Is there a better way to configure
>>>> >> >> >> YARN/Spark?
>>>> >> >> >>
>>>> >> >> >> Alex
>>>>
>>>> --
>>>> Marcelo
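
[To put rough numbers on option 1 from the original question above: a sketch of sizing the executor request so that about 1 GB of headroom is left on every node for the 896 MB AM. The 47g figure and the 10% overhead rule are assumptions based on Spark 1.x defaults and the 53 GB figure stated in the thread; adjust for your own release and instance size.]

# Assumptions: ~53 GB of YARN memory per slave; AM request of
# 512 MB + 384 MB overhead = 896 MB (as in the thread); Spark 1.x default
# executor overhead of max(384 MB, 10% of executor memory).
#
#   47g executor + ~4.7g overhead = ~51.7 GB  ->  leaves ~1.3 GB per node,
#   enough for the 896 MB AM on whichever node YARN happens to place it.

spark-submit \
  --master yarn \
  --executor-memory 47g \
  --conf spark.yarn.am.memory=512m \
  your-application.jar   # placeholder application

[The alternative Sean describes is to leave the executors at full size and instead let YARN think each node has roughly an extra GB, e.g. by raising yarn.nodemanager.resource.memory-mb, so only the one node that ends up hosting the AM is technically overcommitted.]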