I use hadoop 2.7.1

On Feb 9, 2016 9:54 AM, "Marcelo Vanzin" <van...@cloudera.com> wrote:
> You should be able to use spark.yarn.am.nodeLabelExpression if your
> version of YARN supports node labels (and you've added a label to the
> node where you want the AM to run).
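A minimal sketch of that setup, assuming node labels are already enabled on the cluster (yarn.node-labels.enabled=true plus a label store directory) and that your queue is allowed to use the label; the label name "amnode", the host "small-box-1", and the application class/jar below are placeholders, not anything from the thread:

    # Create a label and attach it to the small box (names are placeholders)
    yarn rmadmin -addToClusterNodeLabels "amnode"
    yarn rmadmin -replaceLabelsOnNode "small-box-1=amnode"

    # Restrict only the AM to that label; executors are requested as usual
    spark-submit \
      --master yarn \
      --conf spark.yarn.am.nodeLabelExpression=amnode \
      --class com.example.MyApp myapp.jar

With the label attached only to the small box, the AM request should land there, and since labels are exclusive by default in 2.7.x, plain executor requests should keep going to the unlabeled nodes.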
> On Tue, Feb 9, 2016 at 9:51 AM, Alexander Pivovarov
> <apivova...@gmail.com> wrote:
> > Am container starts first and yarn selects random computer to run it.
> >
> > Is it possible to configure yarn so that it selects small computer for am
> > container.
> >
> > On Feb 9, 2016 12:40 AM, "Sean Owen" <so...@cloudera.com> wrote:
> >>
> >> If it's too small to run an executor, I'd think it would be chosen for
> >> the AM as the only way to satisfy the request.
> >>
> >> On Tue, Feb 9, 2016 at 8:35 AM, Alexander Pivovarov
> >> <apivova...@gmail.com> wrote:
> >> > If I add additional small box to the cluster can I configure yarn to
> >> > select small box to run am container?
> >> >
> >> > On Mon, Feb 8, 2016 at 10:53 PM, Sean Owen <so...@cloudera.com> wrote:
> >> >>
> >> >> Typically YARN is there because you're mediating resource requests
> >> >> from things besides Spark, so yeah using every bit of the cluster is
> >> >> a little bit of a corner case. There's not a good answer if all your
> >> >> nodes are the same size.
> >> >>
> >> >> I think you can let YARN over-commit RAM though, and allocate more
> >> >> memory than it actually has. It may be beneficial to let them all
> >> >> think they have an extra GB, and let one node running the AM
> >> >> technically be overcommitted, a state which won't hurt at all unless
> >> >> you're really really tight on memory, in which case something might
> >> >> get killed.
> >> >>
> >> >> On Tue, Feb 9, 2016 at 6:49 AM, Jonathan Kelly
> >> >> <jonathaka...@gmail.com> wrote:
> >> >> > Alex,
> >> >> >
> >> >> > That's a very good question that I've been trying to answer myself
> >> >> > recently too. Since you've mentioned before that you're using EMR,
> >> >> > I assume you're asking this because you've noticed this behavior on
> >> >> > emr-4.3.0.
> >> >> >
> >> >> > In this release, we made some changes to the
> >> >> > maximizeResourceAllocation (which you may or may not be using, but
> >> >> > either way this issue is present), including the accidental
> >> >> > inclusion of somewhat of a bug that makes it not reserve any space
> >> >> > for the AM, which ultimately results in one of the nodes being
> >> >> > utilized only by the AM and not an executor.
> >> >> >
> >> >> > However, as you point out, the only viable fix seems to be to
> >> >> > reserve enough memory for the AM on *every single node*, which in
> >> >> > some cases might actually be worse than wasting a lot of memory on
> >> >> > a single node.
> >> >> >
> >> >> > So yeah, I also don't like either option. Is this just the price
> >> >> > you pay for running on YARN?
> >> >> >
> >> >> > ~ Jonathan
> >> >> >
> >> >> > On Mon, Feb 8, 2016 at 9:03 PM Alexander Pivovarov
> >> >> > <apivova...@gmail.com> wrote:
> >> >> >>
> >> >> >> Lets say that yarn has 53GB memory available on each slave
> >> >> >>
> >> >> >> spark.am container needs 896MB. (512 + 384)
> >> >> >>
> >> >> >> I see two options to configure spark:
> >> >> >>
> >> >> >> 1. configure spark executors to use 52GB and leave 1 GB on each
> >> >> >> box. So, some box will also run am container. So, 1GB memory will
> >> >> >> not be used on all slaves but one.
> >> >> >>
> >> >> >> 2. configure spark to use all 53GB and add additional 53GB box
> >> >> >> which will run only am container. So, 52GB on this additional box
> >> >> >> will do nothing
> >> >> >>
> >> >> >> I do not like both options. Is there a better way to configure
> >> >> >> yarn/spark?
> >> >> >>
> >> >> >> Alex
>
> --
> Marcelo
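For the 896 MB figure in the last message: with Spark 1.x defaults in client mode, the AM container asks for spark.yarn.am.memory (512 MB) plus the 384 MB minimum overhead. A sketch of option 1 with the thread's numbers; the 47g heap / 5120 MB overhead split is only an assumption chosen so heap plus overhead comes to 52 GB, leaving 1 GB of the 53 GB free for the AM:

    # Per node: 53 GB for YARN, executor container capped at 52 GB,
    # so the 896 MB AM fits into the 1 GB left on whichever node gets it.
    #   executor request = --executor-memory + spark.yarn.executor.memoryOverhead
    #                    = 47 GB + 5120 MB = 52 GB
    #   AM request       = 512 MB (spark.yarn.am.memory) + 384 MB overhead = 896 MB
    spark-submit \
      --master yarn \
      --deploy-mode client \
      --executor-memory 47g \
      --conf spark.yarn.executor.memoryOverhead=5120 \
      --conf spark.yarn.am.memory=512m \
      --class com.example.MyApp myapp.jar

One way to express Sean's over-commit alternative is the yarn-site.xml setting yarn.nodemanager.resource.memory-mb: advertising roughly 1 GB more than the node really has lets every box run a full-size executor, with only the node that hosts the AM actually overcommitted.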