I mean Jonathan.

On Tue, Feb 9, 2016 at 10:41 AM, Alexander Pivovarov <apivova...@gmail.com> wrote:
> I decided to do YARN over-commit and add 896
> to yarn.nodemanager.resource.memory-mb.
> It was 54,272; now I set it to 54,272 + 896 = 55,168.
>
> Kelly, can I ask you a couple of questions?
> 1. Is it possible to add a YARN label to the boxes of a particular
> instance group on EMR?
> 2. In addition to maximizeResourceAllocation, it would be nice to have an
> executorsPerBox setting in EMR.
> I have a case where I need to run 2 or 4 executors on an r3.2xlarge.
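A minimal sketch of the arithmetic behind that over-commit, assuming the default spark.yarn.am.memory of 512 MB plus the 384 MB overhead floor; the object and value names are illustrative only, and the rest of the quoted thread continues below.

// Rough sketch of the numbers quoted in this thread; figures are
// illustrative, not a recommendation.
object AmOvercommitMath {
  val amMemoryMb   = 512                                      // default spark.yarn.am.memory
  val amOverheadMb = math.max(384, (amMemoryMb * 0.10).toInt) // overhead, floored at 384 MB
  val amTotalMb    = amMemoryMb + amOverheadMb                // 512 + 384 = 896

  val nodeMemoryMb    = 54272                                 // yarn.nodemanager.resource.memory-mb before the change
  val overcommittedMb = nodeMemoryMb + amTotalMb              // 54,272 + 896 = 55,168

  def main(args: Array[String]): Unit =
    println(s"AM container: $amTotalMb MB; node memory after over-commit: $overcommittedMb MB")
}

The same 896 MB figure is what the original question at the bottom of the thread budgets for the AM container.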
>
> On Tue, Feb 9, 2016 at 9:56 AM, Alexander Pivovarov <apivova...@gmail.com> wrote:
>
>> I use hadoop 2.7.1
>>
>> On Feb 9, 2016 9:54 AM, "Marcelo Vanzin" <van...@cloudera.com> wrote:
>>
>>> You should be able to use spark.yarn.am.nodeLabelExpression if your
>>> version of YARN supports node labels (and you've added a label to the
>>> node where you want the AM to run).
>>>
>>> On Tue, Feb 9, 2016 at 9:51 AM, Alexander Pivovarov
>>> <apivova...@gmail.com> wrote:
>>>
>>>> The AM container starts first, and YARN selects a random computer to
>>>> run it.
>>>>
>>>> Is it possible to configure YARN so that it selects a small computer
>>>> for the AM container?
>>>>
>>>> On Feb 9, 2016 12:40 AM, "Sean Owen" <so...@cloudera.com> wrote:
>>>>
>>>>> If it's too small to run an executor, I'd think it would be chosen
>>>>> for the AM as the only way to satisfy the request.
>>>>>
>>>>> On Tue, Feb 9, 2016 at 8:35 AM, Alexander Pivovarov
>>>>> <apivova...@gmail.com> wrote:
>>>>>
>>>>>> If I add an additional small box to the cluster, can I configure
>>>>>> YARN to select the small box to run the AM container?
>>>>>>
>>>>>> On Mon, Feb 8, 2016 at 10:53 PM, Sean Owen <so...@cloudera.com> wrote:
>>>>>>
>>>>>>> Typically YARN is there because you're mediating resource requests
>>>>>>> from things besides Spark, so using every bit of the cluster is a
>>>>>>> little bit of a corner case. There's not a good answer if all your
>>>>>>> nodes are the same size.
>>>>>>>
>>>>>>> I think you can let YARN over-commit RAM, though, and allocate more
>>>>>>> memory than it actually has. It may be beneficial to let them all
>>>>>>> think they have an extra GB, and let the one node running the AM
>>>>>>> technically be overcommitted, a state which won't hurt at all unless
>>>>>>> you're really, really tight on memory, in which case something
>>>>>>> might get killed.
>>>>>>>
>>>>>>> On Tue, Feb 9, 2016 at 6:49 AM, Jonathan Kelly
>>>>>>> <jonathaka...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Alex,
>>>>>>>>
>>>>>>>> That's a very good question that I've been trying to answer myself
>>>>>>>> recently too. Since you've mentioned before that you're using EMR,
>>>>>>>> I assume you're asking this because you've noticed this behavior
>>>>>>>> on emr-4.3.0.
>>>>>>>>
>>>>>>>> In this release, we made some changes to the
>>>>>>>> maximizeResourceAllocation setting (which you may or may not be
>>>>>>>> using, but either way this issue is present), including the
>>>>>>>> accidental inclusion of somewhat of a bug that makes it not
>>>>>>>> reserve any space for the AM, which ultimately results in one of
>>>>>>>> the nodes being utilized only by the AM and not an executor.
>>>>>>>>
>>>>>>>> However, as you point out, the only viable fix seems to be to
>>>>>>>> reserve enough memory for the AM on *every single node*, which in
>>>>>>>> some cases might actually be worse than wasting a lot of memory on
>>>>>>>> a single node.
>>>>>>>>
>>>>>>>> So yeah, I also don't like either option. Is this just the price
>>>>>>>> you pay for running on YARN?
>>>>>>>>
>>>>>>>> ~ Jonathan
>>>>>>>>
>>>>>>>> On Mon, Feb 8, 2016 at 9:03 PM Alexander Pivovarov
>>>>>>>> <apivova...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Let's say that YARN has 53 GB of memory available on each slave.
>>>>>>>>>
>>>>>>>>> The Spark AM container needs 896 MB (512 + 384).
>>>>>>>>>
>>>>>>>>> I see two options to configure Spark:
>>>>>>>>>
>>>>>>>>> 1. Configure Spark executors to use 52 GB and leave 1 GB on each
>>>>>>>>> box. Some box will also run the AM container, so 1 GB of memory
>>>>>>>>> will go unused on every slave but one.
>>>>>>>>>
>>>>>>>>> 2. Configure Spark to use all 53 GB and add an additional 53 GB
>>>>>>>>> box which will run only the AM container, so 52 GB on that
>>>>>>>>> additional box will do nothing.
>>>>>>>>>
>>>>>>>>> I don't like either option. Is there a better way to configure
>>>>>>>>> YARN/Spark?
>>>>>>>>>
>>>>>>>>> Alex
>>>
>>> --
>>> Marcelo
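A minimal sketch of the node-label approach Marcelo suggests above, assuming node labels are enabled on the cluster (YARN 2.6+; the thread mentions Hadoop 2.7.1), that a hypothetical label "SMALL_NODE" has already been created and assigned to the small box, and that the job is launched with spark-submit --master yarn in client mode:

import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical label name: "SMALL_NODE" must first be created with
// `yarn rmadmin -addToClusterNodeLabels` and assigned to the small box.
object AmOnSmallNode {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("am-on-small-node")
      // Restrict where the YARN application master may be scheduled.
      .set("spark.yarn.am.nodeLabelExpression", "SMALL_NODE")
      // Optionally keep executors on the unlabeled (default) partition so
      // the small box is left to the AM alone.
      // .set("spark.yarn.executor.nodeLabelExpression", "")

    val sc = new SparkContext(conf)
    try {
      // Trivial job just to show the context comes up with this configuration.
      println(sc.parallelize(1 to 10).sum())
    } finally {
      sc.stop()
    }
  }
}

In cluster mode the AM hosts the driver, so the analogous setting there is spark.yarn.driver.nodeLabelExpression.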