You should be able to use spark.yarn.am.nodeLabelExpression if your version of YARN supports node labels (and you've added a label to the node where you want the AM to run).
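A minimal sketch of what that could look like, assuming YARN 2.6+ with node labels enabled and a hypothetical label name "am-only" attached to the small box (the queue must also be permitted to use the label, and the exact setup varies by distro):

    # Register the label and attach it to the small node (names are examples)
    yarn rmadmin -addToClusterNodeLabels "am-only"
    yarn rmadmin -replaceLabelsOnNode "small-box.example.com=am-only"

    # Tell Spark to place only the AM on nodes carrying that label
    spark-submit \
      --master yarn \
      --conf spark.yarn.am.nodeLabelExpression=am-only \
      ...

Executors are unaffected unless you also set spark.yarn.executor.nodeLabelExpression, so they will still land on the big boxes.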
On Tue, Feb 9, 2016 at 9:51 AM, Alexander Pivovarov <apivova...@gmail.com> wrote:
> The AM container starts first, and YARN selects a random machine to run it.
>
> Is it possible to configure YARN so that it selects a small machine for the
> AM container?
>
> On Feb 9, 2016 12:40 AM, "Sean Owen" <so...@cloudera.com> wrote:
>>
>> If it's too small to run an executor, I'd think it would be chosen for
>> the AM as the only way to satisfy the request.
>>
>> On Tue, Feb 9, 2016 at 8:35 AM, Alexander Pivovarov
>> <apivova...@gmail.com> wrote:
>> > If I add an additional small box to the cluster, can I configure YARN
>> > to select the small box to run the AM container?
>> >
>> > On Mon, Feb 8, 2016 at 10:53 PM, Sean Owen <so...@cloudera.com> wrote:
>> >>
>> >> Typically YARN is there because you're mediating resource requests
>> >> from things besides Spark, so yeah, using every bit of the cluster is
>> >> a little bit of a corner case. There's not a good answer if all your
>> >> nodes are the same size.
>> >>
>> >> I think you can let YARN over-commit RAM, though, and allocate more
>> >> memory than it actually has. It may be beneficial to let all the
>> >> nodes think they have an extra GB, and let the one node running the
>> >> AM technically be overcommitted, a state which won't hurt at all
>> >> unless you're really, really tight on memory, in which case something
>> >> might get killed.
>> >>
>> >> On Tue, Feb 9, 2016 at 6:49 AM, Jonathan Kelly <jonathaka...@gmail.com>
>> >> wrote:
>> >> > Alex,
>> >> >
>> >> > That's a very good question that I've been trying to answer myself
>> >> > recently too. Since you've mentioned before that you're using EMR,
>> >> > I assume you're asking this because you've noticed this behavior on
>> >> > emr-4.3.0.
>> >> >
>> >> > In this release, we made some changes to maximizeResourceAllocation
>> >> > (which you may or may not be using, but either way this issue is
>> >> > present), including the accidental introduction of a bug that makes
>> >> > it not reserve any space for the AM, which ultimately results in
>> >> > one of the nodes being utilized only by the AM and not by an
>> >> > executor.
>> >> >
>> >> > However, as you point out, the only viable fix seems to be to
>> >> > reserve enough memory for the AM on *every single node*, which in
>> >> > some cases might actually be worse than wasting a lot of memory on
>> >> > a single node.
>> >> >
>> >> > So yeah, I don't like either option either. Is this just the price
>> >> > you pay for running on YARN?
>> >> >
>> >> > ~ Jonathan
>> >> >
>> >> > On Mon, Feb 8, 2016 at 9:03 PM Alexander Pivovarov
>> >> > <apivova...@gmail.com> wrote:
>> >> >>
>> >> >> Let's say that YARN has 53 GB of memory available on each slave.
>> >> >>
>> >> >> The Spark AM container needs 896 MB (512 + 384).
>> >> >>
>> >> >> I see two options to configure Spark:
>> >> >>
>> >> >> 1. Configure Spark executors to use 52 GB and leave 1 GB on each
>> >> >> box. One box will then also run the AM container, so 1 GB of
>> >> >> memory will go unused on every slave except that one.
>> >> >>
>> >> >> 2. Configure Spark to use all 53 GB and add an additional 53 GB
>> >> >> box that runs only the AM container. The remaining 52 GB on this
>> >> >> additional box will do nothing.
>> >> >>
>> >> >> I don't like either option. Is there a better way to configure
>> >> >> YARN/Spark?
>> >> >>
>> >> >> Alex

--
Marcelo
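For reference, a minimal sketch of Sean's over-commit suggestion with the numbers from this thread: 53 GB real per slave, and the 896 MB AM request rounded up to a 1 GB container by the default yarn.scheduler.minimum-allocation-mb of 1024. The values are illustrative assumptions, not a tested recipe:

    <!-- yarn-site.xml: advertise 54 GB on boxes that physically have 53 GB -->
    <property>
      <name>yarn.nodemanager.resource.memory-mb</name>
      <value>55296</value>
    </property>

    # spark-submit: size executors to consume the real 53 GB per node
    # (49 GB heap + 4 GB overhead = 53 GB container), so the 1 GB AM
    # container lands in the over-committed headroom on exactly one node
    spark-submit \
      --master yarn \
      --executor-memory 49g \
      --conf spark.yarn.executor.memoryOverhead=4096 \
      ...

With this layout only the node that happens to host the AM is over-committed, and only by up to 1 GB, which matches Sean's caveat that it won't hurt unless you're really tight on memory.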