Last time it did not show up on the Environment tab, but I will give it another shot... The expected behavior is that this property will show up there, right?
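For reference, a minimal sketch of setting the property programmatically before the SparkContext is created, so that it should then appear on the Environment tab; the app name and memory values below are illustrative, not taken from this thread:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical values: 16 GB executors with 2 GB of YARN overhead, roughly
// matching the numbers discussed in this thread. On this Spark version the
// overhead is given in megabytes, with no unit suffix.
val conf = new SparkConf()
  .setAppName("ALS-on-YARN")
  .set("spark.executor.memory", "16g")
  .set("spark.yarn.executor.memoryOverhead", "2048")

val sc = new SparkContext(conf)

// If the property made it in, it is visible here as well as on the web UI's
// Environment tab under "Spark Properties".
println(sc.getConf.get("spark.yarn.executor.memoryOverhead"))
```

Passing it at submit time with `--conf spark.yarn.executor.memoryOverhead=2048` is an equivalent route.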
On Tue, Sep 9, 2014 at 12:15 PM, Sandy Ryza <sandy.r...@cloudera.com> wrote:

> I would expect 2 GB to be enough, or more than enough, for 16 GB executors
> (unless ALS is using a bunch of off-heap memory?). You mentioned earlier in
> this thread that the property wasn't showing up in the Environment tab. Are
> you sure it's making it in?
>
> -Sandy
>
> On Tue, Sep 9, 2014 at 11:58 AM, Debasish Das <debasish.da...@gmail.com> wrote:
>
>> Hmm... I did try increasing it to a few GB but have not had a successful
>> run yet...
>>
>> Any idea, if I am using say 40 executors, each running 16 GB, what the
>> typical spark.yarn.executor.memoryOverhead is for, say, 100M x 10M large
>> matrices with a few billion ratings?
>>
>> On Tue, Sep 9, 2014 at 10:49 AM, Sandy Ryza <sandy.r...@cloudera.com> wrote:
>>
>>> Hi Deb,
>>>
>>> The current state of the art is to increase
>>> spark.yarn.executor.memoryOverhead until the job stops failing. We do
>>> have plans to try to automatically scale this based on the amount of
>>> memory requested, but it will still just be a heuristic.
>>>
>>> -Sandy
>>>
>>> On Tue, Sep 9, 2014 at 7:32 AM, Debasish Das <debasish.da...@gmail.com> wrote:
>>>
>>>> Hi Sandy,
>>>>
>>>> Any resolution for the YARN failures? It's a blocker for running Spark
>>>> on top of YARN.
>>>>
>>>> Thanks.
>>>> Deb
>>>>
>>>> On Tue, Aug 19, 2014 at 11:29 PM, Xiangrui Meng <men...@gmail.com> wrote:
>>>>
>>>>> Hi Deb,
>>>>>
>>>>> I think this may be the same issue as described in
>>>>> https://issues.apache.org/jira/browse/SPARK-2121 . We know that the
>>>>> container got killed by YARN because it used much more memory than it
>>>>> requested, but we haven't figured out the root cause yet.
>>>>>
>>>>> +Sandy
>>>>>
>>>>> Best,
>>>>> Xiangrui
>>>>>
>>>>> On Tue, Aug 19, 2014 at 8:51 PM, Debasish Das <debasish.da...@gmail.com> wrote:
>>>>> > Hi,
>>>>> >
>>>>> > During the 4th ALS iteration, I am noticing that one of the executors
>>>>> > gets disconnected:
>>>>> >
>>>>> > 14/08/19 23:40:00 ERROR network.ConnectionManager: Corresponding
>>>>> > SendingConnectionManagerId not found
>>>>> >
>>>>> > 14/08/19 23:40:00 INFO cluster.YarnClientSchedulerBackend: Executor 5
>>>>> > disconnected, so removing it
>>>>> >
>>>>> > 14/08/19 23:40:00 ERROR cluster.YarnClientClusterScheduler: Lost
>>>>> > executor 5 on tblpmidn42adv-hdp.tdc.vzwcorp.com: remote Akka client
>>>>> > disassociated
>>>>> >
>>>>> > 14/08/19 23:40:00 INFO scheduler.DAGScheduler: Executor lost: 5 (epoch 12)
>>>>> >
>>>>> > Any idea if this is a bug related to Akka on YARN?
>>>>> >
>>>>> > I am using master.
>>>>> >
>>>>> > Thanks.
>>>>> > Deb
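As a rough illustration of the scaling heuristic Sandy alludes to above (sizing the overhead from the requested executor memory), here is a back-of-the-envelope helper; the 7% fraction and 384 MB floor are assumptions for the sketch, not values from this thread, and heavy off-heap users such as large ALS jobs may need considerably more:

```scala
// Back-of-the-envelope sizing, not Spark's actual logic: reserve a fixed
// fraction of the executor memory as off-heap headroom, with a floor.
def suggestedOverheadMb(executorMemoryMb: Int,
                        fraction: Double = 0.07,
                        floorMb: Int = 384): Int =
  math.max((executorMemoryMb * fraction).toInt, floorMb)

// 16 GB executors => roughly 1.1 GB of overhead as a starting point; if YARN
// still kills containers, keep increasing until the job stops failing.
println(suggestedOverheadMb(16 * 1024)) // prints 1146
```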