Can someone from Databricks test and commit this PR? It is not a complete
solution, but it would provide some relief.
https://github.com/apache/spark/pull/1391

Thanks,
Nishkam


On Wed, Aug 20, 2014 at 12:39 AM, Sandy Ryza <sandy.r...@cloudera.com>
wrote:

> Hi Debasish,
>
> The fix is to raise spark.yarn.executor.memoryOverhead until this goes
> away.  This controls the buffer between the JVM heap size and the amount of
> memory requested from YARN (JVMs can take up memory beyond their heap
> size). You should also make sure that, in the YARN NodeManager
> configuration, yarn.nodemanager.vmem-check-enabled is set to false.
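
To make the relationship Sandy describes concrete, here is a rough Python
sketch (not Spark code). It assumes sizes in MB and that the container
request is simply heap plus overhead. The function names and the example
values (8192 and 1024) are hypothetical, so check the documentation for your
Spark version for the actual default and for any rounding YARN applies.

```python
# Sketch only: how the YARN container request relates to the executor heap.
# All sizes are in MB. Function names and example values are hypothetical.


def container_request_mb(executor_memory_mb, memory_overhead_mb):
    """Assumed model: memory requested from YARN = JVM heap + overhead buffer."""
    return executor_memory_mb + memory_overhead_mb


def spark_settings(executor_memory_mb, memory_overhead_mb):
    """Spark properties that would carry these two values."""
    return {
        "spark.executor.memory": f"{executor_memory_mb}m",
        "spark.yarn.executor.memoryOverhead": str(memory_overhead_mb),
    }


if __name__ == "__main__":
    heap_mb, overhead_mb = 8192, 1024  # hypothetical example values
    print(container_request_mb(heap_mb, overhead_mb))
    print(spark_settings(heap_mb, overhead_mb))
```

Raising the second value widens the buffer available for off-heap usage
without changing the JVM heap size.
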
>
> -Sandy
>
>
> On Wed, Aug 20, 2014 at 12:27 AM, Debasish Das <debasish.da...@gmail.com>
> wrote:
>
> > I could reproduce the issue in both 1.0 and 1.1 using YARN...so this is
> > definitely a YARN related problem...
> >
> > At least for me, the only deployment option possible right now is
> > standalone...
> >
> >
> >
> > On Tue, Aug 19, 2014 at 11:29 PM, Xiangrui Meng <men...@gmail.com>
> > wrote:
> >
> >> Hi Deb,
> >>
> >> I think this may be the same issue as described in
> >> https://issues.apache.org/jira/browse/SPARK-2121 . We know that the
> >> container got killed by YARN because it used much more memory than it
> >> requested. But we haven't figured out the root cause yet.
> >>
> >> +Sandy
> >>
> >> Best,
> >> Xiangrui
> >>
> >> On Tue, Aug 19, 2014 at 8:51 PM, Debasish Das <debasish.da...@gmail.com>
> >> wrote:
> >> > Hi,
> >> >
> >> > During the 4th ALS iteration, I am noticing that one of the executors
> >> > gets disconnected:
> >> >
> >> > 14/08/19 23:40:00 ERROR network.ConnectionManager: Corresponding
> >> > SendingConnectionManagerId not found
> >> >
> >> > 14/08/19 23:40:00 INFO cluster.YarnClientSchedulerBackend: Executor 5
> >> > disconnected, so removing it
> >> >
> >> > 14/08/19 23:40:00 ERROR cluster.YarnClientClusterScheduler: Lost executor 5
> >> > on tblpmidn42adv-hdp.tdc.vzwcorp.com: remote Akka client disassociated
> >> >
> >> > 14/08/19 23:40:00 INFO scheduler.DAGScheduler: Executor lost: 5 (epoch 12)
> >> >
> >> > Any idea if this is a bug related to Akka on YARN?
> >> >
> >> > I am using master
> >> >
> >> > Thanks.
> >> > Deb
> >>
> >
> >
>
