Based on the Spark UI I am definitely running 25 executors, so why would you
expect four? I submit the application with --num-executors 25 and get 6-7
executors running per host. Using more, smaller executors gives me better
cluster utilization when running parallel Spark sessions (which is not the
case for this reported issue - for now I am using the cluster exclusively).
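
To spell out the arithmetic behind those numbers (using the layout from my
original mail below):

  25 executors * (28G heap + 4G overhead) = 25 * 32G = 800G in total
  800G / 4 hosts = ~200G per host, i.e. 6-7 executors of 32G each on a 256G box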
thx,
Antony.

On Thursday, 19 February 2015, 11:02, Sean Owen <so...@cloudera.com> wrote:

This should result in 4 executors, not 25. They should be able to
execute 4*4 = 16 tasks simultaneously. You have them grab 4*32 = 128GB
of RAM, not 1TB.
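
Spelled out with the settings from your mail (28G heap + 4096MB overhead
per executor):

  4 executors * (28G + 4G) = 4 * 32G = 128GB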

It still feels like this shouldn't be running out of memory - not by a
long shot. But I'm just pointing out potential differences between what
you are expecting and what you are configuring.

On Thu, Feb 19, 2015 at 9:56 AM, Antony Mayi
<antonym...@yahoo.com.invalid> wrote:
> Hi,
>
> I have 4 very powerful boxes (256GB RAM, 32 cores each). I am running Spark
> 1.2.0 in yarn-client mode with the following layout:
>
> spark.executor.cores=4
> spark.executor.memory=28G
> spark.yarn.executor.memoryOverhead=4096
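>
> The same settings expressed programmatically would be roughly this (just a
> sketch; the app name is a placeholder):
>
>   import org.apache.spark.{SparkConf, SparkContext}
>
>   // sketch only: the layout above set on the SparkConf before the
>   // SparkContext (and thus the YARN executors) is created
>   val conf = new SparkConf()
>     .setAppName("als-test")  // placeholder name
>     .set("spark.executor.cores", "4")
>     .set("spark.executor.memory", "28g")
>     .set("spark.yarn.executor.memoryOverhead", "4096")
>   val sc = new SparkContext(conf)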
>
> I am submitting a bigger ALS trainImplicit task (rank=100, iters=15) on a
> dataset with ~3 billion ratings, using 25 executors. At some point an
> executor crashes with:
>
> 15/02/19 05:41:06 WARN util.AkkaUtils: Error sending message in 1 attempts
> java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
>        at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
>        at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
>        at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
>        at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
>        at scala.concurrent.Await$.result(package.scala:107)
>        at org.apache.spark.util.AkkaUtils$.askWithReply(AkkaUtils.scala:187)
>        at org.apache.spark.executor.Executor$$anon$1.run(Executor.scala:398)
> 15/02/19 05:41:06 ERROR executor.Executor: Exception in task 131.0 in stage 51.0 (TID 7259)
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>        at java.lang.reflect.Array.newInstance(Array.java:75)
>        at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1671)
>        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1345)
>        at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1707)
>        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1345)
>
> So the GC overhead limit exceeded error is pretty clear and suggests running
> out of memory. Since I have 1TB of RAM available, this must rather be due to
> some suboptimal configuration.
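>
> For reference, the training call is essentially this (the ratings variable
> name is a placeholder; the RDD itself is built earlier in the job):
>
>   import org.apache.spark.mllib.recommendation.{ALS, Rating}
>
>   // ratings: RDD[Rating] with ~3 billion entries, prepared earlier
>   val model = ALS.trainImplicit(ratings, 100, 15)  // rank=100, iterations=15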
>
> Can anyone please give me some pointers on how to tackle this?
>
> Thanks,
> Antony.

