You have to set the executor memory. BTW, you have given the driver all of the memory on the machine (SPARK_DRIVER_MEMORY=15G on a 15G box), which leaves nothing for the OS or the executors.
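For example, on your 15G m3.xlarge a split along these lines is a reasonable starting point (the exact numbers are guesses; tune them for your data). In spark-env.sh, leave headroom instead of giving the driver everything:

    export SPARK_DRIVER_MEMORY=4G

Then give the executors their memory on the Mahout command line. Note that with --master local everything runs inside the single driver JVM, so an executor memory setting has no separate effect there; point --master at your standalone master instead (host and port taken from your spark-env.sh below):

    /opt/mahout/bin/mahout spark-rowsimilarity \
      -i 50k_rows__50items.dat -o test_output.tmp \
      --maxObservations 500 --maxSimilaritiesPerRow 100 --omitStrength \
      --master spark://ip-10-12-17-235.eu-west-1.compute.internal:7077 \
      --sparkExecutorMem 8g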
> On Feb 10, 2016, at 9:30 AM, Jaume Galí <[email protected]> wrote:
>
> Hi again,
> (Sorry for the delay, but we didn't have a machine to test your thoughts
> about the memory issue.)
>
> The problem is still happening when testing with an input matrix of 100k
> rows by 300 items. I increased memory as you suggested but nothing changed.
> I attached spark-env.sh and the new machine specs.
>
> Machine specs:
>
> m3.xlarge AWS (Ivy Bridge, 15 GB RAM, 2x40 GB HD)
>
> This is my spark-env.sh:
>
> #!/usr/bin/env bash
> # Licensed to ...
>
> export SPARK_HOME=${SPARK_HOME:-/usr/lib/spark}
> export SPARK_LOG_DIR=${SPARK_LOG_DIR:-/var/log/spark}
> export HADOOP_HOME=${HADOOP_HOME:-/usr/lib/hadoop}
> export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-/etc/hadoop/conf}
> export HIVE_CONF_DIR=${HIVE_CONF_DIR:-/etc/hive/conf}
>
> export STANDALONE_SPARK_MASTER_HOST=ip-10-12-17-235.eu-west-1.compute.internal
> export SPARK_MASTER_PORT=7077
> export SPARK_MASTER_IP=$STANDALONE_SPARK_MASTER_HOST
> export SPARK_MASTER_WEBUI_PORT=8080
>
> export SPARK_WORKER_DIR=${SPARK_WORKER_DIR:-/var/run/spark/work}
> export SPARK_WORKER_PORT=7078
> export SPARK_WORKER_WEBUI_PORT=8081
>
> export HIVE_SERVER2_THRIFT_BIND_HOST=0.0.0.0
> export HIVE_SERVER2_THRIFT_PORT=10001
>
> export SPARK_DRIVER_MEMORY=15G
> export SPARK_DAEMON_JAVA_OPTS="$SPARK_DAEMON_JAVA_OPTS -XX:OnOutOfMemoryError='kill -9 %p'"
>
> Log:
>
> Exception in thread "main" org.apache.spark.SparkException: Job aborted due
> to stage failure: Task 0 in stage 12.0 failed 1 times, most recent failure:
> Lost task 0.0 in stage 12.0 (TID 24, localhost): java.lang.OutOfMemoryError:
> GC overhead limit exceeded
> ...
>
> Driver stacktrace:
> Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
> ...
>
> Thanks in advance
>
>> On Feb 2, 2016, at 7:48 AM, Pat Ferrel <[email protected]> wrote:
>>
>> You probably need to increase your driver memory and 8g will not work. 16g
>> is probably the smallest standalone machine that will work, since the
>> driver and executors both run on it.
>>
>>> On Feb 1, 2016, at 1:24 AM, [email protected] wrote:
>>>
>>> Hello everybody,
>>>
>>> We are experiencing problems when we use the "mahout spark-rowsimilarity"
>>> operation. We have an input matrix with 100k rows and 100 items, and the
>>> process throws "Exception in task 0.0 in stage 13.0 (TID 13)
>>> java.lang.OutOfMemoryError: Java heap space". We tried increasing the
>>> Java heap, the Mahout heap, and spark.driver.memory.
>>>
>>> Environment versions:
>>> Mahout: 0.11.1
>>> Spark: 1.6.0
>>>
>>> Mahout command line:
>>> /opt/mahout/bin/mahout spark-rowsimilarity -i 50k_rows__50items.dat \
>>>   -o test_output.tmp --maxObservations 500 --maxSimilaritiesPerRow 100 \
>>>   --omitStrength --master local --sparkExecutorMem 8g
>>>
>>> This process runs on a machine with the following specs:
>>> RAM: 8 GB
>>> CPU: 8 cores
>>>
>>> .profile file:
>>> export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
>>> export HADOOP_HOME=/opt/hadoop-2.6.0
>>> export SPARK_HOME=/opt/spark
>>> export MAHOUT_HOME=/opt/mahout
>>> export MAHOUT_HEAPSIZE=8192
>>>
>>> It throws this exception:
>>>
>>> 16/01/22 11:45:06 ERROR Executor: Exception in task 0.0 in stage 13.0 (TID 13)
>>> java.lang.OutOfMemoryError: Java heap space
>>>     at org.apache.mahout.math.DenseMatrix.<init>(DenseMatrix.java:66)
>>>     at org.apache.mahout.sparkbindings.drm.package$$anonfun$blockify$1.apply(package.scala:70)
>>>     at org.apache.mahout.sparkbindings.drm.package$$anonfun$blockify$1.apply(package.scala:59)
>>>     at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)
>>>     at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)
>>>     at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>>>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>>>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>>>     at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>>>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>>>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>>>     at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>>>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>>>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>>>     at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>>>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>>>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>>>     at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>>>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>>>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>>>     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>>>     at org.apache.spark.scheduler.Task.run(Task.scala:89)
>>>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
>>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>     at java.lang.Thread.run(Thread.java:745)
>>> 16/01/22 11:45:06 WARN NettyRpcEndpointRef: Error sending message [message
>>> = Heartbeat(driver,[Lscala.Tuple2;@12498227,BlockManagerId(driver,
>>> localhost, 42107))] in 1 attempts
>>> org.apache.spark.rpc.RpcTimeoutException: Futures timed out after [120 seconds].
>>> This timeout is controlled by spark.rpc.askTimeout
>>>     at org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:48)
>>>     at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:63)
>>>     at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
>>>     at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:33)
>>>     at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:76)
>>>     at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:101)
>>>     at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:77)
>>>     at org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$reportHeartBeat(Executor.scala:448)
>>>     at org.apache.spark.executor.Executor$$anon$1$$anonfun$run$1.apply$mcV$sp(Executor.scala:468)
>>>     at org.apache.spark.executor.Executor$$anon$1$$anonfun$run$1.apply(Executor.scala:468)
>>>     at org.apache.spark.executor.Executor$$anon$1$$anonfun$run$1.apply(Executor.scala:468)
>>>     at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1741)
>>>     at org.apache.spark.executor.Executor$$anon$1.run(Executor.scala:468)
>>>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>>     at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
>>>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
>>>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>     at java.lang.Thread.run(Thread.java:745)
>>> 16/01/22 11:45:06 WARN NettyRpcEndpointRef: Error sending message [message
>>> = Heartbeat(driver,[Lscala.Tuple2;@12498227,BlockManagerId(driver,
>>> localhost, 42107))] in 1 attempts
>>> org.apache.spark.rpc.RpcTimeoutException: Futures timed out after [120 seconds].
>>> This timeout is controlled by spark.rpc.askTimeout
>>> ... (same stack trace as above)
>>> Caused by: java.util.concurrent.TimeoutException: Futures timed out after
>>> [120 seconds]
>>>     at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
>>>     at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
>>>     at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
>>>     at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
>>>     at scala.concurrent.Await$.result(package.scala:107)
>>>     at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
>>>     ...
>>>
>>> Can you please advise?
>>>
>>> Thanks in advance.
>>> Cheers.
>>
>
