You should check your executor logs to identify the reason. My guess is that the 
executor died from an out-of-memory (OOM) error.


If that is the reason, then you need to tune your executor memory setting or, 
more importantly, your partition count, so that each partition is small enough 
for an executor to process within the memory it has.
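
As a rough illustration of the partition-count point, here is a minimal sketch in plain Python (no Spark needed) that estimates how many partitions an input needs to keep each one under a target size. The function name and the 128 MB target are assumptions made for illustration, not values from this thread; the right target depends on your executor memory and on how much the data expands once deserialized.

```python
def suggest_partition_count(input_bytes, target_partition_bytes=128 * 1024**2):
    """Estimate a partition count so each partition holds at most
    roughly target_partition_bytes of input.

    The 128 MB default is an assumed starting point, not a Spark rule.
    """
    if input_bytes <= 0 or target_partition_bytes <= 0:
        raise ValueError("sizes must be positive")
    # Integer ceiling division avoids floating-point rounding on large sizes.
    return -(-input_bytes // target_partition_bytes)


one_tb = 1024**4  # roughly the ~1 TB input mentioned in the question
print(suggest_partition_count(one_tb))  # 8192
```

A number like this could then be passed to `repartition(...)` or set through `spark.default.parallelism`, with executor memory raised separately via `--executor-memory`. Treat it as a first guess only: skewed keys can still make individual partitions much larger than the average.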


Yong


________________________________
From: Punit Naik <naik.puni...@gmail.com>
Sent: Monday, October 3, 2016 8:07 PM
To: user
Subject: Executor Lost error

Hi All

I am trying to run a program on a large dataset (~1 TB). I have already tested 
the code on a small amount of data and it works fine. But I noticed that the 
job fails if the input is large. It was giving me errors regarding 
Akka actor disassociation, which I fixed by increasing the timeouts.
But now I am getting errors like "executor lost" and "executor lost failure", 
which I can't seem to figure out. These are my current configs:

--conf "spark.network.timeout=30000"
--conf "spark.core.connection.ack.wait.timeout=30000"
--conf "spark.akka.timeout=30000"
--conf "spark.akka.askTimeout=30000"
--conf "spark.akka.frameSize=1000"
--conf "spark.storage.blockManagerSlaveTimeoutMs=600000"
--conf "spark.network.timeout=600"
--conf "spark.shuffle.memoryFraction=0.8"
--conf "spark.driver.maxResultSize=16g"
--conf "spark.driver.cores=10"
--conf "spark.driver.memory=10g"

Can anyone tell me any more configs to circumvent this "executor lost" and 
"executor lost failure" error?

--
Thank You

Regards

Punit Naik
