You should check your executor logs to identify the reason. My guess is that the executor died due to OOM.
If that is the reason, then you need to tune your executor memory setting or, more importantly, your partition count, to make sure each partition's data fits in the memory you have (a minimal sketch appears after the quoted message below).

Yong

________________________________
From: Punit Naik <naik.puni...@gmail.com>
Sent: Monday, October 3, 2016 8:07 PM
To: user
Subject: Executor Lost error

Hi All

I am trying to run a program on a large dataset (~1 TB). I have already tested the code on a small amount of data and it works fine. But what I noticed is that the job fails when the input is large. It was giving me errors about Akka actor disassociation, which I fixed by increasing the timeouts. But now I am getting errors like "executor lost" and "executor lost failure" which I can't seem to figure out. These are my current configs:

--conf "spark.network.timeout=30000"
--conf "spark.core.connection.ack.wait.timeout=30000"
--conf "spark.akka.timeout=30000"
--conf "spark.akka.askTimeout=30000"
--conf "spark.akka.frameSize=1000"
--conf "spark.storage.blockManagerSlaveTimeoutMs=600000"
--conf "spark.network.timeout=600"
--conf "spark.shuffle.memoryFraction=0.8"
--conf "spark.driver.maxResultSize=16g"
--conf "spark.driver.cores=10"
--conf "spark.driver.memory=10g"

Can anyone tell me any more configs to circumvent these "executor lost" and "executor lost failure" errors?

--
Thank You
Regards
Punit Naik
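
To illustrate the tuning suggested above, here is a minimal Scala sketch, assuming the Spark 1.x API (the spark.akka.* settings in the thread imply Spark 1.x). The app name, input path, executor memory value, and the target of 4000 partitions are illustrative assumptions, not values from the thread:

    // A minimal sketch of the suggested tuning: raise executor memory and
    // increase the partition count so each task handles a smaller slice.
    import org.apache.spark.{SparkConf, SparkContext}

    object PartitionTuningSketch {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("large-input-job")
          // Give each executor more heap; equivalent to passing
          // --conf "spark.executor.memory=8g" at submit time.
          .set("spark.executor.memory", "8g")

        val sc = new SparkContext(conf)

        // Spread the ~1 TB input across many more partitions so each task
        // holds only a few hundred MB; 4000 here is a placeholder target.
        val data = sc.textFile("hdfs:///path/to/input") // hypothetical path
          .repartition(4000)

        println(data.count()) // force evaluation for the sketch
        sc.stop()
      }
    }

Note that textFile also takes an optional minPartitions argument, which raises the split count at read time and avoids the extra shuffle that repartition triggers.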