Hi All

I am trying to run a program on a large dataset (~1 TB). I have already
tested the code on a smaller dataset and it works fine, but the job
fails when the input is large. It was initially giving me errors about
Akka actor disassociation, which I fixed by increasing the timeouts.
Now I am getting errors like "executor lost" and "ExecutorLostFailure",
which I can't seem to figure out. These are my current set of
configs:

--conf "spark.network.timeout=30000"
--conf "spark.core.connection.ack.wait.timeout=30000"
--conf "spark.akka.timeout=30000"
--conf "spark.akka.askTimeout=30000"
--conf "spark.akka.frameSize=1000"
--conf "spark.storage.blockManagerSlaveTimeoutMs=600000"
--conf "spark.network.timeout=600"
--conf "spark.shuffle.memoryFraction=0.8"
--conf "spark.driver.maxResultSize=16g"
--conf "spark.driver.cores=10"
--conf "spark.driver.memory=10g"

Can anyone suggest other configs to work around these "executor lost"
and "ExecutorLostFailure" errors?

-- 
Thank You

Regards

Punit Naik
