Hi all,

I am trying to run a program on a large dataset (~1 TB). I have already tested the code on a small dataset and it works fine, but the job fails when the input is large. At first it gave me errors about Akka actor disassociation, which I fixed by increasing the timeouts. Now I am getting "executor lost" and "ExecutorLostFailure" errors, which I can't figure out. This is my current set of configs:
--conf "spark.network.timeout=30000"
--conf "spark.core.connection.ack.wait.timeout=30000"
--conf "spark.akka.timeout=30000"
--conf "spark.akka.askTimeout=30000"
--conf "spark.akka.frameSize=1000"
--conf "spark.storage.blockManagerSlaveTimeoutMs=600000"
--conf "spark.network.timeout=600"
--conf "spark.shuffle.memoryFraction=0.8"
--conf "spark.driver.maxResultSize=16g"
--conf "spark.driver.cores=10"
--conf "spark.driver.memory=16g"

Can anyone suggest any more configs to get around these "executor lost" and "ExecutorLostFailure" errors?

Thank you.

Regards,
Punit Naik
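For reference, here is a minimal sketch of the full spark-submit invocation with these flags. The master URL, class name, and application jar are placeholders, not my actual ones. Also note that spark.network.timeout appears twice in my flags above (30000 and 600); when the same key is passed to --conf more than once, the last value takes effect.

```shell
# Sketch only: --master, --class, and the jar are placeholders.
# spark.network.timeout is listed once here because, with repeated
# --conf keys, spark-submit keeps the last value (600 in my case).
spark-submit \
  --master yarn \
  --class com.example.MyApp \
  --conf "spark.network.timeout=600" \
  --conf "spark.core.connection.ack.wait.timeout=30000" \
  --conf "spark.akka.timeout=30000" \
  --conf "spark.akka.askTimeout=30000" \
  --conf "spark.akka.frameSize=1000" \
  --conf "spark.storage.blockManagerSlaveTimeoutMs=600000" \
  --conf "spark.shuffle.memoryFraction=0.8" \
  --conf "spark.driver.maxResultSize=16g" \
  --conf "spark.driver.cores=10" \
  --conf "spark.driver.memory=10g" \
  my-app.jar
```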