Hi Everyone,

We are using Spark 3.0.1 with the Kubernetes resource manager. We are facing an
intermittent issue in which the driver pod gets deleted and the driver logs
contain a message saying that the SparkContext was shut down.

The same job works fine with the given set of configurations most of the time,
but it sometimes fails. The failure mostly occurs while reading or writing
Parquet files to HDFS (though we are not sure this is the only use case affected).
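
For context, the failing stage is a plain Parquet read/transform/write. Below is only a minimal sketch of the kind of job we run; the paths, column name, and transformation are placeholders, not our actual code:

    import org.apache.spark.sql.SparkSession

    // Minimal sketch of the kind of job that fails intermittently.
    // Paths and column names are placeholders.
    val spark = SparkSession.builder().appName("parquet-job").getOrCreate()

    val df = spark.read.parquet("hdfs:///data/input")   // read stage
    df.filter("value IS NOT NULL")                       // simple transformation
      .write
      .mode("overwrite")
      .parquet("hdfs:///data/output")                    // write stage where failures usually show up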

Any pointers to find the root cause?

Most of the earlier reported issues mention executor OOM as the cause, but we
have not seen an OOM error in any of the executors. Also, why would the
SparkContext be shut down in this case instead of the job retrying with new
executors?
Another question is why the driver pod gets deleted. Shouldn't it just error
out?
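
In case it helps with suggestions: one diagnostic we are considering is a SparkListener that logs the reason each executor is removed, so we can correlate removals with the context shutdown. This is only a sketch (the app name and log wording are ours, not from our actual job):

    import org.apache.spark.scheduler.{SparkListener, SparkListenerApplicationEnd, SparkListenerExecutorRemoved}
    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("parquet-job").getOrCreate()

    // Log executor-removal reasons and the application end time,
    // to correlate them with the SparkContext shutdown in the driver log.
    spark.sparkContext.addSparkListener(new SparkListener {
      override def onExecutorRemoved(e: SparkListenerExecutorRemoved): Unit =
        println(s"Executor ${e.executorId} removed at ${e.time}: ${e.reason}")

      override def onApplicationEnd(end: SparkListenerApplicationEnd): Unit =
        println(s"Application ended at ${end.time}")
    })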

Regards,
Shrikant

