Hi, Shrikant.

It seems that you are using non-GA features.

FYI, Kubernetes support became GA in the community as of Apache Spark
3.1.1.

    https://spark.apache.org/releases/spark-release-3-1-1.html

In addition, Apache Spark 3.1 reached EOL last month.

Could you try the latest distribution, such as Apache Spark 3.3.1, to see
whether you still experience the same issue?

That will narrow the scope of your issue by ruling out the many bugs
already found and fixed in 3.0/3.1/3.2/3.3.0.
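
As a sketch, upgrading for a quick re-test can look like the following; the
API server address, namespace, container image, main class, and application
jar below are placeholders you would replace with your own values:

```shell
# Download and unpack the 3.3.1 release.
curl -LO https://archive.apache.org/dist/spark/spark-3.3.1/spark-3.3.1-bin-hadoop3.tgz
tar -xzf spark-3.3.1-bin-hadoop3.tgz

# Re-submit the same job with the 3.3.1 distribution against Kubernetes.
./spark-3.3.1-bin-hadoop3/bin/spark-submit \
  --master k8s://https://<k8s-apiserver-host>:<port> \
  --deploy-mode cluster \
  --conf spark.kubernetes.namespace=<your-namespace> \
  --conf spark.kubernetes.container.image=apache/spark:v3.3.1 \
  --class <your.main.Class> \
  <path/to/your-app.jar>
```

If the failure disappears on 3.3.1, it was most likely one of the already
fixed bugs; if it persists, the report is actionable against a supported
release.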

Thanks,
Dongjoon.


On Wed, Oct 26, 2022 at 11:16 PM Shrikant Prasad <shrikant....@gmail.com>
wrote:

> Hi Everyone,
>
> We are using Spark 3.0.1 with the Kubernetes resource manager. We are
> facing an intermittent issue in which the driver pod gets deleted, and the
> driver logs show a message that the Spark Context was shut down.
>
> The same job works fine with the given set of configurations most of the
> time, but sometimes it fails. It mostly occurs while reading or writing
> parquet files to HDFS (but we are not sure if that is the only use case
> affected).
>
> Any pointers to find the root cause?
>
> Most of the earlier reported issues mention executors getting OOM killed
> as the cause, but we have not seen an OOM error in any of the executors.
> Also, why would the context be shut down in this case instead of retrying
> with new executors?
> Another doubt is why the driver pod gets deleted. Shouldn't it just error
> out?
>
> Regards,
> Shrikant
>
> --
> Regards,
> Shrikant Prasad
>
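
For the pod-deletion question above, Kubernetes itself usually records why
a pod went away, so checking cluster events and the pod's termination state
is a reasonable first step. A sketch, with the namespace and driver pod
name as placeholders:

```shell
# Cluster events often show evictions, OOM kills, or explicit deletions.
kubectl get events -n <your-namespace> --sort-by=.lastTimestamp

# Describe the driver pod (if it still exists) to see its last state and
# termination reason, e.g. OOMKilled or Evicted.
kubectl describe pod <driver-pod-name> -n <your-namespace>

# If the pod restarted, fetch the previous container's logs.
kubectl logs <driver-pod-name> -n <your-namespace> --previous
```

An event such as an eviction or an OOMKilled termination reason would
explain both the deletion and the abrupt context shutdown without any OOM
appearing in the executor logs.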
