Hi,

We are saving a dataframe as parquet (using
DirectParquetOutputCommitter) as follows.

// dfWriter is the DataFrameWriter obtained from df.write
dfWriter.format("parquet")
  .mode(SaveMode.Overwrite)
  .save(outputPath)
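
For context, this is roughly how the direct committer is wired in on our
side (a minimal sketch: the spark.sql.parquet.output.committer.class key is
the relevant setting in Spark 1.x, but the exact package of
DirectParquetOutputCommitter has moved between releases, so treat the class
path below as an assumption for your version):

// Point the Parquet output committer at the direct committer, which writes
// straight to the destination path instead of staging under _temporary.
sc.hadoopConfiguration.set(
  "spark.sql.parquet.output.committer.class",
  "org.apache.spark.sql.parquet.DirectParquetOutputCommitter")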

The problem is that if an executor fails even once while writing a file
(say, due to a transient HDFS issue), the re-spawned task fails again
because the file already exists, eventually failing the entire job.

Is this a known issue? Any workarounds?

Thanks
Vinoth
