Hi Vinoth,

Per the documentation, DirectParquetOutputCommitter is better suited for S3:
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/DirectParquetOutputCommitter.scala
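For reference, a minimal sketch of how one could enable it, assuming Spark 1.x (where this committer still exists; it was removed in later releases). The app name and paths are illustrative, not from your setup:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.{SQLContext, SaveMode}

    val conf = new SparkConf().setAppName("direct-committer-example") // illustrative name
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)

    // Point the Parquet data source at the direct committer, which writes
    // files straight to the destination instead of a _temporary directory,
    // avoiding the slow rename/copy step on S3.
    sqlContext.setConf(
      "spark.sql.parquet.output.committer.class",
      "org.apache.spark.sql.execution.datasources.parquet.DirectParquetOutputCommitter")

    // Speculation should stay off (the default): with no temp-dir isolation,
    // two speculative attempts would write to the same final files.

    val df = sqlContext.read.json("hdfs:///input/events.json") // illustrative path
    df.write
      .format("parquet")
      .mode(SaveMode.Overwrite)
      .save("s3a://my-bucket/output/") // illustrative path

Note the trade-off: because there is no temporary directory, a failed or retried task can leave partial files behind, which is related to the retry behavior you describe.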
Regards,
Surendra M

--
Surendra Manchikanti

On Fri, Mar 25, 2016 at 4:03 AM, Vinoth Chandar <vin...@uber.com> wrote:

> Hi,
>
> We are saving a dataframe to parquet (using DirectParquetOutputCommitter)
> as follows:
>
> dfWriter.format("parquet")
>   .mode(SaveMode.Overwrite)
>   .save(outputPath)
>
> The problem is that even if an executor fails only once while writing a
> file (say, due to a transient HDFS issue), when it is re-spawned it fails
> again because the file already exists, eventually failing the entire job.
>
> Is this a known issue? Any workarounds?
>
> Thanks,
> Vinoth