Hi Vinoth,

As per the documentation, DirectParquetOutputCommitter is better suited for S3.

https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/DirectParquetOutputCommitter.scala
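
If it helps, this is roughly how the committer gets wired in (a minimal
sketch, assuming a Spark 1.x build where this class still ships):

import org.apache.spark.SparkConf

val conf = new SparkConf()
  // Route Parquet output through the direct committer, which writes
  // straight to the destination instead of renaming files out of a
  // _temporary directory (renames are slow and non-atomic on S3).
  .set("spark.sql.parquet.output.committer.class",
    "org.apache.spark.sql.execution.datasources.parquet." +
    "DirectParquetOutputCommitter")
  // Speculation should stay off with direct committers, otherwise
  // duplicate task attempts can clobber each other's output files.
  .set("spark.speculation", "false")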

Regards,
Surendra M

-- Surendra Manchikanti

On Fri, Mar 25, 2016 at 4:03 AM, Vinoth Chandar <vin...@uber.com> wrote:

> Hi,
>
> We are saving a dataframe as parquet (using
> DirectParquetOutputCommitter) as follows:
>
> dfWriter.format("parquet")
>   .mode(SaveMode.Overwrite)
>   .save(outputPath)
>
> The problem is that even if an executor fails just once while writing a file
> (say, due to some transient HDFS issue), when it is re-spawned it fails again
> because the file already exists, eventually failing the entire job.
>
> Is this a known issue? Any workarounds?
>
> Thanks
> Vinoth
>
