Re: DataFrameWriter.save fails job with one executor failure

2016-03-27 Thread Vinoth Chandar
Thanks guys. But the issue seems orthogonal to which output committer is used, no? When writing out a dataframe as parquet, does the job recover if one task crashes mid-way, leaving a half-written file? What we observe is that when the task is retried, it tries to open a "new" file of the same name and fails.
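[To see the failure mode in isolation, here is a minimal, self-contained sketch using the plain Hadoop FileSystem API against the local filesystem; the object name and paths are made up. It mimics what a writer that goes straight to the final location runs into on retry:]

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}

    object DirectWriteCollision {
      def main(args: Array[String]): Unit = {
        val fs = FileSystem.getLocal(new Configuration())
        val finalPath = new Path("/tmp/out/part-00001.parquet") // hypothetical

        // Attempt 1 writes directly to the final path, then "crashes"
        // before the stream is closed, leaving a half-written file behind.
        val out1 = fs.create(finalPath, /* overwrite = */ false)
        out1.write(Array[Byte](1, 2, 3))

        // Attempt 2 (the retry) tries to create a "new" file of the same
        // name; create(..., overwrite = false) refuses to clobber it and
        // throws, which fails the task again.
        val out2 = fs.create(finalPath, false)
        out2.close()
      }
    }

[The default FileOutputCommitter sidesteps this by writing each task attempt under its own _temporary attempt directory and only renaming the file into the final location on task commit, so a retried attempt never sees the first attempt's partial file.]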

Re: DataFrameWriter.save fails job with one executor failure

2016-03-25 Thread Surendra , Manchikanti
Hi Vinoth, As per the documentation, DirectParquetOutputCommitter is better suited to S3. https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/DirectParquetOutputCommitter.scala Regards, Surendra M -- Surendra Manchikanti
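[For reference, a minimal sketch of wiring that committer in. The config key is the Spark 1.x-era spark.sql.parquet.output.committer.class, the class name matches the file linked above, and the bucket/paths are placeholders:]

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    val sc = new SparkContext(new SparkConf().setAppName("direct-committer-example"))
    val sqlContext = new SQLContext(sc)

    // Route Parquet writes through the direct committer: output goes
    // straight to the final location, skipping the _temporary staging
    // directory and the (slow, non-atomic on S3) rename at commit time.
    sqlContext.setConf(
      "spark.sql.parquet.output.committer.class",
      "org.apache.spark.sql.execution.datasources.parquet.DirectParquetOutputCommitter")

    val df = sqlContext.read.json("s3a://my-bucket/input") // placeholder input
    df.write.parquet("s3a://my-bucket/output")             // placeholder output

[The trade-off is exactly the one discussed upthread: skipping the staging/rename step means a failed task leaves its partial file at the final path, where the retry then collides with it.]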

Re: DataFrameWriter.save fails job with one executor failure

2016-03-25 Thread Michael Armbrust
I would not recommend using the direct output committer with HDFS. It's intended only as an optimization for S3.

On Fri, Mar 25, 2016 at 4:03 AM, Vinoth Chandar wrote:
> Hi,
>
> We are saving a dataframe in parquet (using DirectParquetOutputCommitter)
> as follows.
>
> dfWri
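[To spell that out, a minimal sketch of pinning the default Parquet committer back for HDFS output; the default class name here is the Spark 1.6-era one and may differ by version, and the path is a placeholder:]

    // sqlContext and df as in the earlier sketch.
    // Keep the rename-based two-phase commit for HDFS output: renames are
    // cheap and atomic on HDFS, and a retried task attempt can never
    // collide with a half-written file at the final path.
    sqlContext.setConf(
      "spark.sql.parquet.output.committer.class",
      "org.apache.parquet.hadoop.ParquetOutputCommitter")

    df.write.mode("overwrite").parquet("hdfs:///warehouse/events") // placeholder path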