Hi Kshitij,

There are options to suppress the creation of these metadata files.
Set the properties below and try:

1) To disable Spark's transaction logs, set
"spark.sql.sources.commitProtocolClass =
org.apache.spark.sql.execution.datasources.SQLHadoopMapReduceCommitProtocol".
This disables the "_committed_<TID>" and "_started_<TID>" files, but the
_SUCCESS, _common_metadata and _metadata files will still be generated.

2) We can disable the _common_metadata and _metadata files with
"parquet.enable.summary-metadata=false".

3) We can also disable the _SUCCESS file with
"mapreduce.fileoutputcommitter.marksuccessfuljobs=false".
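All three properties above can be passed together at submit time. A minimal sketch (the job name "your_job.py" is a placeholder; note that the Parquet and MapReduce settings are Hadoop properties, so when set via --conf they usually need the "spark.hadoop." prefix to reach the Hadoop configuration):

```shell
spark-submit \
  --conf "spark.sql.sources.commitProtocolClass=org.apache.spark.sql.execution.datasources.SQLHadoopMapReduceCommitProtocol" \
  --conf "spark.hadoop.parquet.enable.summary-metadata=false" \
  --conf "spark.hadoop.mapreduce.fileoutputcommitter.marksuccessfuljobs=false" \
  your_job.py
```

The same keys can be set programmatically via SparkSession.builder.config(...) before the session is created.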

On Sat, 22 Feb, 2020, 10:51 AM Kshitij, <kshtjkm...@gmail.com> wrote:

> Hi,
>
> There is no Spark DataFrame API which writes/creates a single file instead
> of a directory as the result of a write operation.
>
> Both options below will create a directory containing a file with a random name.
>
> df.coalesce(1).write.csv(<path>)
>
>
>
> df.write.csv(<path>)
>
>
> Instead of creating a directory with standard files (_SUCCESS, _committed,
> _started), I want a single file with the file name I specify.
>
>
> Thanks
>
