Re: Does dataframe spark API write/create a single file instead of directory as a result of write operation.

2020-02-22 Thread rahul c
Hi Kshitij, There are options to suppress the creation of the metadata files. Set the properties below and try. 1) To disable Spark's transaction logs: "spark.sql.sources.commitProtocolClass = org.apache.spark.sql.execution.datasources.SQLHadoopMapReduceCommitProtocol". This will help to disa

Re: Does dataframe spark API write/create a single file instead of directory as a result of write operation.

2020-02-22 Thread Kshitij
Is there any way to save it as a raw_csv file, as we do in pandas? I have a script that uses the CSV file for further processing. On Sat, 22 Feb 2020 at 14:31, rahul c wrote: > Hi Kshitij, > > There are options to suppress the creation of the metadata files. > Set the properties below and try. > >

Re: Does dataframe spark API write/create a single file instead of directory as a result of write operation.

2020-02-22 Thread Kshitij
That's the alternative, of course. But that is costly when we are dealing with a bunch of files. Thanks. On Sat, Feb 22, 2020, 4:15 PM Sebastian Piu wrote: > I'm not aware of a way to specify the file name on the writer. > Since you'd need to bring all the data into a single node and write from > t

Re: Does dataframe spark API write/create a single file instead of directory as a result of write operation.

2020-02-22 Thread rahul c
Hi, df.write.csv() will ideally give you a CSV file that can be used in further processing. I am not very familiar with the raw_csv function of pandas. On Sat, 22 Feb 2020, 4:09 PM Kshitij wrote: > Is there any way to save it as a raw_csv file, as we do in pandas? I have a > script that uses the CS

Re: Does dataframe spark API write/create a single file instead of directory as a result of write operation.

2020-02-22 Thread Kshitij
I am talking about Spark here. On Sat, Feb 22, 2020, 4:19 PM rahul c wrote: > Hi, > > df.write.csv() > will ideally give you a CSV file that can be used in further processing. > I am not very familiar with the raw_csv function of pandas. > > On Sat, 22 Feb 2020, 4:09 PM Kshitij wrote: > >> Is th

Re: Does dataframe spark API write/create a single file instead of directory as a result of write operation.

2020-02-22 Thread JARDIN Yohann
How costly is it for you to move files after generating them with Spark? File systems tend to just update some links under the hood. *Yohann Jardin* On 2/22/2020 at 11:47 AM, Kshitij wrote: That's the alternative, of course. But that is costly when we are dealing with a bunch of files. Thanks.
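
The move-after-write approach discussed in this thread could look roughly like the sketch below. It is not code from any of the posters. It assumes the DataFrame was first written with something like df.coalesce(1).write.csv(output_dir), so the output directory holds exactly one uncompressed part-*.csv file, and that the directory is on a local file system (HDFS or an object store would need that store's own API instead). The function name promote_single_part_file and its parameters are hypothetical.

```python
import glob
import os
import shutil


def promote_single_part_file(spark_output_dir: str, target_path: str) -> str:
    """Move the lone part-*.csv file out of a Spark output directory.

    Assumption: the directory was produced by a single-partition write,
    so it contains exactly one part file plus marker files such as _SUCCESS.
    """
    # Hidden checksum files start with "." and are not matched by this pattern.
    part_files = sorted(glob.glob(os.path.join(spark_output_dir, "part-*.csv")))
    if len(part_files) != 1:
        raise ValueError(
            f"expected exactly one part file in {spark_output_dir}, "
            f"found {len(part_files)}"
        )

    # On the same file system this is a rename, not a copy of the data.
    shutil.move(part_files[0], target_path)

    # Drop what is left of the directory (_SUCCESS and checksum files).
    shutil.rmtree(spark_output_dir)
    return target_path
```

Calling promote_single_part_file("out_dir", "result.csv") would leave a single result.csv that a downstream script can read like any pandas-written CSV. The rename itself is cheap, as Yohann suggests; the cost Sebastian points to comes from collapsing the data to one partition before the write.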