Hi Kshitij,
There are options to suppress the metadata files from being created.
Set the properties below and try again.
1) To disable Spark's transaction logs:
"spark.sql.sources.commitProtocolClass =
org.apache.spark.sql.execution.datasources.SQLHadoopMapReduceCommitProtocol".
This will help to disa
Is there any way to save it as a raw_csv file, as we do in pandas? I have a
script that uses the CSV file for further processing.
On Sat, 22 Feb 2020 at 14:31, rahul c wrote:
> Hi Kshitij,
>
> There are options to suppress the metadata files from being created.
> Set the properties below and try again.
>
>
That's the alternative, of course. But it is costly when we are dealing
with a bunch of files.
Thanks.
On Sat, Feb 22, 2020, 4:15 PM Sebastian Piu wrote:
> I'm not aware of a way to specify the file name on the writer.
> Since you'd need to bring all the data into a single node and write from
> t
Hi,
df.write.csv()
This should give you a CSV file that can be used for further processing.
I am not very familiar with the raw_csv function in pandas.
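
A minimal sketch of the call above, assuming a PySpark DataFrame named df and a hypothetical output path. Note that Spark writes a directory of part files rather than one named file; coalesce(1) is used here only so that the directory holds a single part file, which pulls all the data into one task.

```python
def write_csv_single_part(df, out_dir):
    """Write df as CSV into the directory out_dir, using one partition.

    df is assumed to be a PySpark DataFrame; out_dir is a hypothetical
    path. Spark creates out_dir as a directory containing part-* files.
    """
    (
        df.coalesce(1)             # one partition, so one part file
        .write.mode("overwrite")   # replace out_dir if it already exists
        .option("header", True)    # emit a header row, as pandas does by default
        .csv(out_dir)
    )
```

For large data sets the coalesce(1) step can be slow, since a single task writes everything.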
On Sat, 22 Feb, 2020, 4:09 PM Kshitij, wrote:
> Is there any way to save it as a raw_csv file, as we do in pandas? I have a
> script that uses the CS
I am talking about Spark here.
On Sat, Feb 22, 2020, 4:19 PM rahul c wrote:
> Hi,
>
> df.write.csv()
> This should give you a CSV file that can be used for further processing.
> I am not very familiar with the raw_csv function in pandas.
>
> On Sat, 22 Feb, 2020, 4:09 PM Kshitij, wrote:
>
>> Is th
How costly is it for you to move the files after generating them with Spark?
File systems tend to just update some links under the hood.
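
As a rough sketch of that move step, assuming the Spark output directory is on a local (or locally mounted) file system and contains exactly one part file: the helper name promote_part_file and both paths are hypothetical. On HDFS or an object store you would need that store's own rename call instead.

```python
import glob
import os
import shutil


def promote_part_file(spark_out_dir, target_path):
    """Move the single part-*.csv file from spark_out_dir to target_path.

    Raises ValueError unless exactly one part file is found, so that a
    multi-partition output is never silently truncated to one file.
    """
    parts = glob.glob(os.path.join(spark_out_dir, "part-*.csv"))
    if len(parts) != 1:
        raise ValueError(
            f"expected exactly one part file in {spark_out_dir}, found {len(parts)}"
        )
    # On the same file system this is a rename, not a copy of the data.
    shutil.move(parts[0], target_path)
    return target_path
```

The downstream script can then read target_path like any other CSV file.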
*Yohann Jardin*
On 2/22/2020 at 11:47 AM, Kshitij wrote:
That's the alternative, of course. But it is costly when we are
dealing with a bunch of files.
Thanks.