Hi,
It depends on your case. A shuffle is an expensive operation, so avoid it 
unless you want to reduce the number of output files. With fewer files the 
write is also less parallel, so it may take a lot of time to write the data.
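
A minimal sketch of that trade-off, assuming Spark 1.6 and a DataFrame you 
already have. The names WriteSketch, writeAsIs, writeFewerFiles and 
writeRebalanced, and the path and numFiles parameters, are made up for 
illustration. As far as I know each partition is written by its own task, so 
the partition count drives both the write parallelism and the file count:

```scala
import org.apache.spark.sql.{DataFrame, SaveMode}

object WriteSketch {
  // No repartitioning: every existing partition is written by its own task,
  // so the write is as parallel as the DataFrame already is.
  def writeAsIs(df: DataFrame, path: String): Unit =
    df.write.mode(SaveMode.Append).parquet(path)

  // coalesce(n) merges partitions without a full shuffle. Fewer files come
  // out, but only n tasks do the writing.
  def writeFewerFiles(df: DataFrame, path: String, numFiles: Int): Unit =
    df.coalesce(numFiles).write.mode(SaveMode.Append).parquet(path)

  // repartition(n) does a full shuffle to spread rows evenly over n
  // partitions. This is the expensive case.
  def writeRebalanced(df: DataFrame, path: String, numFiles: Int): Unit =
    df.repartition(numFiles).write.mode(SaveMode.Append).parquet(path)
}
```

If the file count is not a problem for you, the first variant is probably the 
cheapest one to try.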

Regards,
Chanh



> On Oct 7, 2016, at 1:25 AM, Anubhav Agarwal <anubha...@gmail.com> wrote:
> 
> Hi,
> I already had the following set:-
> sc.hadoopConfiguration.set("parquet.enable.summary-metadata", "false")
> 
> Will add the other setting too.
> 
> But my question is: am I correct in assuming that Append mode shuffles all 
> data to one node before writing?
> And do the other modes do the same, or do all executors write to the folder 
> in parallel?
> 
> Thank You,
> Anu
> 
> On Thu, Oct 6, 2016 at 11:36 AM, Chanh Le <giaosu...@gmail.com> wrote:
> Hi Anubhav,
> The best way to store Parquet is to partition it by time, or by a specific 
> field that you will use to mark data for deletion after some time.
> In my case I partition my data by time, so I can easily delete the data 
> after 30 days.
> Use Append mode and disable the summary information:
> 
> sc.hadoopConfiguration.set("parquet.enable.summary-metadata", "false")
> sc.hadoopConfiguration.set("mapreduce.fileoutputcommitter.marksuccessfuljobs", "false")
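> 
> A rough sketch of the partitioned layout described above, assuming the 
> DataFrame has a column named "day" to partition on. The column name, the 
> PartitionedAppend object and the appendByDay helper are hypothetical:
> 
> ```scala
> import org.apache.spark.sql.{DataFrame, SaveMode}
> 
> object PartitionedAppend {
>   // Appends df under basePath with one sub-directory per distinct value
>   // of the partition column, e.g. basePath/day=2016-10-06/
>   def appendByDay(df: DataFrame, basePath: String): Unit =
>     df.write
>       .partitionBy("day") // "day" is an assumed column name
>       .mode(SaveMode.Append)
>       .parquet(basePath)
> }
> ```
> 
> Deleting old data is then a matter of removing the expired day=... 
> directories.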
> 
> Regards,
> Chanh
> 
> 
>> On Oct 6, 2016, at 10:32 PM, Anubhav Agarwal <anubha...@gmail.com> wrote:
>> 
>> Hi all,
>> I have searched a bit before posting this query.
>> 
>> Using Spark 1.6.1
>> Dataframe.write().format("parquet").mode(SaveMode.Append).save("location")
>> 
>> Note:- The data in that folder can be deleted, and most of the time that 
>> folder doesn't even exist.
>> 
>> Which SaveMode is best, if one is necessary at all?
>> 
>> I am using SaveMode.Append, which seems to cause huge amounts of shuffle as 
>> only one executor is doing the actual write. (May be wrong)
>> 
>> Would using Overwrite cause all the executors to write to that folder at 
>> once, or would this also send data to one single executor before writing?
>> 
>> Or should I not use any of the modes at all and just do a write?
>> 
>> 
>> Thank You,
>> Anu
> 
> 
