Hi,
It depends on your case, but a shuffle is an expensive operation, so it is only worth forcing one if you want to reduce the number of output files. And because fewer partitions means fewer parallel writers, it can cost you a lot of time to write the data.
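
For example, if you did want fewer files, something like this (an untested sketch; the DataFrame df, the partition count, and the path are just placeholders) trades that shuffle/parallelism cost for a smaller file count:

import org.apache.spark.sql.SaveMode

// coalesce(n) narrows to n partitions without a full shuffle; only n tasks
// end up writing, so the write itself is less parallel
df.coalesce(8)
  .write
  .format("parquet")
  .mode(SaveMode.Append)
  .save("/tmp/output")   // hypothetical path

// repartition(n) would force a full shuffle first, which balances the files
// but pays the shuffle cost up front:
// df.repartition(8).write.format("parquet").mode(SaveMode.Append).save("/tmp/output")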
Regards,
Chanh

> On Oct 7, 2016, at 1:25 AM, Anubhav Agarwal <anubha...@gmail.com> wrote:
>
> Hi,
> I already had the following set:
> sc.hadoopConfiguration.set("parquet.enable.summary-metadata", "false")
>
> Will add the other setting too.
>
> But my question is: am I correct in assuming Append mode shuffles all data to
> one node before writing?
> And do the other modes do the same, or do all executors write to the folder in
> parallel?
>
> Thank You,
> Anu
>
> On Thu, Oct 6, 2016 at 11:36 AM, Chanh Le <giaosu...@gmail.com
> <mailto:giaosu...@gmail.com>> wrote:
> Hi Anubhav,
> The best way to store parquet is to partition it by time, or by whatever field
> you are going to use to mark data for deletion later.
> In my case I partition my data by time so I can easily delete the data after
> 30 days.
> Use it with mode Append and disable the summary information:
>
> sc.hadoopConfiguration.set("parquet.enable.summary-metadata", "false")
> sc.hadoopConfiguration.set("mapreduce.fileoutputcommitter.marksuccessfuljobs",
> "false")
>
> Regards,
> Chanh
>
>
>> On Oct 6, 2016, at 10:32 PM, Anubhav Agarwal <anubha...@gmail.com
>> <mailto:anubha...@gmail.com>> wrote:
>>
>> Hi all,
>> I have searched a bit before posting this query.
>>
>> Using Spark 1.6.1
>> Dataframe.write().format("parquet").mode(SaveMode.Append).save("location")
>>
>> Note: the data in that folder can be deleted, and most of the time that
>> folder doesn't even exist.
>>
>> Which SaveMode is the best, if one is necessary at all?
>>
>> I am using SaveMode.Append, which seems to cause huge amounts of shuffle, as
>> only one executor is doing the actual write. (May be wrong.)
>>
>> Would using Overwrite cause all the executors to write to that folder at once,
>> or would this also send data to one single executor before writing?
>>
>> Or should I not use any of the modes at all and just do a plain write?
>>
>>
>> Thank You,
>> Anu
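
A rough illustration of the setup Chanh describes above (partition by a date column, disable the summary metadata, append) might look like this on Spark 1.6. This is an untested sketch; the DataFrame df, the event_date column, and the output path are placeholders, not anything from the thread:

import org.apache.spark.sql.SaveMode

// skip the _metadata/_common_metadata summary files and the _SUCCESS marker
sc.hadoopConfiguration.set("parquet.enable.summary-metadata", "false")
sc.hadoopConfiguration.set("mapreduce.fileoutputcommitter.marksuccessfuljobs", "false")

// partition by a date column so old data can be dropped later just by
// deleting the corresponding event_date=... directories
df.write
  .format("parquet")
  .partitionBy("event_date")   // hypothetical column name
  .mode(SaveMode.Append)
  .save("/data/events")        // hypothetical path

Each append then only adds new part files under the partition directories it touches, which is what makes the delete-after-30-days cleanup Chanh mentions a simple directory removal.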