Re: [pyspark 2.3+] how to dynamically determine DataFrame partitions while writing

2019-05-22 Thread Rishi Shah
Hi All,

Any idea about this?

Thanks,
Rishi

On Tue, May 21, 2019 at 11:29 PM Rishi Shah wrote:
> Hi All,
>
> What is the best way to determine partitions of a dataframe dynamically
> before writing to disk?
>
> 1) statically determine based on data and use coalesce or repartition
> while writing …

[pyspark 2.3+] how to dynamically determine DataFrame partitions while writing

2019-05-21 Thread Rishi Shah
Hi All,

What is the best way to determine partitions of a dataframe dynamically before writing to disk?

1) statically determine based on data and use coalesce or repartition while writing
2) somehow determine count of records for entire dataframe and divide that number to determine partition - ho…
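[A minimal PySpark sketch of option 2, assuming partitions are sized from a target rows-per-output-file value; the input/output paths and the rows_per_partition figure are hypothetical placeholders, and in practice the target would be tuned to hit a desired file size rather than a row count.]

```python
import math
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("dynamic-partition-count").getOrCreate()

# Hypothetical input; any DataFrame works here.
df = spark.read.parquet("/path/to/input")

# Assumption: roughly this many rows per output file gives an acceptable file size.
rows_per_partition = 1000000

# Option 2: count the records (one extra pass over the data), then derive
# the partition count before writing.
total_rows = df.count()
num_partitions = max(1, math.ceil(total_rows / rows_per_partition))

(df.repartition(num_partitions)   # or coalesce(num_partitions) to avoid a full shuffle when only reducing
   .write
   .mode("overwrite")
   .parquet("/path/to/output"))   # hypothetical output path
```

The cost of this approach is the extra count() pass; a cheaper (rougher) alternative is to estimate from the input instead, e.g. from df.rdd.getNumPartitions() or the source file sizes.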