Re: [pyspark 2.3] count followed by write on dataframe

2019-05-20 Thread Keith Chapman
Yes, that is correct: that would cause the computation to run twice. If you want the computation to happen only once, you can cache the dataframe and call count and write on the cached dataframe. Regards, Keith. http://keith-chapman.com On Mon, May 20, 2019 at 6:43 PM Rishi Shah wrote: > Hi All, > > Just
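
The suggestion above could be sketched roughly as follows. This is only an illustration, not code from the thread: the helper names count_then_write and demo, the Parquet output format, the overwrite mode, and the /tmp path are all assumptions made for the example. It relies on cache() being lazy, so the first action (count) is what actually computes and stores the data, and the write then reads from the cached copy rather than recomputing the lineage.

```python
def count_then_write(df, output_path):
    """Count rows and write `df` out, computing its lineage only once.

    `df` is expected to be a pyspark.sql.DataFrame. `output_path` and the
    Parquet/overwrite choices are assumptions made for this sketch.
    """
    cached = df.cache()  # lazy: only marks the dataframe for caching
    try:
        n_rows = cached.count()  # first action: computes and fills the cache
        # second action: served from the cached data where it is available
        cached.write.mode("overwrite").parquet(output_path)
    finally:
        cached.unpersist()  # release the cached blocks once both actions finish
    return n_rows


def demo():
    """Hypothetical driver; needs a working PySpark installation."""
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("count-then-write").getOrCreate()
    df = spark.range(1000).withColumnRenamed("id", "value")
    print(count_then_write(df, "/tmp/count_then_write_demo"))
    spark.stop()
```

Calling demo() runs the sketch against a small generated dataframe. Caching is best effort, so partitions that get evicted would still be recomputed; persist() with an explicit storage level is an alternative when the data may not fit in memory.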

[pyspark 2.3] count followed by write on dataframe

2019-05-20 Thread Rishi Shah
Hi All, Just wanted to confirm my understanding of actions on a dataframe. If the dataframe is not persisted at any point, and count() is called on it followed by a write action, this would trigger the dataframe computation twice (which could be a performance hit for a larger dataframe). Coul