Re: Performance of loading parquet files into case classes in Spark

2016-08-29 Thread Julien Dumazert
> df.as[A].map(_.fieldToSum).agg(sum("value")).collect()
> to try using DataFrame aggregation on the Dataset instead of reduce.
>
> Regards,
> Maciek
>
> 2016-08-28 21:27 GMT+02:00 Julien Dumazert <julien.dumaz...@gmail.com>:
> Hi Maciek,
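The suggestion above is to keep the typed `map` but hand the final aggregation to Catalyst instead of a Scala-side `reduce`. A minimal sketch of both paths, assuming a hypothetical case class `A` with a `Long` field `fieldToSum` and a hypothetical Parquet path (`"value"` is the default column name Spark gives a `Dataset[Long]` produced by `map`):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.sum

// Hypothetical stand-in for the `A` discussed in the thread.
case class A(fieldToSum: Long)

object AggSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("agg-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical input path.
    val ds = spark.read.parquet("data.parquet").as[A]

    // Typed path: every row is decoded into an A, then reduced on the driver side.
    val typed = ds.map(_.fieldToSum).reduce(_ + _)

    // Suggested path: map to Dataset[Long] (column "value"), then let
    // Catalyst run the sum as an untyped aggregation.
    val untyped = ds.map(_.fieldToSum).agg(sum("value")).first().getLong(0)

    println(s"typed=$typed untyped=$untyped")
    spark.stop()
  }
}
```

The difference being benchmarked is that `reduce` pulls each element through the Dataset encoder, while `agg(sum(...))` stays inside Spark SQL's optimized execution.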

Re: Performance of loading parquet files into case classes in Spark

2016-08-28 Thread Julien Dumazert
same encoding/decoding mechanism as for any other case class.

Best regards,
Julien

> On 27 Aug 2016 at 22:32, Maciej Bryński wrote:
>
> 2016-08-27 15:27 GMT+02:00 Julien Dumazert <julien.dumaz...@gmail.com>:
> df.map(row => row.getAs[Long]("field
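The `row.getAs[Long](...)` fragment quoted above is the untyped alternative: read the column straight out of each `Row` and skip the case-class encoder entirely. A sketch under the same assumptions (hypothetical field name `fieldToSum`, hypothetical Parquet path):

```scala
import org.apache.spark.sql.SparkSession

object RowAccessSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("row-access-sketch")
      .master("local[*]")
      .getOrCreate()
    // Brings in the implicit Encoder[Long] that df.map needs.
    import spark.implicits._

    // Hypothetical input path.
    val df = spark.read.parquet("data.parquet")

    // Untyped access: pull the Long out of each Row by column name,
    // with no case-class materialization per row.
    val total = df.map(row => row.getAs[Long]("fieldToSum")).reduce(_ + _)

    println(s"total=$total")
    spark.stop()
  }
}
```

Comparing this against the `df.as[A].map(_.fieldToSum)` version isolates the cost of the Dataset encoding/decoding step that the thread is discussing.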

Performance of loading parquet files into case classes in Spark

2016-08-27 Thread Julien Dumazert
Hi all,

I'm forwarding a question I recently asked on Stack Overflow about benchmarking Spark performance when working with case classes stored in Parquet files. I am assessing the pe