Hi, sorry — indeed you have to cache the dataset before the groupBy (otherwise it will be re-read from disk every time).
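A minimal sketch of what I mean (paths, the column name, and the `Record` case class are just illustrative placeholders):

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical case class for a typed Dataset; adapt the fields
// to the actual columns of your file.
case class Record(word: String)

object CacheBeforeGroupBy {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("cache-example")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Spark reads gzip files transparently, but note that a .gz file
    // is not splittable, so it arrives as a single partition.
    val ds = spark.read.textFile("/path/to/data.gz").map(Record(_))

    // Cache before the groupBy: aggregations over different columns
    // then reuse the in-memory data instead of re-reading the file.
    ds.cache()

    val freq = ds.groupBy($"word").count()
    freq.show()

    spark.stop()
  }
}
```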
For the case class you can have a look at the accepted answer here: https://stackoverflow.com/questions/45017556/how-to-convert-a-simple-dataframe-to-a-dataset-spark-scala-with-case-class

Best regards,
Alessandro

On Fri, 28 Sep 2018 at 09:29, rishmanisation <rish.anant...@gmail.com> wrote:

> Thanks for the response! I'm not sure caching 'freq' would make sense,
> since there are multiple columns in the file and so it would need to be
> different for different columns.
>
> The original data format is .gz (gzip).
>
> I am a newbie to Spark, so could you please give a little more detail on
> the appropriate case class?
>
> Thanks!