Hi,
sorry, indeed you have to cache the dataset before the groupBy (otherwise
it will be reloaded from disk each time).
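
A minimal sketch of what I mean (the file path, the header option, and the
per-column loop are assumptions about your setup, not your actual code --
Spark's csv reader decompresses .gz files transparently):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("freq-example").getOrCreate()

// Read the gzip'd CSV once; adjust the path and options to your data.
val df = spark.read.option("header", "true").csv("path/to/data.csv.gz")

// Cache the *input* dataset, not the per-column result: it is materialized
// on the first action, and every later groupBy reads it from memory.
df.cache()

// One frequency count per column, all reusing the same cached data.
val freqs = df.columns.map { c =>
  c -> df.groupBy(c).count()
}
```

This way the expensive step (reading and decompressing the file) happens
once, even though you compute a separate frequency table for each column.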

For the case class you can have a look at the accepted answer here:
https://stackoverflow.com/questions/45017556/how-to-convert-a-simple-dataframe-to-a-dataset-spark-scala-with-case-class
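
In short, the idea from that answer is: define a case class matching your
schema, cast the columns to the right types, and call `.as[...]`. Here
`Record` and its fields are placeholders -- use your file's actual columns:

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical schema: rename the fields/types to match your columns.
case class Record(name: String, value: Long)

val spark = SparkSession.builder().appName("case-class-example").getOrCreate()
import spark.implicits._  // brings in the encoders needed by .as[Record]

val df = spark.read.option("header", "true").csv("path/to/data.csv.gz")

// CSV columns are read as strings, so cast before converting to a
// typed Dataset[Record].
val ds = df.select($"name", $"value".cast("long")).as[Record]
```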


Best regards,
Alessandro

On Fri, 28 Sep 2018 at 09:29, rishmanisation <rish.anant...@gmail.com>
wrote:

> Thanks for the response! I'm not sure caching 'freq' would make sense,
> since there are multiple columns in the file and it would need to be
> different for each column.
>
> Original data format is .gz (gzip).
>
> I am a newbie to Spark, so could you please give a little more details on
> the appropriate case class?
>
> Thanks!
>
>
>