Thanks for the response! I'm not sure caching 'freq' would make sense: the
file has multiple columns, so 'freq' would need to be different for each
column.

The original data is in .gz (gzip) format.

I am a newbie to Spark, so could you please give a little more detail on
the appropriate case class?

Thanks!


