Thanks for the response! I'm not sure caching 'freq' would make sense: the file has multiple columns, so 'freq' would need to be computed separately for each column.
The original data format is .gz (gzip). I am a newbie to Spark, so could you please give a little more detail on the appropriate case class? Thanks!