>
> From Zhan Zhang's reply, yes, I still get Parquet's advantages.
>

You will need to use at least SQL or the DataFrame API (coming in Spark
1.3) to specify the columns that you want in order to get the Parquet
benefits. The rest of your operations can be standard Spark.
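For example, here is a minimal sketch of what that looks like (the file
path, table name, and column names are hypothetical). Because the column
selection goes through Spark SQL, only the referenced columns are read
from the Parquet file:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    val sc = new SparkContext(new SparkConf().setAppName("parquet-pruning"))
    val sqlContext = new SQLContext(sc)

    // Load the Parquet file as a SchemaRDD and register it as a table.
    val people = sqlContext.parquetFile("hdfs:///data/people.parquet")
    people.registerTempTable("people")

    // Only the "name" and "age" columns are read from disk; the other
    // columns in the file are skipped (Parquet column pruning).
    val namesAndAges = sqlContext.sql("SELECT name, age FROM people")

    // From here on, everything is a normal RDD of Rows.
    namesAndAges.map(row => row.getString(0)).collect()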

> My next question is: if I operate on a SchemaRDD, will I get the advantage of
> Spark SQL's in-memory columnar store when caching the table using
> cacheTable()?
>

Yes. As of Spark 1.2, SchemaRDDs always use the in-memory columnar cache
for both cacheTable() and .cache().
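Continuing the sketch above (same hypothetical "people" table), both
forms below end up in the columnar cache:

    // Caches the "people" table in Spark SQL's in-memory columnar format.
    sqlContext.cacheTable("people")

    // Calling .cache() on a SchemaRDD also uses the columnar format in 1.2+.
    val everyone = sqlContext.sql("SELECT * FROM people")
    everyone.cache()
    everyone.count()  // first action materializes the cache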
