I am seeing a similar issue in Spark 1.2. Caching the SchemaRDD takes about
50 s for 400 MB of data. The schema is similar to TPC-H LineItem.

Here is the code I used to cache the table. Is there a setting that I am
missing?

Thank you so much!

lineitemSchemaRDD.registerTempTable("lineitem");
sqlContext.sqlContext().cacheTable("lineitem");
System.out.println(lineitemSchemaRDD.count());


On Mon, Apr 6, 2015 at 8:00 PM, Christian Perez <[email protected]> wrote:

> Hi all,
>
> Has anyone else noticed very slow time to cache a Parquet file? It
> takes 14 s per 235 MB (1 block) uncompressed node local Parquet file
> on M2 EC2 instances. Or are my expectations way off...
>
> Cheers,
>
> Christian
>
> --
> Christian Perez
> Silicon Valley Data Science
> Data Analyst
> [email protected]
> @cp_phd


-- 
Wenlei Xie (谢文磊)

Ph.D. Candidate
Department of Computer Science
456 Gates Hall, Cornell University
Ithaca, NY 14853, USA
Email: [email protected]
