Re: Spark SQL takes unexpected time

2014-11-04 Thread Michael Armbrust
People also store data off-heap by putting Parquet data into Tachyon. The optimization in 1.2 is to use the in-memory columnar cached format instead of keeping row objects (and their boxed contents) around when you call .cache(). This significantly reduces the number of live objects. (since you h…
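
[Editor's sketch, not from the original post: a minimal illustration of the Parquet-on-Tachyon pattern Michael mentions, against the Spark 1.1-era Scala API. The Tachyon URI (tachyon://localhost:19998), the Event case class, and the paths are assumptions for illustration.]

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    val sc = new SparkContext(new SparkConf().setAppName("ParquetOnTachyon"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.createSchemaRDD  // implicit: RDD of case classes -> SchemaRDD

    case class Event(id: Long, name: String)
    val events = sc.parallelize(Seq(Event(1L, "a"), Event(2L, "b")))

    // Write the rows as Parquet into Tachyon: the bytes then live outside
    // the JVM heap, so they add no GC pressure.
    events.saveAsParquetFile("tachyon://localhost:19998/tables/events.parquet")

    // Read them back as a table; queries scan the columnar Parquet data.
    val offHeap = sqlContext.parquetFile("tachyon://localhost:19998/tables/events.parquet")
    offHeap.registerTempTable("events")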

Re: Spark SQL takes unexpected time

2014-11-04 Thread Corey Nolet
Michael, I should probably look more closely myself at the design of 1.2 vs. 1.1, but I've been curious: why does Spark keep its in-memory data on the heap instead of putting it off-heap? Was this the optimization done in 1.2 to alleviate GC pressure?

Re: Spark SQL takes unexpected time

2014-11-03 Thread Shailesh Birari
Yes, I am using Spark 1.1.0 and have used rdd.registerTempTable(). I tried adding sqlContext.cacheTable(), but the query then took 59 seconds (more than before). I also tried changing the schema to use the Long data type in some fields, but the conversion seems to take more time. Is there any way to specify an index?
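
[Editor's note, not from the thread: sqlContext.cacheTable is lazy, so the first query after calling it pays the one-time cost of materializing the in-memory columnar cache; the reported 59 seconds may include that cost. One rough way to check is to time the same query twice, assuming a SQLContext and a registered table named "records" as in the sketch at the end of the thread:]

    // Hypothetical timing helper; "records" is an assumed table name.
    def time[A](label: String)(body: => A): A = {
      val start = System.nanoTime()
      val result = body
      println(s"$label: ${(System.nanoTime() - start) / 1e9} s")
      result
    }

    sqlContext.cacheTable("records")
    time("first run (builds the cache)") {
      sqlContext.sql("SELECT COUNT(*) FROM records").collect()
    }
    time("second run (reads the cache)") {
      sqlContext.sql("SELECT COUNT(*) FROM records").collect()
    }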

Re: Spark SQL takes unexpected time

2014-11-03 Thread Michael Armbrust
If you are running on Spark 1.1 or earlier, you'll want to use rdd.registerTempTable() followed by sqlContext.cacheTable(), and then query that table. rdd.cache() does not use the optimized in-memory format and thus puts a lot of pressure on the GC. This is fixed in Spark 1.2, where .cache() should do the right thing.
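
[Editor's sketch, not from the original post: the pattern described above, end to end, against the Spark 1.1 Scala API. The Record case class, the input path, and the table name are made up for illustration.]

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    val sc = new SparkContext(new SparkConf().setAppName("CacheTableExample"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.createSchemaRDD  // implicit: RDD of case classes -> SchemaRDD

    case class Record(key: Long, value: String)
    val rdd = sc.textFile("hdfs:///data/records").map { line =>
      val parts = line.split(",")
      Record(parts(0).toLong, parts(1))
    }

    // Register the RDD as a table, then cache it through the SQLContext so
    // the optimized in-memory columnar format is used. On 1.1, a plain
    // rdd.cache() keeps boxed row objects on the heap and stresses the GC.
    rdd.registerTempTable("records")
    sqlContext.cacheTable("records")

    val result = sqlContext.sql("SELECT key, COUNT(*) FROM records GROUP BY key")
    result.collect().foreach(println)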