Re: Automatic Cache in SparkSQL

2015-04-27 Thread ayan guha
Spark keeps job data in memory by default, which gives the kind of performance gain you are seeing. Additionally, depending on your query, Spark runs several stages, and at any point Spark's code may issue an explicit cache behind the scenes. If you hit such a scenario, you will find the cached objects in the UI under the Storage tab.

Automatic Cache in SparkSQL

2015-04-27 Thread Wenlei Xie
Hi, I am trying to answer a simple query with SparkSQL over a Parquet file. When I execute the query several times, the first run takes about 2s while the later runs take <0.1s. Judging from the log file, the later runs don't seem to load the data from disk. However, I didn't enable an