Spark keeps job data in memory by default, which accounts for the kind of performance gains you are seeing. Additionally, depending on your query, Spark runs stages, and at any point Spark's code behind the scenes may issue an explicit cache. If you hit such a scenario, you will find those cached objects in the UI under the Storage tab.
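The symptom described below (slow first run, near-instant repeats) is the classic signature of a cache. As a hedged illustration in plain Python, not Spark's actual code path, a memoized function shows the same first-run/later-run asymmetry: the first call pays the full cost, repeats are served from memory. The function name, sleep duration, and "data.parquet" argument are illustrative only.

```python
import functools
import time

@functools.lru_cache(maxsize=None)
def run_query(path):
    # Stand-in for an expensive scan; the sleep simulates disk I/O.
    time.sleep(0.2)
    return sum(range(1000))  # pretend aggregate result

t0 = time.perf_counter()
first = run_query("data.parquet")
first_elapsed = time.perf_counter() - t0

t0 = time.perf_counter()
second = run_query("data.parquet")
second_elapsed = time.perf_counter() - t0

assert first == second
assert second_elapsed < first_elapsed  # repeat served from cache, not recomputed
```

In real Spark, the same asymmetry can come from implicitly cached RDDs or from data already resident in memory, which is why the later runs skip the disk read.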
Hi,
I am trying to answer a simple query with Spark SQL over a Parquet file. When I execute the query several times, the first run takes about 2s while the later runs take <0.1s.
Looking at the log file, it seems the later runs don't load the data from disk. However, I didn't enable any caching.