Hi Jeff, I think I see what you're saying. I was thinking more of a whole Spark job, where `spark-submit` is run once to completion and then started up again, rather than a "job" as seen in the Spark UI. I take it there is no implicit caching of results between `spark-submit` runs.
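
To check my understanding of the distinction, here is a minimal sketch based on your example. The object name `StageReuseSketch`, the helper `runOnce`, the app name and the `local[2]` master are all placeholders of mine, not anything from the original job. As far as I understand it, the shuffle output that lets the second job skip a stage lives only as long as the `SparkContext`, so a fresh `spark-submit` run starts without it.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical stand-alone app; names and the local master are illustrative only.
object StageReuseSketch {

  // One call corresponds to one application run (one SparkContext lifetime).
  def runOnce(): (Array[Int], Array[(Int, Int)]) = {
    val conf = new SparkConf().setAppName("stage-reuse-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val grouped = sc.parallelize(1 to 10).map(e => (e, e)).groupByKey()

      // Job 1: runs the shuffle map stage and then the result stage.
      val keys = grouped.map(_._1).collect()

      // Job 2: same parent shuffle, so its first stage should show up as
      // skipped in the UI, because job 1 already produced the shuffle output.
      val sizes = grouped.map(e => (e._1, e._2.size)).collect()

      (keys, sizes)
    } finally {
      // Stopping the context ends the application; the next run starts cold
      // as far as Spark itself is concerned.
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit = {
    val (keys, sizes) = runOnce()
    println(keys.sorted.mkString(", "))
    println(sizes.sortBy(_._1).mkString(", "))
  }
}
```

If that is right, running this twice via `spark-submit` would execute both stages of the first job each time, and only the second job within each run would benefit from the skipped stage.
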
(In the case I was writing about, I think I read too much into the Ganglia network traffic view. During the runs which I believed to be IO-bound, I was carrying out a long-running database transfer on the same network. After it completed I saw a speedup, not realizing where it came from, and wondered whether there had been some kind of shifting in the data.)

Eric

On Tue, Sep 1, 2015 at 9:54 PM, Jeff Zhang <[email protected]> wrote:

> Hi Eric,
>
> If the 2 jobs share the same parent stages, these stages can be skipped
> for the second job.
>
> Here's one simple example:
>
> val rdd1 = sc.parallelize(1 to 10).map(e=>(e,e))
> val rdd2 = rdd1.groupByKey()
> rdd2.map(e=>e._1).collect() foreach println
> rdd2.map(e=> (e._1, e._2.size)).collect foreach println
>
> Obviously, there are 2 jobs and both of them have 2 stages. Luckily, here
> these 2 jobs share the same stage (the first stage of each job). Although
> you don't cache the data explicitly, once a stage is completed it is
> marked as available and can be used by other jobs, so the second job
> only needs to run one stage.
> You should be able to see the skipped stage in the Spark job UI.
>
> [image: Inline image 1]
>
> On Wed, Sep 2, 2015 at 12:53 AM, Eric Walker <[email protected]>
> wrote:
>
>> Hi,
>>
>> I'm noticing that a 30 minute job that was initially IO-bound may not be
>> during subsequent runs. Is there some kind of between-job caching that
>> happens in Spark or in Linux that outlives jobs and that might be making
>> subsequent runs faster? If so, is there a way to avoid the caching in
>> order to get a better sense of the worst-case scenario?
>>
>> (It's also possible that I've simply changed something that made things
>> faster.)
>>
>> Eric
>
> --
> Best Regards
>
> Jeff Zhang
