Hi Jeff,
I think I see what you're saying. I was thinking more of a whole Spark
job, where `spark-submit` is run once to completion and then started up
again, rather than a "job" as seen in the Spark UI. I take it there is no
implicit caching of results between `spark-submit` runs.
(In the case
Hi Eric,
If the two jobs share the same parent stages, those stages can be skipped for
the second job.
Here's one simple example:
val rdd1 = sc.parallelize(1 to 10).map(e => (e, e))
val rdd2 = rdd1.groupByKey()
rdd2.map(e => e._1).collect().foreach(println)
rdd2.map(e => (e._1, e._2.size)).collect().foreach(println)
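
For reference, here is a rough, self-contained version of that example (a sketch only, assuming a local master, with a placeholder output path). Within one application the second action reuses the shuffle output of the shared stages; nothing carries over between separate spark-submit runs unless you persist results yourself:

import org.apache.spark.{SparkConf, SparkContext}

object StageReuseExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("stage-reuse").setMaster("local[*]"))

    val rdd1 = sc.parallelize(1 to 10).map(e => (e, e))
    val rdd2 = rdd1.groupByKey()   // introduces a shuffle stage

    // Job 1: runs the parallelize/map/groupByKey stages.
    rdd2.map(_._1).collect().foreach(println)

    // Job 2: shares the parent stages with job 1, so the Spark UI
    // shows them as "skipped" and the shuffle output is reused.
    rdd2.map { case (k, v) => (k, v.size) }.collect().foreach(println)

    // Across separate spark-submit runs nothing is reused implicitly;
    // write results out yourself if the next run needs them
    // (the path below is only a placeholder):
    // rdd2.mapValues(_.size).saveAsTextFile("/tmp/group-sizes")

    sc.stop()
  }
}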