Hi,
Besides caching, is it possible for an RDD to have multiple child RDDs? That
way I could read the input once and produce multiple outputs for multiple
jobs which share the input.
On May 5, 2015 6:24 PM, "Evan R. Sparks" wrote:
Scan sharing can indeed be a useful optimization in Spark, because you
amortize not only the time spent scanning over the data, but also the time
spent in task launch and scheduling overheads.
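The pattern above can be sketched with a single cached parent RDD feeding two
downstream jobs. This is a hypothetical illustration (not the example from the
original message); the file name `input.txt` and the two actions are assumptions.

```scala
// Sketch: share one scan of the input across two jobs by caching the
// parsed RDD, so the file is read and parsed only once.
import org.apache.spark.{SparkConf, SparkContext}

object ScanSharingSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("scan-sharing").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // One parent RDD; cache() marks it for reuse, so the scan and parse
    // happen once and both downstream jobs read from memory.
    val nums = sc.textFile("input.txt").map(_.toLong).cache()

    // Two "child" computations sharing the cached parent.
    val total = nums.sum()                 // job 1: full scan, materializes cache
    val distinctCount = nums.distinct().count() // job 2: served from cache

    println(s"sum=$total distinct=$distinctCount")
    sc.stop()
  }
}
```

Without the `cache()` call, each action would re-read and re-parse the input
file, paying the scan cost twice.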
Here's a trivial example in Scala. I'm not aware of a place in SparkSQL
where this is used - I'd imagine that