Hi,
Besides caching, is it possible for an RDD to have multiple child RDDs? That
way I could read the input once and produce multiple outputs for multiple
jobs that share the input.
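For example, something along these lines, where two derived RDDs share one
parent (the path and the transformations are only illustrative):

val parent = sc.textFile("hdfs:///tmp/input.txt")

// Two child RDDs derived from the same parent.
val childA = parent.map(_.toUpperCase)
val childB = parent.filter(_.startsWith("ERROR"))

// Each action triggers its own job; without cache(), each job
// re-reads the input file from scratch.
childA.saveAsTextFile("hdfs:///tmp/outA")
childB.saveAsTextFile("hdfs:///tmp/outB")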
On May 5, 2015 6:24 PM, "Evan R. Sparks" wrote:
Scan sharing can indeed be a useful optimization in Spark, because you
amortize not only the time spent scanning over the data, but also the time
spent in task launch and scheduling overheads.
Here's a trivial example in Scala. I'm not aware of a place in SparkSQL
where this is used - I'd imagine that…
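A minimal sketch of the idea (the input path and the two aggregates are
illustrative): a single aggregate() computes a line count and a word count
in one scan, so the data is read once and only one set of tasks is launched.

import org.apache.spark.{SparkConf, SparkContext}

object ScanSharing {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("scan-sharing").setMaster("local[*]"))
    val input = sc.textFile("hdfs:///tmp/input.txt")

    // One pass over the data produces both results, amortizing the scan
    // and the per-job task launch/scheduling overhead.
    val (lines, words) = input.aggregate((0L, 0L))(
      (acc, line) => (acc._1 + 1L, acc._2 + line.split("\\s+").length),
      (a, b) => (a._1 + b._1, a._2 + b._2)
    )

    println(s"lines: $lines, words: $words")
    sc.stop()
  }
}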
Hi everyone,
I have two Spark jobs inside one Spark application, both reading from the
same input file.
They are executed in two threads.
Right now, I cache the input file in memory before executing these two
jobs.
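Roughly like this, assuming sc is the SparkContext and simplifying the two
jobs:

import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

// Read once and cache, so both jobs reuse the in-memory partitions.
val input = sc.textFile("hdfs:///tmp/input.txt").cache()

// The two jobs, submitted concurrently from two threads.
val job1 = Future { input.filter(_.nonEmpty).count() }
val job2 = Future { input.flatMap(_.split("\\s+")).count() }

Await.result(job1, Duration.Inf)
Await.result(job2, Duration.Inf)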
Are there other ways to share the same input with just one read?
I know ther…