Hi All,
In Spark, each action launches a job. Let's say my Spark app
looks like this:
val baseRDD = sc.parallelize(Array(1, 2, 3, 4, 5), 2)
val rdd1 = baseRDD.map(x => x + 2)
val rdd2 = rdd1.filter(x => x % 2 == 0)
val count = rdd2.count
val firstElement = rdd2.first
println("Count is " + count)
println("First is " + firstElement)
Now, rdd2.count launches job0 with 2 tasks (one per partition) and
rdd2.first launches job1 with 1 task. In job1, when calculating
rdd2.first, is the entire lineage computed again, or is rdd2 reused
because job0 already computed it?
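
One way to check this empirically is to count how often the map function actually runs. The following is only a minimal sketch, assuming Spark 2.x or later (for sc.longAccumulator) and a local master; the object name LineageCheck and the accumulator mapCalls are illustrative names that are not part of the app above. Accumulator updates made inside transformations are not guaranteed to be exactly-once, so the numbers should be read as an indication rather than a proof.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical test harness; names are illustrative only.
object LineageCheck {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("LineageCheck").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // Counts every invocation of the map function across all jobs.
      val mapCalls = sc.longAccumulator("mapCalls")

      val baseRDD = sc.parallelize(Array(1, 2, 3, 4, 5), 2)
      val rdd1 = baseRDD.map { x => mapCalls.add(1); x + 2 }
      val rdd2 = rdd1.filter(x => x % 2 == 0)

      // Uncomment to keep rdd2's partitions in memory after the first action.
      // rdd2.cache()

      val count = rdd2.count()          // job0
      val callsAfterCount = mapCalls.value

      val firstElement = rdd2.first()   // job1
      val callsAfterFirst = mapCalls.value

      println("Count is " + count)
      println("First is " + firstElement)
      println("map calls after count: " + callsAfterCount)
      println("map calls after first: " + callsAfterFirst)
    } finally {
      sc.stop()
    }
  }
}
```

If the second number is larger than the first, job1 ran the map function again instead of reusing the result of job0. Running it once with the rdd2.cache() line commented out and once with it enabled should show the difference.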
Thanks,
Padma Ch