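Without caching, yes: each action recomputes the partitions of rdd2 it needs from the full lineage (parallelize, map, filter). If you want to reuse the data across jobs, you need to call rdd2.cache before the first action.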
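A minimal sketch that makes the recomputation visible. It assumes Spark 2.0+ (for `SparkContext.longAccumulator`) and local mode. The object `CacheDemo`, the helper `mapCallsDuringFirst` and the accumulator are illustrative additions, not part of the original program. It counts how many times the `map` function runs during `rdd2.first`, with and without `rdd2.cache()`. Accumulator updates made inside transformations can be over-counted if tasks are retried, so treat the numbers as a demonstration rather than a guarantee.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object CacheDemo {
  // Hypothetical helper: builds the lineage from the question, runs count then first,
  // and returns how many times the map function ran during first().
  def mapCallsDuringFirst(sc: SparkContext, useCache: Boolean): Long = {
    // Counter incremented by each map invocation and read on the driver (assumes Spark 2.0+).
    val mapCalls = sc.longAccumulator("mapCalls")

    val baseRDD = sc.parallelize(Array(1, 2, 3, 4, 5), 2)
    val rdd1 = baseRDD.map { x => mapCalls.add(1); x + 2 }
    val rdd2 = rdd1.filter(x => x % 2 == 0)

    // cache() is lazy: it only marks rdd2; the first action stores its partitions.
    if (useCache) rdd2.cache()

    rdd2.count()               // job 0: computes every partition of rdd2
    val before = mapCalls.sum
    rdd2.first()               // job 1: re-runs map unless rdd2's partitions are cached
    mapCalls.sum - before
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("CacheDemo").setMaster("local[2]"))
    try {
      println(s"map calls during first(), no cache: ${mapCallsDuringFirst(sc, useCache = false)}")
      println(s"map calls during first(), cached:   ${mapCallsDuringFirst(sc, useCache = true)}")
    } finally {
      sc.stop()
    }
  }
}
```

Without the cache, the map function runs again during first for the partition it scans; with the cache, first reads the partition that count already stored, and the map function does not run at all.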


On Sun, Sep 6, 2015 at 2:33 PM, Priya Ch <[email protected]>
wrote:

> Hi All,
>
>  In Spark, each action results in launching a job. Let's say my Spark app
> looks like this:
>
> val baseRDD = sc.parallelize(Array(1, 2, 3, 4, 5), 2)
> val rdd1 = baseRDD.map(x => x + 2)
> val rdd2 = rdd1.filter(x => x % 2 == 0)
> val count = rdd2.count
> val firstElement = rdd2.first
>
> println("Count is " + count)
> println("First is " + firstElement)
>
> Now, rdd2.count launches job0 with 2 tasks (one per partition) and
> rdd2.first launches job1 with 1 task. In job1, when calculating
> rdd2.first, is the entire lineage computed again, or is rdd2 reused
> since job0 already computed it?
>
> Thanks,
> Padma Ch
>
>



-- 
Best Regards

Jeff Zhang
