There should be no difference, assuming you don't use the intermediate
RDDs you are storing (rdd1, rdd2) for anything else. The first example
still creates the same intermediate RDD objects; you are just using them
implicitly instead of storing them in a val.
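
One way to check this yourself is to build both forms and compare their lineage. The sketch below is only an illustration: the three map functions, the input range, the app name, and the local[2] master are all made up for the example and are not from the original question.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object MapChainDemo {
  def main(args: Array[String]): Unit = {
    // Assumption: a local master with 2 threads, only to make the sketch runnable.
    val conf = new SparkConf().setAppName("MapChainDemo").setMaster("local[2]")
    val sc = new SparkContext(conf)

    // Hypothetical input data.
    val rdd = sc.parallelize(1 to 10)

    // Form 1: chained calls; the intermediate RDDs exist but are never named.
    val chained = rdd.map(_ + 1).map(_ * 2).map(_ - 3)

    // Form 2: the same transformations with each intermediate RDD bound to a val.
    val rdd1 = rdd.map(_ + 1)
    val rdd2 = rdd1.map(_ * 2)
    val rdd3 = rdd2.map(_ - 3)

    // map is lazy, so nothing has executed yet. Print the lineage of each form
    // to compare them.
    println(chained.toDebugString)
    println(rdd3.toDebugString)

    // collect() is the action that triggers a job for each form.
    assert(chained.collect().sameElements(rdd3.collect()))

    sc.stop()
  }
}
```

The two lineages should show the same chain of mapped RDDs over the same parent, since map only records the transformation and the work happens when an action such as collect() runs. The forms start to differ only if you do something else with rdd1 or rdd2, for example running a second action on rdd1; without caching, that action recomputes rdd1 from its parent.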
It's also worth pointing
Hi All,
What is the difference between the two snippets below in terms of
execution on a cluster with 1 or more worker nodes?
rdd.map(...).map(...)...map(..)
vs
val rdd1 = rdd.map(...)
val rdd2 = rdd1.map(...)
val rdd3 = rdd2.map(...)
Thanks,
Ashish