val d2 = d1.map { case (x, y) => (y, x) }
- This avoids pipelining the "d1" mapper and the "d2" mapper when computing d2.
This is important for writing efficient code; toDebugString helps a lot here.
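A minimal sketch of that effect, assuming a spark-shell session (so sc is in scope), some made-up triples as input, and that d1 itself has been cached and materialised (the original post only caches data):

// Illustrative input; the original post does not show how "data" is built.
val data = sc.parallelize(Seq((1, "a", true), (2, "b", false)))

val d1 = data.map { case (x, y, z) => (x, y) }
d1.cache()
d1.count()                                   // materialises d1 in the cache

val d2 = d1.map { case (x, y) => (y, x) }

// d2's lineage now lists a CachedPartitions entry for d1, i.e. computing d2
// reads d1 from the cache rather than re-running the upstream mappers.
println(d2.toDebugString)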
toDebugString shows how the RDD breaks down into different stages of execution.
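For instance, a small sketch (again assuming spark-shell; the reduceByKey is added only so that a shuffle, and hence a stage boundary, appears in the lineage):

val words = sc.parallelize(Seq("a", "b", "a", "c"))
val counts = words.map(w => (w, 1)).reduceByKey(_ + _)

// The printed lineage is indented at the shuffle introduced by reduceByKey;
// that indentation marks where Spark splits the job into separate stages.
println(counts.toDebugString)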
On Thu, Apr 9, 2015 at 1:58 AM, Bojan Kostic wrote:
> You can use toDebugString to see all the steps in the job.
>
> Best
> Bojan
>
>
>
You can use toDebugString to see all the steps in the job.
Best
Bojan
like this:
data.cache()
val d1 = data.map { case (x, y, z) => (x, y) }
val d2 = data.map { case (x, y, z) => (y, x) }
Furthermore, consider:
val d3 = d1.map { case (x, y) => (y, x) }
d2 and d3 are equivalent. Which implementation should be preferred?
Thx.
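For what it's worth, a sketch of how one could compare the two variants with toDebugString (assuming spark-shell and made-up input; variable names follow the post above):

val data = sc.parallelize(Seq((1, "a", true), (2, "b", false)))
data.cache()

val d1 = data.map { case (x, y, z) => (x, y) }
val d2 = data.map { case (x, y, z) => (y, x) }   // swap built directly from data
val d3 = d1.map { case (x, y) => (y, x) }        // swap derived from d1

// Both jobs stay in a single stage (the maps are narrow and get pipelined),
// but d3's lineage carries one extra MapPartitionsRDD step from d1.
println(d2.toDebugString)
println(d3.toDebugString)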