Re: Caching and Actions

2015-04-09 Thread Sameer Farooqui
e, toDebugString helps a lot.
> --
> View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Caching-and-Actions-tp22418p22444.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: Caching and Actions

2015-04-09 Thread spark_user_2015
d2 = d1.map((x,y) => (y,x))
- This avoids pipelining the "d1" mapper and "d2" mapper when computing d2.
This matters for writing efficient code; toDebugString helps a lot.

Re: Caching and Actions

2015-04-09 Thread Sameer Farooqui
RDD breaks down into different stages of execution.
On Thu, Apr 9, 2015 at 1:58 AM, Bojan Kostic wrote:
> You can use toDebugString to see all the steps in the job.
>
> Best
> Bojan

Re: Caching and Actions

2015-04-09 Thread Bojan Kostic
You can use toDebugString to see all the steps in the job.
Best
Bojan
--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Caching-and-Actions-tp22418p22433.html
Sent from the Apache Spark User List mailing list archive at Nabble.com

Caching and Actions

2015-04-07 Thread spark_user_2015
ike this:
data.cache()
val d1 = data.map((x,y,z) => (x,y))
val d2 = data.map((x,y,z) => (y,x))
Furthermore, consider:
val d3 = d2.map((x,y) => (y,x))
d2 and d3 are equivalent. What implementation should be preferred?
Thx.
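For anyone who wants to try the question out, here is a minimal sketch rather than a definitive answer. It assumes a local SparkContext and made-up sample triples (the names `CachingAndActions`, `swappedBothWays`, `swappedFromData` and `swappedFromD1` are hypothetical, not from the thread). It builds the swapped pairs once directly from the cached `data` and once from `d1`, prints each lineage with toDebugString, and only then runs an action.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object CachingAndActions {

  // Builds the swapped (y, x) pairs in two ways and returns both results.
  def swappedBothWays(sc: SparkContext): (Seq[(String, Int)], Seq[(String, Int)]) = {
    // Hypothetical sample data standing in for the thread's (x, y, z) triples.
    val data = sc.parallelize(Seq((1, "a", 1.0), (2, "b", 2.0), (3, "c", 3.0)))

    // cache() only marks the RDD for caching; nothing is computed
    // until an action runs on it or on one of its descendants.
    data.cache()

    val d1 = data.map { case (x, y, _) => (x, y) }

    // Derived directly from the cached RDD.
    val swappedFromData = data.map { case (x, y, _) => (y, x) }

    // Derived from d1, which is itself not cached.
    val swappedFromD1 = d1.map { case (x, y) => (y, x) }

    // toDebugString returns the lineage of each RDD as a String,
    // including which ancestors are marked as cached.
    println(swappedFromData.toDebugString)
    println(swappedFromD1.toDebugString)

    // collect() is an action: this is where the maps actually run.
    (swappedFromData.collect().toSeq, swappedFromD1.collect().toSeq)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: local mode with two threads; the app name is arbitrary.
    val conf = new SparkConf().setMaster("local[2]").setAppName("caching-and-actions")
    val sc = new SparkContext(conf)
    try {
      val (fromData, fromD1) = swappedBothWays(sc)
      println(fromData == fromD1)
    } finally {
      sc.stop()
    }
  }
}
```

Comparing the two printed lineages shows how many map steps each variant chains on top of the cached RDD, which is the kind of check the replies suggest doing with toDebugString.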