Re: Cached RDD

2014-12-30 Thread Rishi Yadav
Without caching, the parent RDD is recomputed for each action. So, assuming rdd2 and rdd3 result in separate actions, the answer is yes. On Mon, Dec 29, 2014 at 7:53 PM, Corey Nolet wrote: > If I have 2 RDDs which depend on the same RDD like the following: > > val rdd1 = ... > > val rdd2 = rdd1.groupBy()... > > val rdd3
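
A minimal sketch of the situation discussed above, assuming a local Spark setup. The contents of rdd1, the grouping keys, the app name, and the object name CachedRddSketch are all hypothetical stand-ins, since the original message elides them. Calling cache() on the shared parent should let the second action reuse the stored partitions instead of rebuilding rdd1 from its lineage.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object CachedRddSketch {
  // Builds two RDDs from one cached parent and runs one action on each.
  // Returns the number of groups in each derived RDD.
  def groupCounts(sc: SparkContext): (Long, Long) = {
    // Hypothetical stand-in for the elided `val rdd1 = ...`.
    val rdd1 = sc.parallelize(1 to 1000).map(x => (x % 10, x))

    // Mark the shared parent for caching; nothing is computed yet.
    rdd1.cache()

    // Two derived RDDs that depend on the same parent (keys are assumptions).
    val rdd2 = rdd1.groupBy(_._1)
    val rdd3 = rdd1.groupBy(_._2 % 3)

    // First action computes rdd1 and stores its partitions.
    val n2 = rdd2.count()
    // Second action can read rdd1 from the cache rather than recomputing it.
    val n3 = rdd3.count()
    (n2, n3)
  }

  def main(args: Array[String]): Unit = {
    // Local master is an assumption for illustration only.
    val conf = new SparkConf().setAppName("cached-rdd-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      println(groupCounts(sc))
    } finally {
      sc.stop()
    }
  }
}
```

If the cache() call is removed, both counts still succeed, but each action walks the lineage of rdd1 again, which is the recomputation the reply refers to.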

Re: Cached RDD Block Size - Uneven Distribution

2014-08-13 Thread anthonyjschu...@gmail.com
I am having a similar problem: I have a large dataset in HDFS and (for a few possible reasons, including a filter operation and some of my computation nodes simply not being HDFS datanodes) have a large skew on my RDD blocks: the master node always has the most, while the worker nodes have few... (

Re: Cached RDD Block Size - Uneven Distribution

2014-08-04 Thread Patrick Wendell
Are you directly caching files from Hadoop or are you doing some transformation on them first? If you are doing a groupBy or some type of transformation, then you could be causing data skew that way. On Sun, Aug 3, 2014 at 1:19 PM, iramaraju wrote: > I am running spark 1.0.0, Tachyon 0.5 and Ha