Without caching, the lineage of an RDD is recomputed for every action. So, assuming
rdd2 and rdd3 result in separate actions, the answer is yes.
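
To make that concrete, here is a minimal sketch, not a definitive test. It assumes Spark 2.x or later on the classpath (for `sc.longAccumulator`) and local mode. The object name, the helper `evaluationsFor`, the input range, and the two `groupBy` key functions are all made up for illustration, since the original snippet elides them. An accumulator counts how often the function that builds rdd1 actually runs.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object RecomputeSketch {
  // Hypothetical helper: builds rdd1, derives rdd2 and rdd3 from it, runs one
  // action on each, and returns how many times rdd1's map function was invoked.
  def evaluationsFor(sc: SparkContext, cacheParent: Boolean): Long = {
    val evaluations = sc.longAccumulator("rdd1 evaluations")

    // Stand-in for "val rdd1 = ..." (assumed contents).
    val rdd1 = sc.parallelize(1 to 100, 4).map { x => evaluations.add(1L); x }
    if (cacheParent) rdd1.cache()

    // Stand-ins for the elided groupBy calls (assumed key functions).
    val rdd2 = rdd1.groupBy(_ % 10)
    val rdd3 = rdd1.groupBy(_ % 3)

    // Two separate actions, so two separate jobs.
    rdd2.count()
    rdd3.count()

    if (cacheParent) rdd1.unpersist()
    evaluations.sum
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("recompute-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      println(s"without cache: ${evaluationsFor(sc, cacheParent = false)}")
      println(s"with cache:    ${evaluationsFor(sc, cacheParent = true)}")
    } finally {
      sc.stop()
    }
  }
}
```

In local mode with enough memory to hold rdd1, I would expect the uncached run to evaluate each element twice (once per action) and the cached run once. On a real cluster, evicted blocks or retried tasks can push the counts higher.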
On Mon, Dec 29, 2014 at 7:53 PM, Corey Nolet wrote:
> If I have 2 RDDs which depend on the same RDD like the following:
>
> val rdd1 = ...
>
> val rdd2 = rdd1.groupBy()...
>
> val rdd3
I am having a similar problem:
I have a large dataset in HDFS and (for a few possible reasons, including a
filter operation and some of my computation nodes simply not being HDFS
datanodes) I see a large skew in my RDD blocks: the master node always has
the most, while the worker nodes have few... (
Are you directly caching files from Hadoop or are you doing some
transformation on them first? If you are doing a groupBy or some type of
transformation, then you could be causing data skew that way.
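
One way to check whether a transformation is introducing skew is to count the records in each partition before caching. This is a rough sketch under stated assumptions: Spark on the classpath, local mode, and a made-up data set with one "hot" key. The object and helper names are hypothetical. It reports records per partition only, not where the cached blocks end up on the cluster. The `repartition` call at the end is just one possible way to even things out, not necessarily the right fix for your job.

```scala
import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object PartitionSkewSketch {
  // Hypothetical helper: number of records in each partition, in partition order.
  def partitionSizes[T](rdd: RDD[T]): Array[Int] =
    rdd
      .mapPartitionsWithIndex((idx, it) => Iterator((idx, it.size)))
      .collect()
      .sortBy(_._1)
      .map(_._2)

  // Assumed data: 900 records under the key "hot" and 100 under "rare",
  // hash-partitioned by key so that each key lands in a single partition.
  def skewedByKey(sc: SparkContext): RDD[(String, Int)] =
    sc.parallelize(1 to 1000, 4)
      .map(x => (if (x % 10 == 0) "rare" else "hot", x))
      .partitionBy(new HashPartitioner(4))

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("skew-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val skewed = skewedByKey(sc)
      println("after partitioning by key: " + partitionSizes(skewed).mkString(", "))

      // repartition shuffles records without regard to key, which usually
      // spreads them more evenly across partitions.
      val rebalanced = skewed.repartition(4)
      println("after repartition(4):      " + partitionSizes(rebalanced).mkString(", "))
    } finally {
      sc.stop()
    }
  }
}
```

If the first line shows most records piled into one partition, the skew comes from the keyed transformation rather than from how the files were read from Hadoop.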
On Sun, Aug 3, 2014 at 1:19 PM, iramaraju wrote:
> I am running spark 1.0.0, Tachyon 0.5 and Ha