Hi, I'm learning Spark and wonder when shuffle data gets deleted. I found the
ContextCleaner class, which cleans up shuffle data when the corresponding
shuffle dependency is GC-ed. Based on the source code, the shuffle dependency
seems to be GC-ed only after the active job finishes, but I'm not sure. Could
you explain the life cycle of a shuffle dependency?
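For concreteness, here is a minimal sketch of my current understanding (the
names and numbers are made up; it assumes the default
spark.cleaner.referenceTracking=true):

    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(
      new SparkConf().setAppName("cleanup-sketch").setMaster("local[2]"))

    // reduceByKey introduces a ShuffleDependency; the shuffle files are
    // written to disk when the action below runs.
    var counts = sc.parallelize(1 to 100000)
      .map(i => (i % 100, 1))
      .reduceByKey(_ + _)
    counts.count()

    // The shuffle files stay on disk as long as `counts` (and with it the
    // ShuffleDependency) is strongly reachable, so later jobs can reuse them.
    counts = null // drop the last strong reference

    // ContextCleaner holds the dependency only through a weak reference;
    // once a GC actually collects it, the cleaner asynchronously asks the
    // map output tracker and block managers to remove the shuffle data.
    System.gc() // only a hint; a collection is not guaranteed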
Are you looking for SparkContext.union() [1]?
>
> This does not perform well with the Spark Cassandra connector, so I am
> not sure whether it will help you.
>
> Thanks and Regards
> Noorul
>
> [1]
> http://spark.apache.org/docs/1.3.0/api/scala/index.html#org.apache.spark.SparkContext
>
--
Yang Chen
Dept. of CISE, University of Florida
Mail: y...@yang-cs.com
Web: www.cise.ufl.edu/~yang
Hi Mark,
That's true, but neither one lets me combine the RDDs, so I have to avoid
unions.
Thanks,
Yang
On Thu, Mar 26, 2015 at 5:31 PM, Mark Hamstra
wrote:
> RDD#union is not the same thing as SparkContext#union
>
> On Thu, Mar 26, 2015 at 2:27 PM, Yang Chen wrote:
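For reference, here is a sketch of the difference Mark points out (rdd1
through rdd3 and the sample data are placeholders):

    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(
      new SparkConf().setAppName("union-sketch").setMaster("local[2]"))
    val rdd1 = sc.parallelize(Seq(1, 2))
    val rdd2 = sc.parallelize(Seq(3, 4))
    val rdd3 = sc.parallelize(Seq(5, 6))

    // Chaining RDD#union nests one UnionRDD per call, i.e.
    // Union(Union(rdd1, rdd2), rdd3), so the lineage grows with each union.
    val chained = rdd1.union(rdd2).union(rdd3)

    // SparkContext#union builds a single flat UnionRDD over all inputs,
    // which keeps the lineage shallow when combining many RDDs.
    val flat = sc.union(Seq(rdd1, rdd2, rdd3))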
> Use a collection instead of RDDs:
>
> // Create and partition the 0.5M items as a single RDD; flatMap keeps
> // everything in one RDD, with each item already joined with its
> // external data. (An assumed shape for compute is sketched below.)
> val result = sc.parallelize(data).flatMap(compute(_))
>
> Hope this helps.
>
> Kelvin
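For completeness, a minimal stand-in for the compute function assumed above
(the name, types, and logic are illustrative only):

    // Hypothetical: each input item expands to zero or more output rows
    // after being joined with its external data, which is why flatMap is
    // used rather than map.
    def compute(item: Int): Seq[String] =
      if (item % 2 == 0) Seq(s"joined: $item") else Seq.empty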