Re: Persist RDD doubt

2017-03-23 Thread Artur R
I am not entirely sure, but:
- if the RDD is persisted in memory, then on task failure the executor JVM process fails too, so the memory is released
- if the RDD is persisted on disk, then on task failure the Spark shutdown hook just wipes the temp files

On Thu, Mar 23, 2017 at 10:55 AM, Jörn Franke wrote: > What do you mean by
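Not part of the original reply, but a minimal sketch of the two storage levels being contrasted, assuming a hypothetical local SparkContext:

```scala
import org.apache.spark.SparkContext
import org.apache.spark.storage.StorageLevel

object PersistExample {
  def main(args: Array[String]): Unit = {
    // Hypothetical local setup for illustration only.
    val sc = new SparkContext("local[*]", "persist-example")

    val rdd = sc.parallelize(1 to 1000000)

    // MEMORY_ONLY: cached blocks live inside the executor JVM heap. If that
    // JVM goes away, the cached blocks go with it and the partitions are
    // recomputed from lineage on the next action.
    val inMemory = rdd.map(_ * 2).persist(StorageLevel.MEMORY_ONLY)

    // DISK_ONLY: blocks are written to the executor's local temp directories
    // (spark.local.dir), which Spark cleans up on application shutdown.
    val onDisk = rdd.map(_ * 2).persist(StorageLevel.DISK_ONLY)

    println(inMemory.count())
    println(onDisk.count())

    sc.stop()
  }
}
```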

How to redistribute dataset without full shuffle

2017-03-17 Thread Artur R
Hi! I use Spark heavily for various workloads and often end up in the situation where there is some skewed dataset (without any partitioner assigned) and I just want to "redistribute" its data more evenly. For example, say there is an RDD of X partitions with Y rows in each, except one large partition
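Not from the thread itself, but a minimal sketch of the two standard options usually weighed in this situation, assuming a hypothetical local SparkContext: repartition (full shuffle, even output) versus coalesce without shuffle (cheaper, but cannot split an oversized partition):

```scala
import org.apache.spark.SparkContext

object RedistributeExample {
  def main(args: Array[String]): Unit = {
    // Hypothetical local setup for illustration only.
    val sc = new SparkContext("local[*]", "redistribute-example")

    // Example RDD; in the scenario described, one of its partitions
    // would hold far more rows than the others.
    val skewed = sc.parallelize(1 to 1000000, numSlices = 10)

    // repartition(n) always performs a full shuffle and yields roughly
    // equal-sized partitions, which fixes skew at the cost of moving data.
    val evenButShuffled = skewed.repartition(10)

    // coalesce(n, shuffle = false) only merges existing partitions, so it
    // avoids a full shuffle but cannot break up a single large partition.
    val mergedNoShuffle = skewed.coalesce(5, shuffle = false)

    println(evenButShuffled.count())
    println(mergedNoShuffle.count())

    sc.stop()
  }
}
```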