I am not entirely sure, but:
- if the RDD is persisted in memory, then on task failure the executor JVM
process fails too, so the memory is released
- if the RDD is persisted on disk, then on task failure Spark's shutdown
hook just wipes the temp files (see the sketch below)
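
To make the two cases concrete, here is a minimal sketch (local mode;
all names are illustrative, not from this thread) that persists the same
data at both storage levels and frees it explicitly:

import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

val spark = SparkSession.builder()
  .appName("persist-cleanup")
  .master("local[*]")
  .getOrCreate()
val sc = spark.sparkContext

// MEMORY_ONLY: blocks live in the executor JVM heap, so they go away
// together with the JVM if it dies.
val inMemory = sc.parallelize(1 to 1000000).persist(StorageLevel.MEMORY_ONLY)
inMemory.count() // materialize the cache

// DISK_ONLY: blocks are written as temp files under spark.local.dir;
// a shutdown hook removes them when the application exits.
val onDisk = sc.parallelize(1 to 1000000).persist(StorageLevel.DISK_ONLY)
onDisk.count()

// While the application keeps running, explicit unpersist() is the
// reliable way to free the space.
inMemory.unpersist()
onDisk.unpersist()
spark.stop()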
On Thu, Mar 23, 2017 at 10:55 AM, Jörn Franke wrote:
> What do you mean by
Hi!
I use Spark heavily for various workloads and always run into the situation
where there is some skewed dataset (without any partitioner assigned) and I
just want to "redistribute" its data more evenly.
For example, say there is an RDD of X partitions with Y rows in each except
one large partition
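
To illustrate the kind of redistribution I mean, here is a minimal
sketch (local mode; the skewed RDD is a hypothetical stand-in) using
plain repartition(), which does a full shuffle with roughly round-robin
placement and so evens out partition sizes without any custom
partitioner:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("redistribute")
  .master("local[4]")
  .getOrCreate()
val sc = spark.sparkContext

// Hypothetical skew: three small partitions plus one large one.
val skewed = sc.parallelize(0 until 100, numSlices = 3)
  .union(sc.parallelize(100 until 100000, numSlices = 1))

// Full shuffle into evenly sized partitions.
val even = skewed.repartition(4)

// Row counts per partition, before and after.
def sizes(rdd: org.apache.spark.rdd.RDD[Int]) =
  rdd.mapPartitionsWithIndex((i, it) => Iterator((i, it.size))).collect()

println(sizes(skewed).mkString(", ")) // one partition dominates
println(sizes(even).mkString(", "))   // ~25000 rows each
spark.stop()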