Greetings, I was wondering why Spark's shuffle always persists the shuffle data to disk. I understand that the persisted data can be used by the scheduler to truncate the lineage of the RDD graph when an existing RDD has already been materialized as a side effect of an earlier shuffle. But that does not explain why Spark does not keep the shuffled data in memory until memory pressure becomes high enough to trigger victim selection and spilling. Any hints and pointers would be appreciated.
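
For instance, in a spark-shell session like the sketch below (the identifiers are just illustrative, not from any real job), the second action reuses the shuffle files written by the first, and the map stage shows up as skipped in the UI, even though nothing was explicitly cached:

// spark-shell sketch; sc is the SparkContext provided by the shell
val pairs = sc.parallelize(1 to 1000000).map(i => (i % 100, 1))

// reduceByKey introduces a wide dependency; its map output is written
// to local disk by the shuffle writer.
val counts = pairs.reduceByKey(_ + _)

// First action runs both stages and leaves the shuffle files on disk.
counts.count()

// Second action: the scheduler sees the shuffle output already exists,
// so the map stage is skipped and only the result stage reruns. This is
// the lineage truncation I mentioned, and it relies on the on-disk files.
counts.collect()

So I can see what the on-disk output buys, but not why it is unconditional rather than memory-first with spilling.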
Thanks, Effi