Hello, I'm working with Spark on very-large-memory systems (2TB+) and I notice that Spark spills to disk during shuffles. Is there a way to force Spark to stay in memory for shuffle operations? The goal is to keep shuffle data either on the heap or in off-heap memory (in 1.6.x) and never touch the I/O subsystem. I am willing to have the job fail if it runs out of RAM.
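For reference, this is a minimal sketch of the kind of configuration I am experimenting with on 1.6.x. The off-heap options are the 1.6 Tungsten memory settings; the size and master are just placeholders for this machine, not recommendations:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Sketch only: placeholder size/master for a 2TB-class machine.
val conf = new SparkConf()
  .setAppName("in-memory-shuffle")
  .setMaster("local[*]")
  // Spark 1.6 Tungsten off-heap memory settings.
  .set("spark.memory.offHeap.enabled", "true")
  .set("spark.memory.offHeap.size", "1500g")
  // Deprecated in 1.6 and ignored by tungsten-sort, as noted below.
  .set("spark.shuffle.spill", "false")

val sc = new SparkContext(conf)
```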
The spark.shuffle.spill setting is deprecated in 1.6, and setting it to false does not work with the tungsten-sort shuffle manager in 1.5.x: "WARN UnsafeShuffleManager: spark.shuffle.spill was set to false, but this is ignored by the tungsten-sort shuffle manager; its optimized shuffles will continue to spill to disk when necessary." If this is impossible via configuration changes, what code changes would be needed to accomplish it?
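For completeness, this is roughly how I hit that warning on 1.5.x; the shuffle-manager name is the 1.5 value, and the job itself is just a throwaway shuffle to trigger it:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Trivial repro (Spark 1.5.x): tungsten-sort plus spill disabled
// logs the UnsafeShuffleManager warning quoted above.
val conf = new SparkConf()
  .setAppName("spill-warning-repro")
  .setMaster("local[4]")
  .set("spark.shuffle.manager", "tungsten-sort")
  .set("spark.shuffle.spill", "false")

val sc = new SparkContext(conf)
// Any shuffle will do; groupByKey forces a shuffle stage.
sc.parallelize(1 to 1000000).map(i => (i % 100, i)).groupByKey().count()
```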