Hello, I'm working with Spark on very-large-memory systems (2TB+) and I notice that Spark spills to disk during shuffles. Is there a way to force Spark to stay in memory for shuffle operations? The goal is to keep shuffle data either on the heap or in off-heap memory (in 1.6.x) and never touch the I/O subsystem. I am willing to have the job fail if it runs out of RAM.
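For reference, this is a minimal sketch of the kind of configuration I am experimenting with on 1.6.x. The off-heap options are the 1.6 Tungsten memory settings; the size and master are just placeholders for this machine, not recommendations:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Sketch only: placeholder size/master for a 2TB-class machine.
val conf = new SparkConf()
  .setAppName("in-memory-shuffle")
  .setMaster("local[*]")
  // Spark 1.6 Tungsten off-heap memory settings.
  .set("spark.memory.offHeap.enabled", "true")
  .set("spark.memory.offHeap.size", "1500g")
  // Deprecated in 1.6 and ignored by tungsten-sort, as noted below.
  .set("spark.shuffle.spill", "false")

val sc = new SparkContext(conf)
```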
The spark.shuffle.spill setting is deprecated in 1.6, and setting it to false does not work with the tungsten-sort shuffle manager in 1.5.x: "WARN UnsafeShuffleManager: spark.shuffle.spill was set to false, but this is ignored by the tungsten-sort shuffle manager; its optimized shuffles will continue to spill to disk when necessary." If this is impossible via configuration changes, what code changes would be needed to accomplish it?
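For completeness, this is roughly how I hit that warning on 1.5.x; the shuffle-manager name is the 1.5 value, and the job itself is just a throwaway shuffle to trigger it:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Trivial repro (Spark 1.5.x): tungsten-sort plus spill disabled
// logs the UnsafeShuffleManager warning quoted above.
val conf = new SparkConf()
  .setAppName("spill-warning-repro")
  .setMaster("local[4]")
  .set("spark.shuffle.manager", "tungsten-sort")
  .set("spark.shuffle.spill", "false")

val sc = new SparkContext(conf)
// Any shuffle will do; groupByKey forces a shuffle stage.
sc.parallelize(1 to 1000000).map(i => (i % 100, i)).groupByKey().count()
```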