Hi, I'm using Spark 1.3.1 on EMR with lots of memory. I have attempted to run a large PySpark job several times, specifying `spark.shuffle.spill=false` in different ways. The setting seems to be ignored, at least partially, and some of the tasks start spilling large amounts of data to disk. The job has been fast enough in the past, but once it starts spilling to disk it lands on Miller's planet [1].
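For reference, here is a minimal sketch of the kinds of configuration I'm talking about (the app name and script name below are just placeholders):

```python
from pyspark import SparkConf, SparkContext

# Set the flag programmatically, before the SparkContext is created;
# values set after the context exists are not picked up.
conf = (SparkConf()
        .setAppName("large-shuffle-job")           # placeholder name
        .set("spark.shuffle.spill", "false"))
sc = SparkContext(conf=conf)

# Equivalent non-programmatic forms:
#   spark-submit --conf spark.shuffle.spill=false my_job.py
# or a line in conf/spark-defaults.conf:
#   spark.shuffle.spill    false
#
# Either way, the effective value should show up under the Environment
# tab of the application UI.
```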
Is this expected behavior? Is it a misconfiguration on my part, e.g., could another setting be overriding `spark.shuffle.spill=false`? Is it something specific to Spark 1.3.1, or to EMR?

When I've let the job run for a while, I've started to see Kryo stack traces in the tasks that are spilling to disk. The stack traces mention there not being enough disk space, although a `df` shows plenty of space (perhaps checked after the fact, once temporary files had been cleaned up). Has anyone run into something like this before? I would be happy to see OOM errors, because that would at least be consistent with one understanding of what might be going on, but I haven't seen any yet.

Eric

[1] https://www.youtube.com/watch?v=v7OVqXm7_Pk&safe=active