Hi Reynold,

It took me some time, but I finally found that spilling behaves differently on the map side and the reduce side of a shuffle. On the map side, spilling to disk happens by default (via the spillToPartitionFiles call from insertAll in ExternalSorter; I don't yet know why the number of calls differs, though). On the reduce side, spilling (via the maybeSpillCollection call from insertAll in ExternalSorter) is optional and depends on the memory budget determined by spark.shuffle.memoryFraction and the total memory available. In my case, I was only seeing the map-side spilling, and had not realized that it is supposed to happen regardless of the memory settings.
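To make the reduce-side budget concrete, here is a rough sketch of how the shuffle memory pool could be estimated from the heap size. It is plain Python arithmetic, not Spark code. The function name shuffle_memory_budget is hypothetical, and the defaults of 0.2 for spark.shuffle.memoryFraction and 0.8 for spark.shuffle.safetyFraction are assumed from Spark 1.x; other versions may differ.

```python
def shuffle_memory_budget(max_heap_bytes, memory_fraction=0.2, safety_fraction=0.8):
    """Estimate the bytes available for in-memory shuffle collections.

    Hypothetical helper, not part of Spark. The defaults mirror the assumed
    Spark 1.x defaults for spark.shuffle.memoryFraction (0.2) and
    spark.shuffle.safetyFraction (0.8).
    """
    if max_heap_bytes < 0:
        raise ValueError("max_heap_bytes must be non-negative")
    for name, value in (("memory_fraction", memory_fraction),
                        ("safety_fraction", safety_fraction)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(name + " must be between 0 and 1")
    # Budget = heap size * shuffle fraction * safety margin, truncated to whole bytes.
    return int(max_heap_bytes * memory_fraction * safety_fraction)


if __name__ == "__main__":
    one_gib = 1024 ** 3
    print(shuffle_memory_budget(one_gib))
```

Under these assumptions, only about 16% of the heap is available before the reduce side would start spilling, and that pool is shared by all tasks running in the executor.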
Thanks for your help,
Tom