Re: Shuffle Spill Memory and Shuffle Spill Disk

2015-03-23 Thread Sandy Ryza
Hi Bijay, The Shuffle Spill (Disk) is the total number of bytes written to disk by records spilled during the shuffle. The Shuffle Spill (Memory) is the amount of space the spilled records occupied in memory before they were spilled. These differ because the serialized format is more compact, an

Shuffle Spill Memory and Shuffle Spill Disk

2015-03-23 Thread Bijay Pathak
Hello, I am running TeraSort on 100GB of data. The final metrics I am getting on Shuffle Spill are: Shuffle Spill(Memory): 122.5 GB Shuffle Spill(Disk): 3.4 GB What's the difference and relation between these two metrics? Does these mean 122.5 GB was s