Shuffle spill (memory) is the size of the deserialized form of the data in
memory at the time we spill it, whereas shuffle spill (disk) is the size of
the serialized form of the data on disk after we spill it. Serialized data
is more compact than the deserialized objects, which is why the disk figure
tends to be much smaller than the memory figure. Note that both metrics are
aggregated over the entire duration of the task (i.e., within each task you
can spill multiple times).
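
To make the two numbers concrete, here is a toy sketch in plain Python. It
is not Spark code: pickle and zlib merely stand in for Spark's serializer
and spill compression, and the names run_task, deep_size and max_buffered
are hypothetical, made up for this illustration. It only shows how both
totals accumulate across several spills within a single task.

```python
import pickle
import sys
import zlib


def deep_size(records):
    """Rough in-memory size of a list of (key, value) tuples, in bytes.

    This is only an estimate: objects shared between records (for example
    cached small integers) are counted once per record.
    """
    total = sys.getsizeof(records)
    for key, value in records:
        total += sys.getsizeof((key, value))
        total += sys.getsizeof(key) + sys.getsizeof(value)
    return total


def run_task(records, max_buffered):
    """Buffer records and 'spill' whenever the buffer holds max_buffered."""
    spill_memory = 0  # analogue of "shuffle spill (memory)"
    spill_disk = 0    # analogue of "shuffle spill (disk)"
    spill_count = 0
    buffer = []
    for record in records:
        buffer.append(record)
        if len(buffer) >= max_buffered:
            # Size of the deserialized objects at the moment of the spill.
            spill_memory += deep_size(buffer)
            # Size of the serialized (and compressed) bytes that would be
            # written to disk; nothing is actually written here.
            spill_disk += len(zlib.compress(pickle.dumps(buffer)))
            spill_count += 1
            buffer = []
    return spill_memory, spill_disk, spill_count


records = [(i % 100, "value-%d" % (i % 10)) for i in range(10_000)]
spill_memory, spill_disk, spill_count = run_task(records, max_buffered=2_500)
print("spills:", spill_count)
print("spill (memory) bytes:", spill_memory)
print("spill (disk) bytes:", spill_disk)
```

Both totals grow with every spill, and the serialized total stays well
below the in-memory estimate, which mirrors the relationship between the
two metrics in the UI.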

Andrew


2014-07-18 4:09 GMT-07:00 Sébastien Rainville <[email protected]>:

> Hi,
>
> In the Spark UI, one of the metrics is "shuffle spill (memory)". What is
> it exactly? I understand spilling to disk when the shuffle data doesn't
> fit in memory, but what does it mean to spill to memory?
>
> Thanks,
>
> - Sebastien
>
>
