If you ran a shuffle that consumed a large amount of execution memory, it
may have evicted cached RDD blocks because the memory for the shuffle ran
short. Please see:
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/memory/UnifiedMemoryManager.scala#L32
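
For reference, here is a minimal sketch of the knob involved (the app name
and the numbers below are made up for illustration, not taken from your
setup): in Spark 1.6, `spark.memory.storageFraction` (default 0.5) sets the
fraction of the unified region that is immune to eviction by execution, so
raising it keeps more cached blocks in memory at the cost of shuffle memory.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

// Hypothetical settings for illustration; tune for your own workload.
val conf = new SparkConf()
  .setAppName("cache-vs-shuffle-demo") // made-up name
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  // Fraction of unified memory protected from eviction by execution
  // (default 0.5 in Spark 1.6).
  .set("spark.memory.storageFraction", "0.6")
val sc = new SparkContext(conf)

// MEMORY_AND_DISK_SER: kept serialized in memory, spilled to disk when
// evicted or when a partition does not fit in the storage pool.
val rdd = sc.parallelize(1 to 10000000)
  .persist(StorageLevel.MEMORY_AND_DISK_SER)
rdd.count() // materialize the cache

Note that blocks can also land on disk simply because a partition does not
fit in the storage pool at caching time, independent of any shuffle.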

// maropu

On Fri, May 13, 2016 at 9:35 AM, Alexander Pivovarov <apivova...@gmail.com>
wrote:

> Each executor in the screenshot has 25 GB of memory remaining. What was
> the reason to store 170-500 MB on disk if the executor has 25 GB of memory
> available?
>
> On Thu, May 12, 2016 at 5:12 PM, Takeshi Yamamuro <linguin....@gmail.com>
> wrote:
>
>> Hi,
>>
>> Not sure this is the correct answer, but it seems `UnifiedMemoryManager`
>> spills some RDD blocks to disk when execution memory runs short.
>>
>> // maropu
>>
>> On Fri, May 13, 2016 at 6:16 AM, Alexander Pivovarov <
>> apivova...@gmail.com> wrote:
>>
>>> Hello Everyone
>>>
>>> I use Spark 1.6.0 on YARN (EMR-4.3.0)
>>>
>>> I use the MEMORY_AND_DISK_SER StorageLevel for my RDD, and I use the
>>> Kryo serializer.
>>>
>>> I noticed that Spark uses disk to store some RDD blocks even when
>>> executors have lots of memory available. See the screenshot:
>>> http://postimg.org/image/gxpsw1fk1/
>>>
>>> Any ideas why it might happen?
>>>
>>> Thank you
>>> Alex
>>>
>>
>>
>>
>> --
>> ---
>> Takeshi Yamamuro
>>
>
>


-- 
---
Takeshi Yamamuro
