Hi Ethan,

I'd like to share two things:


   - I found that the "taskmanager.memory.preallocate" config option has
   been removed in the master codebase.
   - Digging through the git history, I found that the description of
   "taskmanager.memory.preallocate" was written by @Chesnay Schepler
   <ches...@apache.org> (in the 1.8 branch), so maybe he can give more
   context or information. Correct me if I am wrong.

Best,
Vino.

On Wed, Dec 18, 2019 at 10:07 AM Ethan Li <ethanopensou...@gmail.com> wrote:

> I didn’t realize we were not chatting on the mailing list :)
>
> I think it’s wrong because it kind of says that a full GC is triggered by
> reaching MaxDirectMemorySize.
>
>
> On Dec 16, 2019, at 11:03 PM, Xintong Song <tonysong...@gmail.com> wrote:
>
> Glad that helped. I'm also posting this conversation to the public mailing
> list, in case other people have similar questions.
>
> And regarding the GC statement, I think the document is correct.
> - Flink Memory Manager guarantees that the amount of allocated managed
> memory never exceeds the configured capacity, so managed memory allocation
> should not trigger an OOM.
> - When preallocation is enabled, managed memory segments are allocated and
> pooled by the Flink Memory Manager, regardless of whether any tasks request
> them. The segments are not deallocated until the cluster is shut down.
> - When preallocation is disabled, managed memory segments are allocated
> only when tasks request them, and destroyed immediately when tasks return
> them to the Memory Manager. However, what this statement is trying to say
> is that the memory is not deallocated directly when the memory segment is
> destroyed; it is only truly released by a later GC.
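>
> To make that last point concrete with plain JDK classes (a rough sketch,
> not Flink's MemorySegment code; the class name is only illustrative): the
> native memory behind a direct buffer is reclaimed only after GC collects
> the buffer object, not at the moment the reference is dropped.
>
> import java.nio.ByteBuffer;
>
> public class DirectMemoryReleaseSketch {
>     public static void main(String[] args) {
>         // Reserve 64 MB of off-heap (direct) memory.
>         ByteBuffer segment = ByteBuffer.allocateDirect(64 * 1024 * 1024);
>         // "Destroy" the segment from the application's point of view.
>         segment = null;
>         // The 64 MB of native memory is still reserved at this point; it
>         // is released only once a GC cycle collects the unreachable buffer.
>         System.gc(); // just a hint to the JVM
>     }
> }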
>
> Thank you~
> Xintong Song
>
>
>
> On Tue, Dec 17, 2019 at 12:30 PM Ethan Li <ethanopensou...@gmail.com>
> wrote:
>
>> Thank you very much Xintong! It’s much clearer to me now.
>>
>> I am still on a standalone cluster setup. Before, I was using 350GB of
>> on-heap memory on a 378GB box and saw a lot of swap activity. Now I
>> understand that it’s because RocksDB didn’t have enough memory to use, so
>> the OS forced the JVM to swap. That explains why the cluster was unstable
>> and kept crashing.
>>
>> Now that I use 150GB off-heap and 150GB on-heap, the cluster is more
>> stable than before. I thought it was because GC was reduced now that we
>> have less heap memory. Now I understand that it’s because I have 78GB of
>> memory available for RocksDB to use, 50GB more than before. That also
>> explains why I don’t see swapping anymore.
>>
>> This makes sense to me now. I just have to set preallocation to false to
>> use the other 150 GB of off-heap memory for RocksDB and do some tuning on
>> these memory configs.
>>
>>
>> One thing I noticed is that
>> https://ci.apache.org/projects/flink/flink-docs-release-1.8/ops/config.html#taskmanager-memory-preallocate
>> says:
>>
>>  If this configuration is set to false cleaning up of the allocated
>> off-heap memory happens only when the configured JVM parameter
>> MaxDirectMemorySize is reached by triggering a full GC
>>
>> I think this statement is not correct. GC is not triggered by reaching
>> MaxDirectMemorySize. The JVM throws "java.lang.OutOfMemoryError: Direct
>> buffer memory” if MaxDirectMemorySize is reached.
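>>
>> For example (a minimal sketch, not Flink code; the class name is only
>> illustrative), run something like the following with
>> -XX:MaxDirectMemorySize=64m. Because every buffer stays referenced, GC
>> cannot reclaim anything, and the allocation eventually fails with the
>> error above rather than being "cleaned up" by a full GC.
>>
>> import java.nio.ByteBuffer;
>> import java.util.ArrayList;
>> import java.util.List;
>>
>> public class MaxDirectMemoryDemo {
>>     public static void main(String[] args) {
>>         List<ByteBuffer> buffers = new ArrayList<>(); // keep buffers reachable
>>         while (true) {
>>             // With -XX:MaxDirectMemorySize=64m this loop eventually throws
>>             // java.lang.OutOfMemoryError: Direct buffer memory
>>             buffers.add(ByteBuffer.allocateDirect(16 * 1024 * 1024));
>>         }
>>     }
>> }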
>>
>> Thank you again for your help!
>>
>> Best,
>> Ethan
>>
>>
>> On Dec 16, 2019, at 9:44 PM, Xintong Song <tonysong...@gmail.com> wrote:
>>
>> Hi Ethan,
>>
>> When you say "it's doing better than before", what was your setup before?
>> Was it on-heap managed memory, with preallocation enabled or disabled?
>> Also, which deployment (standalone, yarn, or local executor) do you run
>> Flink on? It's hard to tell why the performance became better without
>> knowing the information above.
>>
>> Since you are using RocksDB and have configured managed memory to be
>> off-heap, you should set pre-allocation to false. A streaming job with the
>> RocksDB state backend does not use managed memory at all. Setting managed
>> memory to off-heap only makes Flink launch the JVM with a smaller heap,
>> leaving more space outside the JVM. Setting pre-allocation to false makes
>> Flink allocate managed memory on demand, and since there is no demand, the
>> managed memory will never be allocated. Therefore, the memory left outside
>> the JVM can be fully leveraged by RocksDB.
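>>
>> In configuration terms, that corresponds to something like the following
>> (a sketch using the Flink 1.8 option keys via the Configuration API; on a
>> standalone cluster you would put the same key/value pairs into
>> flink-conf.yaml, and the class/method names are only illustrative):
>>
>> import org.apache.flink.configuration.Configuration;
>>
>> public class RocksDBMemoryConfigSketch {
>>     public static Configuration rocksDBFriendlyConfig() {
>>         Configuration conf = new Configuration();
>>         // Off-heap managed memory: the TaskManager JVM is launched with a
>>         // smaller heap, leaving more native memory outside the JVM for RocksDB.
>>         conf.setBoolean("taskmanager.memory.off-heap", true);
>>         // No pre-allocation: managed segments are allocated only on demand,
>>         // and a streaming job on RocksDB never demands them.
>>         conf.setBoolean("taskmanager.memory.preallocate", false);
>>         return conf;
>>     }
>> }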
>>
>> Regarding the related source code, I would recommend the following:
>> - MemoryManager - For how managed memory is allocated / used. Related to
>> pre-allocation.
>> - ContaineredTaskManagerParameters - For how the JVM memory parameters
>> are decided. Related to on-heap / off-heap managed memory.
>> - TaskManagerServices#fromConfiguration - For how different components
>> are created, as well as how their memory sizes are decided. Also related to
>> on-heap / off-heap managed memory.
>>
>> Thank you~
>> Xintong Song
>>
>>
>>
>> On Tue, Dec 17, 2019 at 11:00 AM Ethan Li <ethanopensou...@gmail.com>
>> wrote:
>>
>>> Thank you Xintong and Vino for taking the time to answer my question. I
>>> didn’t know managed memory is only used for batch jobs.
>>>
>>>
>>>
>>> I tried setting Flink managed memory to off-heap (with preallocation set
>>> to true) and it’s doing better than before. That would not make sense if
>>> managed memory is not used, so I was confused. Then I found this doc:
>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-49%3A+Unified+Memory+Configuration+for+TaskExecutors
>>>
>>> Configuring an off-heap state backend like RocksDB means either also
>>> setting managed memory to off-heap or adjusting the cutoff ratio, to
>>> dedicate less memory to the JVM heap.
>>>
>>>
>>> We use RocksDB too, so I guess I was doing that correctly by accident. So
>>> the question here is: in this case, should we set preallocate to true or
>>> false?
>>>
>>> If set to true, the TM will allocate off-heap memory during startup. Will
>>> this part of the memory be used by RocksDB?
>>> If set to false, how is this off-heap memory managed? Will the allocated
>>> memory ever be cleaned up and reused?
>>>
>>> I’d really appreciate it if you or anyone from the community could share
>>> some ideas or point me to the code. I am reading the source code but
>>> haven’t gotten there yet.
>>>
>>> Thank you very much!
>>>
>>> Best,
>>> Ethan
>>>
>>>
>>>
>>> On Dec 16, 2019, at 1:27 AM, Xintong Song <tonysong...@gmail.com> wrote:
>>>
>>> Hi Ethan,
>>>
>>> Currently, managed memory is only used for batch jobs (DataSet / Blink
>>> SQL). Setting it to off-heap and enabling pre-allocation can improve the
>>> performance of jobs that use managed memory. However, since you are
>>> running streaming jobs, which "currently do not use the managed memory", I
>>> would suggest you set managed memory to on-heap and disable
>>> pre-allocation. This way, Flink will not allocate any managed memory
>>> segments that are never actually used, and the corresponding memory can
>>> still be used for other JVM heap purposes.
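>>>
>>> As a sketch of that suggestion (again with the pre-1.10 option keys via
>>> the Configuration API; on a standalone cluster these keys go into
>>> flink-conf.yaml, and the class/method names are only illustrative):
>>>
>>> import org.apache.flink.configuration.Configuration;
>>>
>>> public class StreamingMemoryConfigSketch {
>>>     public static Configuration streamingDefaults() {
>>>         Configuration conf = new Configuration();
>>>         // Keep managed memory on-heap so unused capacity remains available
>>>         // to other JVM heap usages.
>>>         conf.setBoolean("taskmanager.memory.off-heap", false);
>>>         // Disable pre-allocation so unused managed segments are never allocated.
>>>         conf.setBoolean("taskmanager.memory.preallocate", false);
>>>         return conf;
>>>     }
>>> }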
>>>
>>> The above is for Flink 1.9 and earlier. In the upcoming Flink 1.10, we
>>> are removing the pre-allocation of managed memory, making managed memory
>>> always off-heap, and making the RocksDB state backend use managed memory.
>>> This means the two config options you mentioned will no longer exist in
>>> future releases. In case you're planning to migrate to the upcoming Flink
>>> 1.10: if your streaming jobs use the RocksDB state backend, then hopefully
>>> it won't be necessary for you to change any configuration, but if your
>>> jobs use the heap state backend, it would be better to configure the
>>> managed memory size / fraction to 0, because otherwise the corresponding
>>> memory cannot be used by any component.
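>>>
>>> For the heap state backend case on 1.10, that would look roughly like the
>>> following (a sketch assuming the new FLIP-49 option key name
>>> "taskmanager.memory.managed.size"; on a cluster the same key/value pair
>>> goes into flink-conf.yaml, and the class/method names are only
>>> illustrative):
>>>
>>> import org.apache.flink.configuration.Configuration;
>>>
>>> public class HeapBackendMemoryConfigSketch {
>>>     public static Configuration heapBackendConfig() {
>>>         Configuration conf = new Configuration();
>>>         // With the heap state backend nothing uses managed memory, so
>>>         // reserve none of the process memory for it.
>>>         conf.setString("taskmanager.memory.managed.size", "0");
>>>         return conf;
>>>     }
>>> }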
>>>
>>> Thank you~
>>> Xintong Song
>>>
>>>
>>>
>>> On Sat, Dec 14, 2019 at 5:20 AM Ethan Li <ethanopensou...@gmail.com>
>>> wrote:
>>>
>>>> Hi Community,
>>>>
>>>> I have a question about the taskmanager.memory.preallocate config in
>>>> the doc
>>>> https://ci.apache.org/projects/flink/flink-docs-release-1.8/ops/config.html#taskmanager-memory-preallocate
>>>>
>>>> We have a large-memory box, so as the doc suggests we should use
>>>> off-heap memory for Flink managed memory. The doc then also suggests
>>>> setting taskmanager.memory.preallocate to true. However,
>>>>
>>>>  "For streaming setups is is highly recommended to set this value to
>>>> false as the core state backends currently do not use the managed memory."
>>>>
>>>>
>>>> Our Flink setup is mainly for streaming jobs, so I think the above
>>>> applies to our case. Should I use off-heap with “preallocate" set to
>>>> false? What would be the impact of these configs?
>>>>
>>>>
>>>> Thank you very much!
>>>>
>>>>
>>>> Best,
>>>> Ethan
>>>>
>>>
>>>
>>
>
