Thank you Vino for the information.

Best,
Ethan
> On Dec 17, 2019, at 8:29 PM, vino yang <yanghua1...@gmail.com> wrote:
>
> Hi Ethan,
>
> Sharing two things:
>
> I have found that the "taskmanager.memory.preallocate" config option has been removed in the master codebase.
> After searching the git history, I found that the description of "taskmanager.memory.preallocate" was written by @Chesnay Schepler <ches...@apache.org> (from the 1.8 branch). So maybe he can give more context or information. Correct me if I am wrong.
>
> Best,
> Vino.
>
> Ethan Li <ethanopensou...@gmail.com> wrote on Wednesday, December 18, 2019 at 10:07 AM:
>
> I didn’t realize we were not chatting in the mailing list :)
>
> I think it’s wrong because it kind of says a full GC is triggered by reaching MaxDirectMemorySize.
>
>> On Dec 16, 2019, at 11:03 PM, Xintong Song <tonysong...@gmail.com> wrote:
>>
>> Glad that helped. I'm also posting this conversation to the public mailing list, in case other people have similar questions.
>>
>> And regarding the GC statement, I think the document is correct.
>> - Flink's Memory Manager guarantees that the amount of allocated managed memory never exceeds the configured capacity, so managed memory allocation should not trigger OOM.
>> - When preallocation is enabled, managed memory segments are allocated and pooled by the Flink Memory Manager, regardless of whether any tasks are requesting them. The segments are not deallocated until the cluster is shut down.
>> - When preallocation is disabled, managed memory segments are allocated only when tasks request them, and destroyed immediately when tasks return them to the Memory Manager. However, what this statement is trying to say is that the memory is not deallocated directly when the memory segment is destroyed, but has to wait for a GC to be truly released.
>>
>> Thank you~
>> Xintong Song
>>
>> On Tue, Dec 17, 2019 at 12:30 PM Ethan Li <ethanopensou...@gmail.com> wrote:
>> Thank you very much Xintong! It’s much clearer to me now.
>>
>> I am still on a standalone cluster setup. Before, I was using 350GB of on-heap memory on a 378GB box. I saw a lot of swap activity. Now I understand that it’s because RocksDB didn’t have enough memory to use, so the OS forced the JVM to swap. That explains why the cluster was not stable and kept crashing.
>>
>> Now that I put 150GB off-heap and 150GB on-heap, the cluster is more stable than before. I thought it was because GC was reduced, since we now have less heap memory. Now I understand it’s because I have 78GB of memory available for RocksDB to use, 50GB more than before. And it explains why I don’t see swapping anymore.
>>
>> This makes sense to me now. I just have to set preallocation to false so the other 150GB of off-heap memory can be used by RocksDB, and do some tuning on these memory configs.
>>
>> One thing I noticed is that in https://ci.apache.org/projects/flink/flink-docs-release-1.8/ops/config.html#taskmanager-memory-preallocate
>>
>> "If this configuration is set to false cleaning up of the allocated off-heap memory happens only when the configured JVM parameter MaxDirectMemorySize is reached by triggering a full GC"
>>
>> I think this statement is not correct. GC is not triggered by reaching MaxDirectMemorySize. It will throw "java.lang.OutOfMemoryError: Direct buffer memory" if MaxDirectMemorySize is reached.
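For reference, below is a minimal standalone Java sketch (not Flink code; the class name and sizes are illustrative) of the behavior described above: when live direct buffers exhaust -XX:MaxDirectMemorySize, allocation fails with "java.lang.OutOfMemoryError: Direct buffer memory". Note that the JDK does first attempt a System.gc() to reclaim unreferenced direct buffers before giving up; the error is only thrown when, as here, the buffers are still referenced.

    import java.nio.ByteBuffer;
    import java.util.ArrayList;
    import java.util.List;

    // Run with: java -XX:MaxDirectMemorySize=64m DirectMemoryDemo
    public class DirectMemoryDemo {
        public static void main(String[] args) {
            // Keep every buffer referenced so the JDK's cleanup cannot reclaim it.
            List<ByteBuffer> buffers = new ArrayList<>();
            while (true) {
                // 8 MB per allocation; after roughly 64 MB the reservation fails
                // with java.lang.OutOfMemoryError: Direct buffer memory.
                buffers.add(ByteBuffer.allocateDirect(8 * 1024 * 1024));
                System.out.println("Allocated " + (buffers.size() * 8) + " MB of direct memory");
            }
        }
    }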
>>
>> Thank you again for your help!
>>
>> Best,
>> Ethan
>>
>>> On Dec 16, 2019, at 9:44 PM, Xintong Song <tonysong...@gmail.com> wrote:
>>>
>>> Hi Ethan,
>>>
>>> When you say "it's doing better than before", what was your setup before? Was it on-heap managed memory? With preallocation enabled or disabled? Also, what deployment (standalone, yarn, or local executor) do you run Flink on? It's hard to tell why the performance became better without knowing the information above.
>>>
>>> Since you are using RocksDB and configure managed memory to off-heap, you should set pre-allocation to false. Streaming jobs with the RocksDB state backend do not use managed memory at all. Setting managed memory to off-heap only makes Flink launch the JVM with smaller heap space, leaving more space outside the JVM. Setting pre-allocation to false makes Flink allocate managed memory on demand, and since there is no demand, the managed memory will not be allocated. Therefore, the memory space left outside the JVM can be fully leveraged by RocksDB.
>>>
>>> Regarding related source code, I would recommend the following:
>>> - MemoryManager - for how managed memory is allocated / used. Related to pre-allocation.
>>> - ContaineredTaskManagerParameters - for how the JVM memory parameters are decided. Related to on-heap / off-heap managed memory.
>>> - TaskManagerServices#fromConfiguration - for how the different components are created, as well as how their memory sizes are decided. Also related to on-heap / off-heap managed memory.
>>>
>>> Thank you~
>>> Xintong Song
>>>
>>> On Tue, Dec 17, 2019 at 11:00 AM Ethan Li <ethanopensou...@gmail.com> wrote:
>>> Thank you Xintong, Vino, for taking the time to answer my question. I didn’t know managed memory is only for batch jobs.
>>>
>>> I tried setting Flink managed memory to off-heap (with preallocation set to true) and it’s doing better than before. That would not make sense if managed memory is not used. I was confused. Then I found this doc: https://cwiki.apache.org/confluence/display/FLINK/FLIP-49%3A+Unified+Memory+Configuration+for+TaskExecutors
>>>
>>> "Configuring an off-heap state backend like RocksDB means either also setting managed memory to off-heap or adjusting the cutoff ratio, to dedicate less memory to the JVM heap."
>>>
>>> We use RocksDB too, so I guess I was doing that correctly by accident. So the question here is: in this case, should we set preallocate to true or false?
>>>
>>> If set to true, the TM will allocate memory off-heap during startup. Will this part of memory be used by RocksDB?
>>> If set to false, how is this off-heap memory being managed? Will the allocated memory ever be cleaned up and reused?
>>>
>>> I’d really appreciate it if you or anyone from the community could share some ideas or point me to the code. I am reading the source code but haven’t gotten there yet.
>>>
>>> Thank you very much!
>>>
>>> Best,
>>> Ethan
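To make the recommended pre-1.10 setup concrete, here is a flink-conf.yaml sketch of the configuration discussed above (RocksDB state backend, managed memory off-heap, preallocation disabled). The sizes are illustrative, loosely based on the 150GB / 150GB split mentioned earlier; the keys come from the Flink 1.8 configuration page linked above.

    # flink-conf.yaml (Flink 1.8.x, legacy memory model) -- sizes illustrative
    taskmanager.heap.size: 300g            # total TaskManager memory
    taskmanager.memory.size: 150g          # managed memory, taken off-heap below
    taskmanager.memory.off-heap: true      # JVM starts with a correspondingly smaller heap
    taskmanager.memory.preallocate: false  # segments allocated on demand only; with
                                           # RocksDB there is no demand, so this memory
                                           # stays available to RocksDB outside the JVM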
>>>> On Dec 16, 2019, at 1:27 AM, Xintong Song <tonysong...@gmail.com> wrote:
>>>>
>>>> Hi Ethan,
>>>>
>>>> Currently, managed memory is only used for batch jobs (DataSet / Blink SQL). Setting it to off-heap and enabling pre-allocation can improve performance when using managed memory. However, since you are running streaming jobs, which "currently do not use the managed memory", I would suggest you set managed memory to on-heap and disable pre-allocation. In this way, Flink will not allocate any managed memory segments that are not actually used, and the corresponding memory can still be used for other JVM heap purposes.
>>>>
>>>> The above is for Flink 1.9 and earlier. In the upcoming Flink 1.10, we are removing the pre-allocation of managed memory, making managed memory always off-heap, and making the RocksDB state backend use managed memory. This means the two config options you mentioned will no longer exist in future releases. In case you're planning to migrate to the upcoming Flink 1.10: if your streaming jobs use the RocksDB state backend, then hopefully it's not necessary for you to change any configuration; but if your jobs use the heap state backend, it would be better to configure the managed memory size / fraction to 0, because otherwise the corresponding memory cannot be used by any component.
>>>>
>>>> Thank you~
>>>> Xintong Song
>>>>
>>>> On Sat, Dec 14, 2019 at 5:20 AM Ethan Li <ethanopensou...@gmail.com> wrote:
>>>> Hi Community,
>>>>
>>>> I have a question about the taskmanager.memory.preallocate config in the doc: https://ci.apache.org/projects/flink/flink-docs-release-1.8/ops/config.html#taskmanager-memory-preallocate
>>>>
>>>> We have a large-memory box, so as suggested there, we should use off-heap memory for Flink managed memory. The doc then suggests setting taskmanager.memory.preallocate to true. However:
>>>>
>>>> "For streaming setups is is highly recommended to set this value to false as the core state backends currently do not use the managed memory."
>>>>
>>>> Our Flink setup is mainly for streaming jobs, so I think the above applies to our case. Should I use off-heap with "preallocate" set to false? What would be the impact of these configs?
>>>>
>>>> Thank you very much!
>>>>
>>>> Best,
>>>> Ethan
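Regarding Xintong's Flink 1.10 note above: for jobs on the heap state backend in 1.10+, where managed memory is always off-heap, that memory can be handed back to the heap by setting its size (or fraction) to zero. A sketch, with keys taken from the Flink 1.10 configuration options, assuming a cluster that runs heap-state-backend jobs only:

    # flink-conf.yaml (Flink 1.10+), heap state backend only
    taskmanager.memory.managed.size: 0
    # or equivalently:
    # taskmanager.memory.managed.fraction: 0.0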