Thanks Andrey for kicking this discussion off. Regarding "direct" vs. "off-heap", I'm personally in favor of renaming the "direct" memory in the current FLIP-116[1] to "off-heap" memory, and making it also account for user native memory usage.
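
To make the proposal concrete, here is a rough sketch of how a single "off-heap" size could feed the JVM direct memory limit while JVM Overhead only affects the container size. This is not Flink code: the class, the method names and the sizes are hypothetical, and other memory components are left out for brevity.

```java
import java.util.List;

public class JmMemorySketch {
    // Hypothetical helper, not Flink code: derives JVM flags from two JM memory sizes (in MB).
    // The off-heap size becomes the direct memory limit. The JVM counts only direct
    // ByteBuffer allocations against that limit, so native memory used by user code
    // would be budgeted here without being enforced by the JVM.
    static List<String> jvmArgs(long heapMb, long offHeapMb) {
        return List.of(
            "-Xmx" + heapMb + "m",
            "-XX:MaxDirectMemorySize=" + offHeapMb + "m");
    }

    // JVM Overhead maps to no JVM flag in this sketch; it only enlarges the container request.
    static long containerMb(long heapMb, long offHeapMb, long jvmOverheadMb) {
        return heapMb + offHeapMb + jvmOverheadMb;
    }

    public static void main(String[] args) {
        System.out.println(jvmArgs(1024, 128));
        System.out.println(containerMb(1024, 128, 192));
    }
}
```

With this shape, renaming the option from "direct" to "off-heap" changes nothing in the generated flags; it only changes what users are told they may budget under it.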

On one hand, I think it would be good for JM & TM to provide consistent concepts and terminology to users. IIUC, this is exactly the purpose of this FLIP. For TMs, we already have "off-heap" memory accounting for both direct and native memory usage, and we did this so that users do not need to understand the differences between the two kinds.

On the other hand, while for TMs it is hard to tell which kind of memory is needed, mostly due to the variety of applications, I believe that for the JM the major memory consumption is heap memory in most cases. That means we can probably rely on heap activity to trigger GC in most cases, and the max direct memory limit can act as a safety net. Moreover, I think cases where we need native memory for user code should be very rare. Therefore, we probably should not break the JM/TM consistency for potential risks in such rare cases.

WDYT?

Thank you~

Xintong Song

[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP+116%3A+Unified+Memory+Configuration+for+Job+Managers

On Wed, Mar 11, 2020 at 8:56 PM Andrey Zagrebin <azagre...@apache.org> wrote:

> Hi All,
>
> As you may have noticed, the 1.10 release included extensive improvements
> to memory management and configuration of Task Managers, FLIP-49: [1]. The
> memory configuration of Job Managers has not been touched in 1.10.
>
> Although the Job Manager's memory model does not look as sophisticated as
> that of Task Managers, it makes sense to align the Job Manager memory model
> and settings with Task Managers. Therefore, we propose to reconsider it as
> well in 1.11, and I prepared FLIP-116 [2] for that.
>
> Any feedback is appreciated.
>
> So far, there is one discussion point about how to address native
> non-direct memory usage of user code. The user code can be run, e.g. in
> certain job submission scenarios, within the JM process. For simplicity,
> the FLIP suggests only an option for direct memory, which is translated
> into the setting of the JVM direct memory limit.
> Although we documented for TM that similar parameters can also address
> native non-direct memory usage [3], this can lead to wrong functioning of
> the JVM direct memory limit. The direct memory option in JM could also be
> named in a more general way, e.g. off-heap memory, but this naming would
> somewhat hide its nature as the JVM direct memory limit.
> On the other hand, JVM Overhead does not suffer from this problem and
> affects only the container/worker memory size, which is the most important
> matter to address for native non-direct memory consumption. The caveat
> here is that JVM Overhead was not supposed to be used by any Flink or user
> components.
>
> Thanks,
> Andrey
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-49%3A+Unified+Memory+Configuration+for+TaskExecutors
> [2]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP+116%3A+Unified+Memory+Configuration+for+Job+Managers
> [3]
> https://ci.apache.org/projects/flink/flink-docs-release-1.10/ops/memory/mem_detail.html#overview