I agree with Xintong's proposal. If we see that many users run into this problem, then one could think about escalating the warning message into a failure.
Cheers,
Till

On Thu, Mar 19, 2020 at 4:23 AM Xintong Song <tonysong...@gmail.com> wrote:

> I think recommending a minimum value in the docs and throwing a warning if
> the heap size is too small should be good enough.
> Not sure about failing the job if the min heap is not fulfilled. As already
> mentioned, it would be hard to determine the min heap size. And if we make
> the min heap configurable, then in any case where users need to configure
> the min heap, they can configure the heap size directly.
>
> Thank you~
>
> Xintong Song
>
>
>
> On Wed, Mar 18, 2020 at 10:55 PM Andrey Zagrebin <azagre...@apache.org>
> wrote:
>
> > Hi all,
> >
> > One more thing to mention: the current calculations can lead to an
> > arbitrarily small JVM Heap, maybe even zero.
> > I suggest introducing a check where we at least recommend setting the
> > JVM heap to e.g. 128 MB.
> >
> > Additionally, we can demand some minimum value to function and fail if
> > it is not fulfilled.
> > We could experiment with what the working minimum is, but it is hard to
> > come up with this limit because it again can depend on the job and
> > environment.
> >
> > Best,
> > Andrey
> >
> > On Wed, Mar 18, 2020 at 5:03 PM Andrey Zagrebin <azagre...@apache.org>
> > wrote:
> >
> > > Hi all,
> > >
> > > Thanks for the feedback, Xintong and Till.
> > >
> > > > rename jobmanager.memory.direct.size into
> > > > jobmanager.memory.off-heap.size
> > >
> > > I am ok with that to align it with TM and avoid further complications
> > > for users.
> > > I will adjust the FLIP.
> > >
> > > > change the default value of JM Metaspace size to 256 MB
> > >
> > > Indeed, no reason to assume that the user code would need less
> > > Metaspace in JM.
> > > I will change it unless a better argument is reported for another
> > > value.
> > >
> > > I think all concerns have been resolved so I am starting the voting in
> > > a separate thread.
> > >
> > > Best,
> > > Andrey
> > >
> > > On Tue, Mar 17, 2020 at 6:16 PM Till Rohrmann <trohrm...@apache.org>
> > > wrote:
> > >
> > >> Thanks for creating this FLIP Andrey.
> > >>
> > >> I agree with Xintong that we should rename
> > >> jobmanager.memory.direct.size into jobmanager.memory.off-heap.size,
> > >> which accounts for native and direct memory usage. I think it should
> > >> be good enough and is easier to understand for the user.
> > >>
> > >> Concerning the default value for the metaspace size: did we take the
> > >> lessons learned from the TM metaspace size into account? IIRC we are
> > >> about to change the default value to 256 MB.
> > >>
> > >> Feel free to start a vote once these last two questions have been
> > >> resolved.
> > >>
> > >> Cheers,
> > >> Till
> > >>
> > >> On Thu, Mar 12, 2020 at 4:25 AM Xintong Song <tonysong...@gmail.com>
> > >> wrote:
> > >>
> > >> > Thanks Andrey for kicking this discussion off.
> > >> >
> > >> > Regarding "direct" vs. "off-heap", I'm personally in favor of
> > >> > renaming the "direct" memory in the current FLIP-116[1] to
> > >> > "off-heap" memory, and making it also account for user native
> > >> > memory usage.
> > >> >
> > >> > On one hand, I think it would be good that JM & TM provide
> > >> > consistent concepts and terminologies to users. IIUC, this is
> > >> > exactly the purpose of this FLIP. For TMs, we already have
> > >> > "off-heap" memory accounting for both direct and native memory
> > >> > usages, and we did this so that users do not need to understand the
> > >> > differences between the two kinds.
> > >> >
> > >> > On the other hand, while for TMs it is hard to tell which kind of
> > >> > memory is needed, mostly due to the variety of applications, I
> > >> > believe for JM the major memory consumption is heap memory in most
> > >> > cases. That means we probably can rely on the heap activities to
> > >> > trigger GC in most cases, and the max direct memory limit can act
> > >> > as a safety net. Moreover, I think the cases where we need native
> > >> > memory for user code should be very rare. Therefore, we probably
> > >> > should not break the JM/TM consistency for potential risks in such
> > >> > rare cases.
> > >> >
> > >> > WDYT?
> > >> >
> > >> > Thank you~
> > >> >
> > >> > Xintong Song
> > >> >
> > >> >
> > >> > [1]
> > >> > https://cwiki.apache.org/confluence/display/FLINK/FLIP+116%3A+Unified+Memory+Configuration+for+Job+Managers
> > >> >
> > >> > On Wed, Mar 11, 2020 at 8:56 PM Andrey Zagrebin
> > >> > <azagre...@apache.org> wrote:
> > >> >
> > >> > > Hi All,
> > >> > >
> > >> > > As you may have noticed, the 1.10 release included extensive
> > >> > > improvements to memory management and configuration of Task
> > >> > > Managers, FLIP-49: [1]. The memory configuration of Job Managers
> > >> > > has not been touched in 1.10.
> > >> > >
> > >> > > Although the Job Manager's memory model does not look as
> > >> > > sophisticated as that of Task Managers, it makes sense to align
> > >> > > the Job Manager memory model and settings with Task Managers.
> > >> > > Therefore, we propose to reconsider it as well in 1.11 and I
> > >> > > prepared a FLIP 116 [2] for that.
> > >> > >
> > >> > > Any feedback is appreciated.
> > >> > >
> > >> > > So far, there is one discussion point about how to address native
> > >> > > non-direct memory usage of user code. The user code can be run,
> > >> > > e.g. in certain job submission scenarios, within the JM process.
> > >> > > For simplicity, the FLIP suggests only an option for direct
> > >> > > memory, which is translated into the setting of the JVM direct
> > >> > > memory limit.
> > >> > > Although we documented for TM that similar parameters can also
> > >> > > address native non-direct memory usage [3], this can lead to
> > >> > > incorrect functioning of the JVM direct memory limit. The direct
> > >> > > memory option in JM could also be named in a more general way,
> > >> > > e.g. off-heap memory, but this naming would somewhat hide its
> > >> > > nature as the JVM direct memory limit.
> > >> > > On the other hand, JVM Overhead does not suffer from this problem
> > >> > > and affects only the container/worker memory size, which is the
> > >> > > most important matter to address for the native non-direct memory
> > >> > > consumption. The caveat here is that JVM Overhead was not
> > >> > > supposed to be used by any Flink or user components.
> > >> > >
> > >> > > Thanks,
> > >> > > Andrey
> > >> > >
> > >> > > [1]
> > >> > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-49%3A+Unified+Memory+Configuration+for+TaskExecutors
> > >> > > [2]
> > >> > > https://cwiki.apache.org/confluence/display/FLINK/FLIP+116%3A+Unified+Memory+Configuration+for+Job+Managers
> > >> > > [3]
> > >> > > https://ci.apache.org/projects/flink/flink-docs-release-1.10/ops/memory/mem_detail.html#overview
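
For illustration only, the check discussed at the top of this thread (warn, rather than fail, when the derived JVM heap is too small) could be sketched roughly as below. This is not Flink's implementation. It assumes the heap is whatever remains after subtracting off-heap, metaspace and JVM overhead from the total process memory, which is a simplification of the FLIP. The function name, logger name and the 128 MB constant (the "e.g." value floated in the thread) are hypothetical, and Python is used only for brevity.

```python
import logging

MB = 1024 * 1024

# "e.g. 128Mb" is the value floated in the thread; it is not an official Flink constant.
RECOMMENDED_MIN_HEAP_BYTES = 128 * MB

log = logging.getLogger("jm_memory_sketch")  # hypothetical logger name


def derive_jvm_heap(total_process, off_heap, metaspace, jvm_overhead):
    """Return the JVM heap size in bytes left over after the other components.

    Assumption: heap = total - off-heap - metaspace - JVM overhead.
    A heap below the recommended minimum only produces a warning, as
    proposed in the thread. A negative result is rejected because no
    JVM could be started with such a configuration.
    """
    heap = total_process - off_heap - metaspace - jvm_overhead
    if heap < 0:
        raise ValueError(
            "Off-heap, metaspace and JVM overhead exceed the total process memory"
        )
    if heap < RECOMMENDED_MIN_HEAP_BYTES:
        log.warning(
            "Derived JVM heap (%d bytes) is below the recommended minimum (%d bytes)",
            heap,
            RECOMMENDED_MIN_HEAP_BYTES,
        )
    return heap


if __name__ == "__main__":
    logging.basicConfig(level=logging.WARNING)
    # Comfortable configuration: no warning is logged.
    print(derive_jvm_heap(1024 * MB, 128 * MB, 256 * MB, 192 * MB) // MB)
    # Tight configuration: a warning is logged, but the value is still returned.
    print(derive_jvm_heap(600 * MB, 128 * MB, 256 * MB, 192 * MB) // MB)
```

If many users turned out to hit the warning, escalating it to a failure as suggested above would amount to raising an error in the second branch instead of logging.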