I agree with Xintong's proposal. If we see that many users run into this
problem, then one could think about escalating the warning message into a
failure.
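
To make the idea concrete, here is a minimal Java sketch of such a check. It
rests on assumptions that are not taken from the FLIP: the heap is treated as
whatever remains of the total process memory after Metaspace, off-heap memory
and JVM Overhead, and all names (JmHeapCheck, deriveHeapBytes, checkHeap,
RECOMMENDED_MIN_HEAP_BYTES) are hypothetical, not Flink APIs. The 128 MB value
is the recommendation mentioned in this thread, and the failOnTooSmall flag
stands in for a possible later escalation from warning to failure.

```java
import java.util.logging.Logger;

/** Hypothetical sketch of a JM heap sanity check; not Flink's actual implementation. */
final class JmHeapCheck {
    private static final Logger LOG = Logger.getLogger(JmHeapCheck.class.getName());

    /** Recommended minimum heap discussed in this thread (128 MB); the name is made up. */
    static final long RECOMMENDED_MIN_HEAP_BYTES = 128L << 20;

    /** Assumed model: heap is what is left of the total process memory. */
    static long deriveHeapBytes(long totalProcess, long metaspace, long offHeap, long jvmOverhead) {
        return totalProcess - metaspace - offHeap - jvmOverhead;
    }

    /**
     * Returns true if the heap meets the recommended minimum. Otherwise logs a
     * warning and returns false, or throws if failOnTooSmall is set.
     */
    static boolean checkHeap(long heapBytes, boolean failOnTooSmall) {
        if (heapBytes >= RECOMMENDED_MIN_HEAP_BYTES) {
            return true;
        }
        String msg = String.format(
                "Derived JVM heap (%d bytes) is below the recommended minimum (%d bytes).",
                heapBytes, RECOMMENDED_MIN_HEAP_BYTES);
        if (failOnTooSmall) {
            throw new IllegalArgumentException(msg);
        }
        LOG.warning(msg);
        return false;
    }

    public static void main(String[] args) {
        long mb = 1L << 20;
        // Illustrative numbers only: the other components consume the whole budget.
        long heap = deriveHeapBytes(512 * mb, 256 * mb, 128 * mb, 128 * mb);
        checkHeap(heap, false); // logs a warning, does not fail
    }
}
```

With the illustrative numbers in main, the derived heap is zero, which is the
case Andrey describes; the check then only warns unless failOnTooSmall is set.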

Cheers,
Till

On Thu, Mar 19, 2020 at 4:23 AM Xintong Song <tonysong...@gmail.com> wrote:

> I think recommending a minimum value in the docs and throwing a warning if
> the heap size is too small should be good enough.
> Not sure about failing the job if the min heap is not fulfilled. As already
> mentioned, it would be hard to determine the min heap size. And if we make
> the min heap configurable, then in any case where users need to configure
> the min heap, they can configure the heap size directly.
>
> Thank you~
>
> Xintong Song
>
>
>
> On Wed, Mar 18, 2020 at 10:55 PM Andrey Zagrebin <azagre...@apache.org>
> wrote:
>
> > Hi all,
> >
> > One more thing to mention: the current calculations can lead to an
> > arbitrarily small JVM Heap, maybe even zero.
> > I suggest introducing a check where we at least recommend setting the JVM
> > heap to e.g. 128 MB.
> >
> > Additionally, we can demand some minimum value to function and fail if it
> > is not fulfilled.
> > We could experiment to find the working minimum, but it is hard to come
> > up with this limit because it again can depend on the job and
> > environment.
> >
> > Best,
> > Andrey
> >
> > On Wed, Mar 18, 2020 at 5:03 PM Andrey Zagrebin <azagre...@apache.org>
> > wrote:
> >
> > > Hi all,
> > >
> > > Thanks for the feedback, Xintong and Till.
> > >
> > > > rename jobmanager.memory.direct.size into
> > > > jobmanager.memory.off-heap.size
> > >
> > > I am ok with that to align it with TM and avoid further complications
> > > for users.
> > > I will adjust the FLIP.
> > >
> > > > change the default value of JM Metaspace size to 256 MB
> > >
> > > Indeed, no reason to assume that the user code would need less
> > > Metaspace in JM.
> > > I will change it unless a better argument is reported for another
> > > value.
> > >
> > > I think all concerns have been resolved, so I am starting the voting in
> > > a separate thread.
> > >
> > > Best,
> > > Andrey
> > >
> > > On Tue, Mar 17, 2020 at 6:16 PM Till Rohrmann <trohrm...@apache.org>
> > > wrote:
> > >
> > >> Thanks for creating this FLIP Andrey.
> > >>
> > >> I agree with Xintong that we should rename
> > >> jobmanager.memory.direct.size into jobmanager.memory.off-heap.size
> > >> which accounts for native and direct memory usage. I think it should
> > >> be good enough and is easier to understand for the user.
> > >>
> > >> Concerning the default value for the metaspace size: did we take the
> > >> lessons learned from the TM metaspace size into account? IIRC we are
> > >> about to change the default value to 256 MB.
> > >>
> > >> Feel free to start a vote once these last two questions have been
> > >> resolved.
> > >>
> > >> Cheers,
> > >> Till
> > >>
> > >> On Thu, Mar 12, 2020 at 4:25 AM Xintong Song <tonysong...@gmail.com>
> > >> wrote:
> > >>
> > >> > Thanks Andrey for kicking this discussion off.
> > >> >
> > >> > Regarding "direct" vs. "off-heap", I'm personally in favor of
> > >> > renaming the "direct" memory in the current FLIP-116[1] to
> > >> > "off-heap" memory, and making it also account for user native
> > >> > memory usage.
> > >> >
> > >> > On one hand, I think it would be good that JM & TM provide
> > >> > consistent concepts and terminologies to users. IIUC, this is
> > >> > exactly the purpose of this FLIP. For TMs, we already have
> > >> > "off-heap" memory accounting for both direct and native memory
> > >> > usages, and we did this so that users do not need to understand the
> > >> > differences between the two kinds.
> > >> >
> > >> > On the other hand, while for TMs it is hard to tell which kind of
> > >> > memory is needed, mostly due to the variety of applications, I
> > >> > believe for JM the major memory consumption is heap memory in most
> > >> > cases. That means we probably can rely on the heap activities to
> > >> > trigger GC in most cases, and the max direct memory limit can act as
> > >> > a safety net. Moreover, I think the cases where we need native
> > >> > memory for user code should be very rare. Therefore, we probably
> > >> > should not break the JM/TM consistency for potential risks in such
> > >> > rare cases.
> > >> >
> > >> > WDYT?
> > >> >
> > >> > Thank you~
> > >> >
> > >> > Xintong Song
> > >> >
> > >> >
> > >> > [1]
> > >> > https://cwiki.apache.org/confluence/display/FLINK/FLIP+116%3A+Unified+Memory+Configuration+for+Job+Managers
> > >> >
> > >> > On Wed, Mar 11, 2020 at 8:56 PM Andrey Zagrebin <azagre...@apache.org>
> > >> > wrote:
> > >> >
> > >> > > Hi All,
> > >> > >
> > >> > > As you may have noticed, the 1.10 release included extensive
> > >> > > improvements to the memory management and configuration of Task
> > >> > > Managers, FLIP-49 [1]. The memory configuration of Job Managers
> > >> > > has not been touched in 1.10.
> > >> > >
> > >> > > Although the Job Manager's memory model does not look as
> > >> > > sophisticated as that of Task Managers, it makes sense to align
> > >> > > the Job Manager memory model and settings with Task Managers.
> > >> > > Therefore, we propose to reconsider it as well in 1.11, and I
> > >> > > prepared FLIP-116 [2] for that.
> > >> > >
> > >> > > Any feedback is appreciated.
> > >> > >
> > >> > > So far, there is one discussion point about how to address native
> > >> > > non-direct memory usage of user code. The user code can be run,
> > >> > > e.g. in certain job submission scenarios, within the JM process.
> > >> > > For simplicity, the FLIP suggests only an option for direct
> > >> > > memory, which is translated into the setting of the JVM direct
> > >> > > memory limit.
> > >> > > Although we documented for TM that the similar parameters can
> > >> > > also address native non-direct memory usage [3], this can lead to
> > >> > > wrong functioning of the JVM direct memory limit. The direct
> > >> > > memory option in JM could also be named in a more general way,
> > >> > > e.g. off-heap memory, but this naming would somewhat hide its
> > >> > > nature as the JVM direct memory limit.
> > >> > > On the other hand, JVM Overhead does not suffer from this problem
> > >> > > and affects only the container/worker memory size, which is the
> > >> > > most important matter to address for the native non-direct memory
> > >> > > consumption. The caveat here is that JVM Overhead was not supposed
> > >> > > to be used by any Flink or user components.
> > >> > >
> > >> > > Thanks,
> > >> > > Andrey
> > >> > >
> > >> > > [1]
> > >> > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-49%3A+Unified+Memory+Configuration+for+TaskExecutors
> > >> > > [2]
> > >> > > https://cwiki.apache.org/confluence/display/FLINK/FLIP+116%3A+Unified+Memory+Configuration+for+Job+Managers
> > >> > > [3]
> > >> > > https://ci.apache.org/projects/flink/flink-docs-release-1.10/ops/memory/mem_detail.html#overview
