Thanks for creating this FLIP, Yadong. I think your proposal makes it much
easier for the user to understand what's happening on Flink TaskManagers.

I have some comments:

1. Some of the newly introduced metrics involve computations on the
TaskManager. I would like to avoid additional computations introduced by
metrics as much as possible because metrics should not affect the system.
In particular, total memory sizes which are configured should not be
derived computationally (getManagedMemoryTotal, getTotalMemorySize). For
the currently available memory sizes (e.g. getManagedMemoryUsed), one could
think about reporting them on a per slot basis and to do the aggregation on
the client side. Of course, this would increase the size of the response
payload.

2. I'm not entirely sure whether I would split the memory display into JVM
memory and non-JVM memory as you've done in the POC. From a user's
perspective, one could start displaying the total process memory. The next
three most important metrics are the heap, managed memory and network
buffer usage, I guess. If one is interested in more details, one could then
display the remaining direct memory usage, the JVM overhead (I'm not sure
whether I would call this non-heap though) and the mapped memory.

3. Displaying the memory configurations in three nested boxes does not look
so nice to me. I'm not sure how else one could display it, though.

4. What does JVM limit mean in Non-heap.JVM-Overhead?
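
To make the suggestion in (1) a bit more concrete, here is a minimal sketch
of what aggregating on the client side could look like. It assumes the
response carries one used-managed-memory value per slot; the class and field
names (ManagedMemoryAggregator, SlotUsage, usedBytes) are hypothetical and
not part of the existing REST API or of the FLIP.

```java
import java.util.List;

// Hypothetical client-side helper: the TaskManager only reports raw
// per-slot numbers, and the sum is computed by whoever displays them.
final class ManagedMemoryAggregator {

    // Assumed shape of one per-slot entry in the response payload.
    static final class SlotUsage {
        final String slotId;
        final long usedBytes;

        SlotUsage(String slotId, long usedBytes) {
            this.slotId = slotId;
            this.usedBytes = usedBytes;
        }
    }

    // Adds up the per-slot values; an empty list yields 0.
    static long totalUsedBytes(List<SlotUsage> slots) {
        long total = 0L;
        for (SlotUsage slot : slots) {
            total += slot.usedBytes;
        }
        return total;
    }
}
```

The trade-off is the one mentioned above: the TaskManager does no extra
work, but the payload grows with the number of slots.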

Cheers,
Till

On Tue, Feb 25, 2020 at 8:19 AM Yadong Xie <vthink...@gmail.com> wrote:

> Hi Xintong,
> thanks for your advice. The POC web and the FLIP doc have been updated.
> Here is the new link:
>
> http://101.132.122.69:8081/web/#/task-manager/7e7cf0293645c8537caab915c829aa73/metrics
>
>
> On Fri, Feb 21, 2020 at 12:00 PM Xintong Song <tonysong...@gmail.com> wrote:
>
> > >
> > > 1. Should the managed memory be part of direct memory?
> > >
> > The answer is no. Managed memory is currently allocated by accessing a
> > private field of Unsafe. It is not accounted for in the JVM's direct memory
> > limit and corresponding metrics. In that respect, it is equivalent to
> > native memory.
> >
> >
> > > 2. Should the shuffle memory also be part of the managed memory?
> >
> > I don't think so. Shuffle (Network) memory is allocated with direct
> > buffers, and accounted for in JVM's direct memory limit and corresponding
> > metrics. Moreover, the FLIP-49 memory model exposes network memory and
> > managed memory as two independent components of the overall memory
> > footprint.
> >
> >
> > Thank you~
> >
> > Xintong Song
> >
> >
> >
> > On Fri, Feb 21, 2020 at 11:45 AM Kurt Young <ykt...@gmail.com> wrote:
> >
> > > Some questions related to "managed memory":
> > >
> > > 1. Should the managed memory be part of direct memory?
> > > 2. Should the shuffle memory also be part of the managed memory?
> > >
> > > Best,
> > > Kurt
> > >
> > >
> > > On Fri, Feb 21, 2020 at 10:41 AM Xintong Song <tonysong...@gmail.com>
> > > wrote:
> > >
> > > > Thanks for driving this FLIP, Yadong.
> > > >
> > > > +1 (non-binding) for the FLIP in general. I think this really helps our
> > > > users to understand and use the new FLIP-49 memory configuration.
> > > >
> > > > I have a few minor comments.
> > > > - There's a frame "Other" in the frame "Non-Heap", besides "JVM Overhead"
> > > > and "JVM Metaspace". IIUC, the purpose of this is to explain the
> > > > mismatch between the metric "non-heap maximum" and the sum of the
> > > > configurations "JVM Metaspace" & "JVM Overhead". However, from the
> > > > perspective of FLIP-49, JVM Overhead accounts for all the JVM non-heap
> > > > memory usage except for metaspace. The metric does not match the
> > > > configuration because we did not set a JVM parameter for "max non-heap
> > > > memory" (actually I'm not sure whether it can be specified in Java 8).
> > > > The current UI might confuse people, making them think there are other
> > > > non-heap memory usages not accounted for by the configurations.
> > > > Therefore, I would suggest removing the "Other" frame and adding another
> > > > frame inside "JVM Overhead", besides "Configuration", with "JVM limit" as
> > > > the title and "non-heap max metric minus metaspace configuration" as the
> > > > value.
> > > >
> > > > - In the final release, we have changed "shuffle memory" to "network
> > > > memory" because the latter is easier to understand for users. I think we
> > > > should update it in this FLIP as well.
> > > >
> > > > - There's a typo "Directed" (should be "Direct") at the direct memory
> > > > metric.
> > > >
> > > > Thank you~
> > > >
> > > > Xintong Song
> > > >
> > > >
> > > >
> > > > On Thu, Feb 20, 2020 at 5:52 PM Yadong Xie <vthink...@gmail.com> wrote:
> > > >
> > > > > Hi all
> > > > >
> > > > > I want to start the vote for FLIP-102, which proposes to add more metrics
> > > > > to the task manager in the web UI.
> > > > >
> > > > > To help everyone better understand the proposal, we spent some effort on
> > > > > making an online POC.
> > > > >
> > > > > previous web:
> > > > > http://101.132.122.69:8081/#/task-manager/6df6c5f37b2bff125dbc3a7388128559/metrics
> > > > >
> > > > > POC web:
> > > > > http://101.132.122.69:8081/web/#/task-manager/6df6c5f37b2bff125dbc3a7388128559/metrics
> > > > >
> > > > >
> > > > > The vote will last for at least 72 hours, following the consensus voting
> > > > > process.
> > > > >
> > > > > FLIP wiki:
> > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-102%3A+Add+More+Metrics+to+TaskManager
> > > > >
> > > > > Discussion thread:
> > > > > http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-75-Flink-Web-UI-Improvement-Proposal-td33540.html
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Yadong
> > > > >
> > > >
> > >
> >
>
