Hi all,

I have updated the design of the metrics page and the FLIP doc. Please let
me know what you think.

FLIP-102:
https://cwiki.apache.org/confluence/display/FLINK/FLIP-102%3A+Add+More+Metrics+to+TaskManager
POC web:
http://101.132.122.69:8081/web/#/task-manager/8e1f1beada3859ee8e46d0960bb1da18/metrics

On Thu, Feb 27, 2020 at 10:27 PM Till Rohrmann <trohrm...@apache.org> wrote:

> Thinking a bit more about whether to report the aggregated memory
> statistics or the individual slot statistics, I think reporting on a
> per-slot basis won't work nicely together with FLIP-56 (dynamic slot
> allocation). The problem is that with FLIP-56, we will no longer have
> dedicated slots. The number of slots might change over the lifetime of a
> TaskExecutor. Hence, it won't be easy to generate a metric path for every
> slot, since the slots are also ephemeral. So maybe the more general and
> easier solution would be to report the overall memory usage of a
> TaskExecutor, even though it means doing some aggregation on the
> TaskExecutor.
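
To make the aggregation idea above concrete, here is a minimal sketch in Java. It assumes per-slot usage is available as a map from a slot identifier to used bytes. The class name, the method name and the slot ids are hypothetical and are not part of Flink. The point is only that a single TaskExecutor-level value keeps a stable metric path while slots come and go.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Flink: sums per-slot memory usage so that
// one TaskExecutor-level value can be reported under a fixed metric path.
final class SlotMemoryAggregator {

    // slotUsageBytes maps an (assumed) slot identifier to the bytes currently used.
    static long totalUsedBytes(Map<String, Long> slotUsageBytes) {
        long total = 0L;
        for (long used : slotUsageBytes.values()) {
            total += used;
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, Long> usage = new LinkedHashMap<>();
        usage.put("slot-a", 64L << 20); // 64 MiB
        usage.put("slot-b", 32L << 20); // 32 MiB
        // With dynamic slot allocation, slots may be released and created at any time.
        usage.remove("slot-b");
        usage.put("slot-c", 16L << 20); // 16 MiB
        System.out.println("used bytes on this TaskExecutor: " + totalUsedBytes(usage));
    }
}
```

The summation itself is cheap; the trade-off discussed in this thread is where it runs (TaskExecutor versus client), not how expensive it is.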
>
> Concerning the JVM limit: Isn't it mainly the code cache? If we display
> this value, then we should explain what exactly it means. I fear that most
> users won't understand what JVM limit actually means.
>
> Cheers,
> Till
>
> On Wed, Feb 26, 2020 at 11:15 AM Yadong Xie <vthink...@gmail.com> wrote:
>
> > Hi Till
> >
> > Thanks a lot for your response
> >
> > > 2. I'm not entirely sure whether I would split the memory ...
> >
> > Splitting the memory display comes from the 'ancient' design of the web
> > UI. It is fine with me to change it to follow the
> > total/heap/managed/network/direct/JVM overhead/mapped sequence.
> >
> > > 3. Displaying the memory configurations...
> >
> > I agree with you that it is not a very nice way, but the hierarchical
> > relationship of the configurations is too complex and hard to display in
> > other ways (I have tried).
> >
> > If anyone has a better idea, please do not hesitate to help me.
> >
> >
> > > 4. What does JVM limit mean in Non-heap.JVM-Overhead?
> >
> > JVM limit is "non-heap max metric minus metaspace configuration", as
> > @Xintong Song <tonysong...@gmail.com> replied in this mail thread.
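
As a rough illustration of the "JVM limit" definition above, the following Java sketch reads the non-heap maximum from the standard `MemoryMXBean` and subtracts a configured metaspace size. The class and method names are hypothetical, and the 96 MiB metaspace value is an assumption chosen only for the example. Note that `MemoryUsage.getMax()` returns -1 when no maximum is defined, so the sketch passes that through instead of subtracting from it.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical sketch, not Flink code.
final class JvmLimitSketch {

    // Returns "non-heap max minus metaspace configuration", or -1 when the
    // JVM reports no non-heap maximum (getMax() is -1 in that case).
    static long jvmLimitBytes(long nonHeapMaxBytes, long metaspaceConfiguredBytes) {
        if (nonHeapMaxBytes < 0) {
            return -1L;
        }
        return nonHeapMaxBytes - metaspaceConfiguredBytes;
    }

    public static void main(String[] args) {
        MemoryUsage nonHeap = ManagementFactory.getMemoryMXBean().getNonHeapMemoryUsage();
        long metaspaceConfigured = 96L << 20; // assumed value for illustration: 96 MiB
        System.out.println("non-heap max (bytes): " + nonHeap.getMax());
        System.out.println("JVM limit (bytes):    "
                + jvmLimitBytes(nonHeap.getMax(), metaspaceConfigured));
    }
}
```

On a JVM started without an explicit non-heap bound, the first value is typically -1, which matches the point made further down in this thread that the metric and the configuration do not line up.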
> >
> >
> > On Tue, Feb 25, 2020 at 6:58 PM Till Rohrmann <trohrm...@apache.org> wrote:
> >
> > > Thanks for creating this FLIP, Yadong. I think your proposal makes it
> > > much easier for the user to understand what's happening on Flink
> > > TaskManagers.
> > >
> > > I have some comments:
> > >
> > > 1. Some of the newly introduced metrics involve computations on the
> > > TaskManager. I would like to avoid additional computations introduced
> > > by metrics as much as possible because metrics should not affect the
> > > system. In particular, total memory sizes which are configured should
> > > not be derived computationally (getManagedMemoryTotal,
> > > getTotalMemorySize). For the currently available memory sizes (e.g.
> > > getManagedMemoryUsed), one could think about reporting them on a
> > > per-slot basis and doing the aggregation on the client side. Of course,
> > > this would increase the size of the response payload.
> > >
> > > 2. I'm not entirely sure whether I would split the memory display into
> > > JVM memory and non-JVM memory as you've done in the POC. From a
> > > user's perspective, one could start by displaying the total process
> > > memory. The next three most important metrics are the heap, managed
> > > memory and network buffer usage, I guess. If one is interested in more
> > > details, one could then display the remaining direct memory usage, the
> > > JVM overhead (I'm not sure whether I would call this non-heap, though)
> > > and the mapped memory.
> > >
> > > 3. Displaying the memory configurations in three nested boxes does not
> > > look so nice to me. I'm not sure how else one could display it, though.
> > >
> > > 4. What does JVM limit mean in Non-heap.JVM-Overhead?
> > >
> > > Cheers,
> > > Till
> > >
> > > On Tue, Feb 25, 2020 at 8:19 AM Yadong Xie <vthink...@gmail.com> wrote:
> > >
> > > > Hi Xintong
> > > > Thanks for your advice. The POC web and the FLIP doc have been updated.
> > > > Here is the new link:
> > > >
> > > > http://101.132.122.69:8081/web/#/task-manager/7e7cf0293645c8537caab915c829aa73/metrics
> > > >
> > > >
> > > > On Fri, Feb 21, 2020 at 12:00 PM Xintong Song <tonysong...@gmail.com> wrote:
> > > >
> > > > > >
> > > > > > 1. Should the managed memory be part of direct memory?
> > > > > >
> > > > > The answer is no. Managed memory is currently allocated by accessing
> > > > > a private field of Unsafe. It is not accounted for in the JVM's
> > > > > direct memory limit and the corresponding metrics. In that respect,
> > > > > it is equivalent to native memory.
> > > > >
> > > > >
> > > > > > 2. Should the shuffle memory also be part of the managed memory?
> > > > >
> > > > > I don't think so. Shuffle (network) memory is allocated with direct
> > > > > buffers and is accounted for in the JVM's direct memory limit and the
> > > > > corresponding metrics. Moreover, the FLIP-49 memory model exposes
> > > > > network memory and managed memory as two independent components of
> > > > > the overall memory footprint.
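
The statement above that direct buffers show up in the JVM's direct memory metrics can be checked with a small, hedged Java sketch. It uses the standard `BufferPoolMXBean` named "direct"; the class and method names are hypothetical, and the 1 MiB buffer stands in for a network buffer only for illustration. It does not show how Flink itself allocates network or managed memory.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;

// Hypothetical sketch, not Flink code.
final class DirectPoolSketch {

    // Bytes currently used by the JVM buffer pool named "direct", or -1 if absent.
    static long directPoolUsedBytes() {
        for (BufferPoolMXBean pool
                : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if ("direct".equals(pool.getName())) {
                return pool.getMemoryUsed();
            }
        }
        return -1L;
    }

    public static void main(String[] args) {
        long before = directPoolUsedBytes();
        ByteBuffer buffer = ByteBuffer.allocateDirect(1 << 20); // 1 MiB direct buffer
        long after = directPoolUsedBytes();
        System.out.println("direct pool before: " + before + ", after: " + after);
        System.out.println("buffer capacity:    " + buffer.capacity());
    }
}
```

Memory obtained outside `ByteBuffer.allocateDirect`, as described for managed memory in this thread, would not be expected to appear in this pool, which is why the two are reported separately.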
> > > > >
> > > > >
> > > > > Thank you~
> > > > >
> > > > > Xintong Song
> > > > >
> > > > >
> > > > >
> > > > > On Fri, Feb 21, 2020 at 11:45 AM Kurt Young <ykt...@gmail.com> wrote:
> > > > >
> > > > > > Some questions related to "managed memory":
> > > > > >
> > > > > > 1. Should the managed memory be part of direct memory?
> > > > > > 2. Should the shuffle memory also be part of the managed memory?
> > > > > >
> > > > > > Best,
> > > > > > Kurt
> > > > > >
> > > > > >
> > > > > > On Fri, Feb 21, 2020 at 10:41 AM Xintong Song <tonysong...@gmail.com> wrote:
> > > > > >
> > > > > > > Thanks for driving this FLIP, Yadong.
> > > > > > >
> > > > > > > +1 (non-binding) for the FLIP in general. I think this really
> > > > > > > helps our users to understand and use the new FLIP-49 memory
> > > > > > > configuration.
> > > > > > >
> > > > > > > I have a few minor comments.
> > > > > > > - There's a frame "Other" in the frame "Non-Heap", besides "JVM
> > > > > > > Overhead" and "JVM Metaspace". IIUC, the purpose of this is to
> > > > > > > explain the mismatch between the metric "non-heap maximum" and
> > > > > > > the sum of the configurations "JVM Metaspace" and "JVM Overhead".
> > > > > > > However, from the perspective of FLIP-49, JVM Overhead accounts
> > > > > > > for all the JVM non-heap memory usage except for metaspace. The
> > > > > > > metric does not match the configuration because we did not set a
> > > > > > > JVM parameter for "max non-heap memory" (actually I'm not sure
> > > > > > > whether it can be specified in Java 8). The current UI might
> > > > > > > confuse people, making them think there are other non-heap
> > > > > > > memory usages not accounted for by the configurations. Therefore,
> > > > > > > I would suggest removing the "Other" frame and adding another
> > > > > > > frame inside "JVM Overhead", besides "Configuration", with "JVM
> > > > > > > limit" as the title and "non-heap max metric minus metaspace
> > > > > > > configuration" as the value.
> > > > > > >
> > > > > > > - In the final release, we have changed "shuffle memory" to
> > > > > > > "network memory" because the latter is easier to understand for
> > > > > > > users. I think we should update it in this FLIP as well.
> > > > > > >
> > > > > > > - There's a typo "Directed" (should be "Direct") at the direct
> > > > > > > memory metric.
> > > > > > >
> > > > > > > Thank you~
> > > > > > >
> > > > > > > Xintong Song
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On Thu, Feb 20, 2020 at 5:52 PM Yadong Xie <vthink...@gmail.com> wrote:
> > > > > > >
> > > > > > > > Hi all
> > > > > > > >
> > > > > > > > I want to start the vote for FLIP-102, which proposes to add
> > > > > > > > more metrics to the task manager in the web UI.
> > > > > > > >
> > > > > > > > To help everyone better understand the proposal, we spent some
> > > > > > > > effort on making an online POC.
> > > > > > > >
> > > > > > > > previous web:
> > > > > > > > http://101.132.122.69:8081/#/task-manager/6df6c5f37b2bff125dbc3a7388128559/metrics
> > > > > > > > POC web:
> > > > > > > > http://101.132.122.69:8081/web/#/task-manager/6df6c5f37b2bff125dbc3a7388128559/metrics
> > > > > > > >
> > > > > > > >
> > > > > > > > The vote will last for at least 72 hours, following the
> > > > > > > > consensus voting process.
> > > > > > > >
> > > > > > > > FLIP wiki:
> > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-102%3A+Add+More+Metrics+to+TaskManager
> > > > > > > >
> > > > > > > > Discussion thread:
> > > > > > > > http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-75-Flink-Web-UI-Improvement-Proposal-td33540.html
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > >
> > > > > > > > Yadong
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
