Hi Till,

Thanks for your response! I'm responsible for the REST API design part of FLIP-102.
> 1. Some of the newly introduced metrics involve computations on the
> TaskManager. I would like to avoid additional computations introduced by
> metrics as much as possible because metrics should not affect the system.
> In particular, total memory sizes which are configured should not be
> derived computationally (getManagedMemoryTotal, getTotalMemorySize). For
> the currently available memory sizes (e.g. getManagedMemoryUsed), one could
> think about reporting them on a per slot basis and to do the aggregation on
> the client side. Of course, this would increase the size of the response
> payload.

I totally agree with your comment, but I still have a question: where should the per-slot managed memory metric be registered? I see two ways to achieve this:

1. Add a new SlotMetricGroup.
2. Register it in the TaskManagerMetricGroup under a slot-indexed name, such as 0.Managed.Memory.Used (where 0 is the index of the slot).

Which way do you think is better?

Looking forward to your reply.

On Tue, Feb 25, 2020 at 6:58 PM Till Rohrmann <trohrm...@apache.org> wrote:

> Thanks for creating this FLIP Yadong. I think your proposal makes it much
> easier for the user to understand what's happening on Flink TaskManagers.
>
> I have some comments:
>
> 1. Some of the newly introduced metrics involve computations on the
> TaskManager. I would like to avoid additional computations introduced by
> metrics as much as possible because metrics should not affect the system.
> In particular, total memory sizes which are configured should not be
> derived computationally (getManagedMemoryTotal, getTotalMemorySize). For
> the currently available memory sizes (e.g. getManagedMemoryUsed), one could
> think about reporting them on a per slot basis and to do the aggregation on
> the client side. Of course, this would increase the size of the response
> payload.
>
> 2. I'm not entirely sure whether I would split the memory display into JVM
> memory and non-JVM memory as you've done it in the POC.
> From a user's perspective, one could start displaying the total process
> memory. The next three most important metrics are the heap, managed memory
> and network buffer usage, I guess. If one is interested in more details,
> one could then display the remaining direct memory usage, the JVM overhead
> (I'm not sure whether I would call this non-heap though) and the mapped
> memory.
>
> 3. Displaying the memory configurations in three nested boxes does not look
> so nice to me. I'm not sure how else one could display it, though.
>
> 4. What does JVM limit mean in Non-heap.JVM-Overhead?
>
> Cheers,
> Till
>
> On Tue, Feb 25, 2020 at 8:19 AM Yadong Xie <vthink...@gmail.com> wrote:
>
> > Hi Xintong,
> >
> > Thanks for your advice. The POC web UI and the FLIP doc have been updated;
> > here is the new link:
> >
> > http://101.132.122.69:8081/web/#/task-manager/7e7cf0293645c8537caab915c829aa73/metrics
> >
> > On Fri, Feb 21, 2020 at 12:00 PM Xintong Song <tonysong...@gmail.com> wrote:
> >
> > > > 1. Should the managed memory be part of direct memory?
> > >
> > > The answer is no. Managed memory is currently allocated by accessing a
> > > private field of Unsafe. It is not accounted for in the JVM's direct
> > > memory limit and the corresponding metrics. In that respect, it is
> > > equivalent to native memory.
> > >
> > > > 2. Should the shuffle memory also be part of the managed memory?
> > >
> > > I don't think so. Shuffle (network) memory is allocated with direct
> > > buffers, and is accounted for in the JVM's direct memory limit and the
> > > corresponding metrics. Moreover, the FLIP-49 memory model exposes network
> > > memory and managed memory as two independent components of the overall
> > > memory footprint.
> > >
> > > Thank you~
> > >
> > > Xintong Song
> > >
> > > On Fri, Feb 21, 2020 at 11:45 AM Kurt Young <ykt...@gmail.com> wrote:
> > >
> > > > Some questions related to "managed memory":
> > > >
> > > > 1. Should the managed memory be part of direct memory?
> > > >
> > > > 2. Should the shuffle memory also be part of the managed memory?
> > > >
> > > > Best,
> > > > Kurt
> > > >
> > > > On Fri, Feb 21, 2020 at 10:41 AM Xintong Song <tonysong...@gmail.com> wrote:
> > > >
> > > > > Thanks for driving this FLIP, Yadong.
> > > > >
> > > > > +1 (non-binding) for the FLIP in general. I think this really helps our
> > > > > users to understand and use the new FLIP-49 memory configuration.
> > > > >
> > > > > I have a few minor comments.
> > > > >
> > > > > - There's a frame "Other" in the frame "Non-Heap", besides "JVM
> > > > > Overhead" and "JVM Metaspace". IIUC, the purpose of this is to explain
> > > > > the mismatch between the metric "non-heap maximum" and the sum of the
> > > > > configurations "JVM metaspace" & "JVM Overhead". However, from the
> > > > > perspective of FLIP-49, JVM Overhead accounts for all the JVM non-heap
> > > > > memory usage except for metaspace. The metric does not match the
> > > > > configuration because we did not set a JVM parameter for "max non-heap
> > > > > memory" (actually I'm not sure whether it can be specified in Java 8).
> > > > > The current UI might confuse people, making them think there are other
> > > > > non-heap memory usages not accounted for by the configurations.
> > > > > Therefore, I would suggest removing the "Other" frame, and instead
> > > > > adding another frame inside "JVM Overhead", besides "Configuration",
> > > > > with "JVM limit" as the title and "non-heap max metric minus metaspace
> > > > > configuration" as the value.
> > > > >
> > > > > - In the final release, we have changed "shuffle memory" to "network
> > > > > memory" because the latter is easier for users to understand. I think
> > > > > we should update it in this FLIP as well.
> > > > >
> > > > > - There's a typo "Directed" (should be "Direct") at the direct memory
> > > > > metric.
> > > > >
> > > > > Thank you~
> > > > >
> > > > > Xintong Song
> > > > >
> > > > > On Thu, Feb 20, 2020 at 5:52 PM Yadong Xie <vthink...@gmail.com> wrote:
> > > > >
> > > > > > Hi all,
> > > > > >
> > > > > > I want to start the vote for FLIP-102, which proposes to add more
> > > > > > metrics to the task manager in the web UI.
> > > > > >
> > > > > > To help everyone better understand the proposal, we spent some effort
> > > > > > on making an online POC.
> > > > > >
> > > > > > Previous web UI:
> > > > > > http://101.132.122.69:8081/#/task-manager/6df6c5f37b2bff125dbc3a7388128559/metrics
> > > > > >
> > > > > > POC web UI:
> > > > > > http://101.132.122.69:8081/web/#/task-manager/6df6c5f37b2bff125dbc3a7388128559/metrics
> > > > > >
> > > > > > The vote will last for at least 72 hours, following the consensus
> > > > > > voting process.
> > > > > >
> > > > > > FLIP wiki:
> > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-102%3A+Add+More+Metrics+to+TaskManager
> > > > > >
> > > > > > Discussion thread:
> > > > > > http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-75-Flink-Web-UI-Improvement-Proposal-td33540.html
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Yadong
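To make option 2 from my question concrete: if each slot reported its usage under a slot-indexed name such as 0.Managed.Memory.Used, the client could do the aggregation Till suggests by summing those values. The following is only a rough sketch under that assumption. The class and method names are hypothetical and not part of Flink's API, and the code works on a plain map of already-fetched metric values rather than on any real REST client.

```java
import java.util.Map;
import java.util.regex.Pattern;

/**
 * Hypothetical client-side helper (not a Flink API): sums per-slot managed
 * memory usage reported under slot-indexed names such as
 * "0.Managed.Memory.Used", "1.Managed.Memory.Used", ... (option 2).
 */
public final class SlotMetricAggregation {

    // Matches "<slotIndex>.Managed.Memory.Used", e.g. "0.Managed.Memory.Used".
    private static final Pattern SLOT_MANAGED_USED =
            Pattern.compile("\\d+\\.Managed\\.Memory\\.Used");

    private SlotMetricAggregation() {}

    /** Sums every per-slot "Managed.Memory.Used" value; all other metrics are ignored. */
    public static long sumManagedMemoryUsed(Map<String, Long> metrics) {
        long total = 0L;
        for (Map.Entry<String, Long> e : metrics.entrySet()) {
            if (SLOT_MANAGED_USED.matcher(e.getKey()).matches()) {
                total += e.getValue();
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Example values in bytes; the heap metric is skipped by the pattern.
        Map<String, Long> reported = Map.of(
                "0.Managed.Memory.Used", 64L * 1024 * 1024,
                "1.Managed.Memory.Used", 32L * 1024 * 1024,
                "Status.JVM.Memory.Heap.Used", 512L * 1024 * 1024);
        System.out.println(sumManagedMemoryUsed(reported)); // prints 100663296
    }
}
```

With option 1 (a dedicated SlotMetricGroup), the summation would stay the same, but the name pattern would have to match whatever scope that group produces.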