Hi Yadong, thanks for creating this FLIP. I like the idea of exposing more cluster information to the user.
I share Xintong's concern that we are about to rework the memory management of the cluster entrypoint. It might make sense to wait for these changes before starting this effort; otherwise, we risk doing some of the work twice.

Concerning FLINK-9741, I'm not sure whether we need to fix this issue before starting this effort. The JobManagers now run as part of the cluster entrypoint process, and it is this process whose metrics (memory usage) we should actually report.

Cheers,
Till

On Tue, Feb 25, 2020 at 10:52 AM Jark Wu <imj...@gmail.com> wrote:

> Thanks Xintong for the explanation.
>
> The FLIP looks good to me now. +1 from my side.
>
> Best,
> Jark
>
> On Tue, 25 Feb 2020 at 15:46, Xintong Song <tonysong...@gmail.com> wrote:
>
> > @Jark
> >
> > First, let me clarify that, while this FLIP is about adding JM metrics,
> > the discussion of using different colors to distinguish memory usage
> > applies to both the JM and the TM.
> >
> > IMO, there is no good general way to define how memory utilization
> > should be mapped to colors.
> >
> > - Direct memory
> >   - JM: ATM, we do not specify -XX:MaxDirectMemorySize.
> >   - TM: Direct memory consists of network memory and framework/task
> >     off-heap memory. The former should always be at 100% utilization,
> >     while the latter may not be. Therefore, the utilization of direct
> >     memory really depends on the configured sizes of network memory and
> >     framework/task off-heap memory.
> > - Heap memory: We might observe that memory usage keeps growing until
> >   GC is triggered, so the utilization might eventually fluctuate
> >   somewhere close to 100%.
> >
> > In general, a low memory utilization probably suggests that the memory
> > size is configured too large, but a high memory utilization does not
> > necessarily suggest that the configured memory size needs to be
> > increased, so I'm not sure about rendering it in red.
> >
> > Thank you~
> >
> > Xintong Song
> >
> > On Tue, Feb 25, 2020 at 3:13 PM Yadong Xie <vthink...@gmail.com> wrote:
> >
> > > Hi all,
> > >
> > > We have updated the POC web UI and added units to the GC metrics.
> > > Check it here: http://101.132.122.69:8081/web/#/job-manager/metrics
> > > Thanks for all the responses.
> > >
> > > On Mon, Feb 24, 2020 at 8:48 PM Jark Wu <imj...@gmail.com> wrote:
> > >
> > > > Hi Yadong,
> > > >
> > > > > what is the boundary between red and green?
> > > >
> > > > Yes, I think that's the point we need to discuss. My gut feeling is
> > > > "<60%" => green, "60%~80%" => yellow, ">80%" => red.
> > > > But I guess direct memory is always at 100%, so this may not be
> > > > suitable for it?
> > > > Maybe @Xintong Song <tonysong...@gmail.com> has a better understanding
> > > > of the memory thresholds.
> > > >
> > > > Best,
> > > > Jark
> > > >
> > > > On Mon, 24 Feb 2020 at 15:41, Yadong Xie <vthink...@gmail.com> wrote:
> > > >
> > > > > Hi Jark,
> > > > >
> > > > > Thanks for your suggestion.
> > > > >
> > > > > > I think we can use different colors to distinguish the memory
> > > > > > usage (from green to red?).
> > > > >
> > > > > It is a good idea, but what is the boundary between red and green?
> > > > > A magic-number boundary may mislead users. Any suggestions?
> > > > >
> > > > > > Besides, I think we should add a unit to "Garbage Collection" ->
> > > > > > "Time"; it's hard to know what the value means. It would be better
> > > > > > to display the value like "10ms", "5ns".
> > > > >
> > > > > I will add the unit later, thanks for your advice.
> > > > >
> > > > > On Fri, Feb 21, 2020 at 6:02 PM Xintong Song <tonysong...@gmail.com> wrote:
> > > > >
> > > > > > FYI, there's an effort planned for 1.11 to improve the memory
> > > > > > configuration of the Flink master process, similar to FLIP-49 but
> > > > > > definitely less complex.
> > > > > >
> > > > > > I would not consider the memory configuration improvement a blocker
> > > > > > for this effort.
> > > > > > As far as I can see, there's nothing in conflict. It's just that after
> > > > > > the memory configuration improvement, we might be able to present
> > > > > > more information on the JM metrics page that corresponds closely to
> > > > > > the configuration options, like what we planned for the TM metrics
> > > > > > page in FLIP-102. Therefore, it might make sense to proceed with this
> > > > > > FLIP afterwards.
> > > > > >
> > > > > > I'm neutral on this and would leave the call to Yadong and Lining.
> > > > > >
> > > > > > Thank you~
> > > > > >
> > > > > > Xintong Song
> > > > > >
> > > > > > On Fri, Feb 21, 2020 at 2:47 PM Jark Wu <imj...@gmail.com> wrote:
> > > > > >
> > > > > > > Thanks Yadong,
> > > > > > >
> > > > > > > I think we can use different colors to distinguish the memory
> > > > > > > usage (from green to red?).
> > > > > > > Besides, I think we should add a unit to "Garbage Collection" ->
> > > > > > > "Time"; it's hard to know what the value means.
> > > > > > > It would be better to display the value like "10ms", "5ns".
> > > > > > >
> > > > > > > Best,
> > > > > > > Jark
> > > > > > >
> > > > > > > On Thu, 20 Feb 2020 at 17:58, Yadong Xie <vthink...@gmail.com> wrote:
> > > > > > >
> > > > > > > > Hi all,
> > > > > > > >
> > > > > > > > I want to start the vote for FLIP-104, which proposes to add more
> > > > > > > > metrics to the JobManager.
> > > > > > > >
> > > > > > > > To help everyone better understand the proposal, we spent some
> > > > > > > > effort making an online POC:
> > > > > > > >
> > > > > > > > previous web: http://101.132.122.69:8081/#/job-manager/config
> > > > > > > > POC web: http://101.132.122.69:8081/web/#/job-manager/metrics
> > > > > > > >
> > > > > > > > The vote will last for at least 72 hours, following the
> > > > > > > > consensus voting process.
> > > > > > > >
> > > > > > > > FLIP wiki:
> > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-104%3A+Add+More+Metrics+to+Jobmanager
> > > > > > > >
> > > > > > > > Discussion thread:
> > > > > > > > http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-75-Flink-Web-UI-Improvement-Proposal-td33540.html
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > >
> > > > > > > > Yadong
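The two UI suggestions debated in this thread, color-coding memory utilization and showing a unit for the GC "Time" value, can be sketched roughly as follows. This is a minimal illustration, not the web UI's actual code: the thresholds are Jark's tentative "<60% / 60%~80% / >80%" proposal (which Xintong questioned), the helpers `utilization_color` and `format_gc_time` are hypothetical names, and the GC time is assumed to arrive in milliseconds.

```python
from typing import Optional

# Hypothetical thresholds, taken from Jark's tentative proposal in this thread:
# below 60% -> green, 60%~80% -> yellow, above 80% -> red.
GREEN_BELOW = 0.6
RED_ABOVE = 0.8


def utilization_color(used: int, maximum: int) -> Optional[str]:
    """Map a memory pool's utilization to a display color (hypothetical helper).

    Returns None when no maximum is known (maximum <= 0), so the UI shows
    no color instead of a misleading one.
    """
    if maximum <= 0:
        return None
    ratio = used / maximum
    if ratio < GREEN_BELOW:
        return "green"
    if ratio <= RED_ABOVE:
        return "yellow"
    return "red"


def format_gc_time(millis: int) -> str:
    """Render an accumulated GC time with an explicit unit (hypothetical helper).

    Assumes the raw metric value is in milliseconds.
    """
    if millis < 0:
        raise ValueError("GC time cannot be negative")
    if millis < 1_000:
        return f"{millis}ms"
    if millis < 60_000:
        # One decimal place, truncated rather than rounded: 1500 -> "1.5s".
        return f"{millis // 1_000}.{(millis % 1_000) // 100}s"
    minutes, seconds = divmod(millis // 1_000, 60)
    return f"{minutes}m {seconds}s"


if __name__ == "__main__":
    print(utilization_color(700, 1_000))  # yellow
    print(utilization_color(300, -1))     # None: limit unknown
    print(format_gc_time(1_500))          # 1.5s
```

Returning no color for an unknown limit reflects Xintong's point that the JM does not set -XX:MaxDirectMemorySize. Whether heap memory should be colored at all is left open here, since heap usage naturally climbs toward 100% before each GC, and network memory would always show as fully used.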