Hi Till, thanks for your reply.

> Concerning FLINK-9741, I'm not sure whether we need to fix this issue
> before starting this effort. The JobManagers are now running as part of
> the cluster entrypoint process for which we should actually report the
> metrics (memory usage).

I have confirmed this with Zhu Zhu offline: since the Dispatcher currently
still runs together with the JobManager, it should not affect the accuracy
of the metrics.

Till Rohrmann <trohrm...@apache.org> wrote on Wed, Feb 26, 2020 at 12:04 AM:

> Hi Yadong,
>
> thanks for creating this FLIP. I like the idea of exposing more
> cluster information to the user.
>
> I share Xintong's concerns that we are about to rework the cluster
> entrypoint's memory management. It might make sense to wait for these
> changes before starting this effort. Otherwise, we might risk doing some
> double work.
>
> Concerning FLINK-9741, I'm not sure whether we need to fix this issue
> before starting this effort. The JobManagers are now running as part of
> the cluster entrypoint process for which we should actually report the
> metrics (memory usage).
>
> Cheers,
> Till
>
> On Tue, Feb 25, 2020 at 10:52 AM Jark Wu <imj...@gmail.com> wrote:
>
> > Thanks Xintong for the explanation.
> >
> > The FLIP looks good to me now. +1 from my side.
> >
> > Best,
> > Jark
> >
> > On Tue, 25 Feb 2020 at 15:46, Xintong Song <tonysong...@gmail.com> wrote:
> >
> > > @Jark
> > >
> > > First, let me try to clarify that, while this FLIP is about adding JM
> > > metrics, the discussion of having different colors distinguishing the
> > > memory usage applies to both JM and TM.
> > >
> > > IMO, there is no good way to define how memory utilization should be
> > > mapped to colors in general.
> > >
> > >    - Direct memory
> > >       - JM: ATM, we do not specify -XX:MaxDirectMemorySize.
> > >       - TM: Direct memory consists of network memory and framework/task
> > >       off-heap memory; the former should always be 100% while the latter
> > >       may not.
> > >       Therefore, the utilization of direct memory really depends on the
> > >       configured size of network memory and framework/task off-heap
> > >       memory.
> > >    - Heap memory: We might observe that the memory usage keeps growing
> > >    until GC is triggered, thus eventually the utilization might
> > >    fluctuate somewhere close to 100%.
> > >
> > > In general, a low memory utilization probably suggests that the memory
> > > size is configured too large, but a high memory utilization does not
> > > necessarily suggest that the configured memory size needs to be
> > > increased, so I am not sure about rendering it in red.
> > >
> > > Thank you~
> > >
> > > Xintong Song
> > >
> > > On Tue, Feb 25, 2020 at 3:13 PM Yadong Xie <vthink...@gmail.com> wrote:
> > >
> > >> Hi all
> > >> we have updated the POC web, and added a unit to the GC metrics
> > >> check it here http://101.132.122.69:8081/web/#/job-manager/metrics
> > >> thanks for all the responses
> > >>
> > >> Jark Wu <imj...@gmail.com> wrote on Mon, Feb 24, 2020 at 8:48 PM:
> > >>
> > >>> Hi Yadong,
> > >>>
> > >>> > what is the boundary between red and green?
> > >>> Yes. I think that's the point we need to discuss. My gut feeling is
> > >>> "<60%" => green, "60%~80%" => yellow, ">80%" => red.
> > >>> But I guess direct memory is always 100%, so it is not suitable for
> > >>> that?
> > >>> Maybe @Xintong Song <tonysong...@gmail.com> has a better understanding
> > >>> of the memory threshold.
> > >>>
> > >>> Best,
> > >>> Jark
> > >>>
> > >>> On Mon, 24 Feb 2020 at 15:41, Yadong Xie <vthink...@gmail.com> wrote:
> > >>>
> > >>> > Hi Jark
> > >>> > thanks for your suggestion
> > >>> >
> > >>> > > I think we can use different colors to distinguish the memory usage
> > >>> > > (from green to red?).
> > >>> >
> > >>> > It is a good idea, but what is the boundary between red and green?
> > >>> > Giving a magic-number boundary may mislead the users. Any suggestions?
> > >>> >
> > >>> > > Besides, I think we should add a unit on the "Garbage Collection" ->
> > >>> > > "Time", it's hard to know what the value means. Would be better to
> > >>> > > display the value like "10ms", "5ns".
> > >>> >
> > >>> > I will add the unit later, thanks for your advice.
> > >>> >
> > >>> > Xintong Song <tonysong...@gmail.com> wrote on Fri, Feb 21, 2020 at 6:02 PM:
> > >>> >
> > >>> > > FYI, there's an effort planned for 1.11 to improve the memory
> > >>> > > configuration of the Flink master process, similar to FLIP-49 but
> > >>> > > definitely less complex.
> > >>> > >
> > >>> > > I would not consider the memory configuration improvement as a
> > >>> > > blocker for this effort. As far as I can see, there's nothing in
> > >>> > > conflict. It is just that, after the memory configuration
> > >>> > > improvement, we might be able to present more information on the JM
> > >>> > > metrics page that corresponds tightly to the configuration options,
> > >>> > > like what we planned for the TM metrics page in FLIP-102. Therefore,
> > >>> > > it might make sense to proceed with this FLIP afterwards.
> > >>> > >
> > >>> > > I'm neutral on this, and would leave the call to Yadong and Lining.
> > >>> > >
> > >>> > > Thank you~
> > >>> > >
> > >>> > > Xintong Song
> > >>> > >
> > >>> > > On Fri, Feb 21, 2020 at 2:47 PM Jark Wu <imj...@gmail.com> wrote:
> > >>> > >
> > >>> > > > Thanks Yadong,
> > >>> > > >
> > >>> > > > I think we can use different colors to distinguish the memory usage
> > >>> > > > (from green to red?).
> > >>> > > > Besides, I think we should add a unit on the "Garbage Collection" ->
> > >>> > > > "Time", it's hard to know what the value means.
> > >>> > > > Would be better to display the value like "10ms", "5ns".
> > >>> > > >
> > >>> > > > Best,
> > >>> > > > Jark
> > >>> > > >
> > >>> > > > On Thu, 20 Feb 2020 at 17:58, Yadong Xie <vthink...@gmail.com> wrote:
> > >>> > > >
> > >>> > > > > Hi all
> > >>> > > > >
> > >>> > > > > I want to start the vote for FLIP-104, which proposes to add more
> > >>> > > > > metrics to the job manager.
> > >>> > > > >
> > >>> > > > > To help everyone better understand the proposal, we spent some
> > >>> > > > > effort on making an online POC
> > >>> > > > >
> > >>> > > > > previous web: http://101.132.122.69:8081/#/job-manager/config
> > >>> > > > > POC web: http://101.132.122.69:8081/web/#/job-manager/metrics
> > >>> > > > >
> > >>> > > > > The vote will last for at least 72 hours, following the consensus
> > >>> > > > > voting process.
> > >>> > > > >
> > >>> > > > > FLIP wiki:
> > >>> > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-104%3A+Add+More+Metrics+to+Jobmanager
> > >>> > > > >
> > >>> > > > > Discussion thread:
> > >>> > > > > http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-75-Flink-Web-UI-Improvement-Proposal-td33540.html
> > >>> > > > >
> > >>> > > > > Thanks,
> > >>> > > > >
> > >>> > > > > Yadong