Argh, I think the screenshot is missing (at least Nabble is not showing anything). Here is a link to the mockup:
https://drive.google.com/file/d/1p3wVP028_AFFLZ6fjPb41yAI8zUhgDTO/view?usp=sharing

Cheers

--

*Fabian Wollert*
*Zalando SE*

E-Mail: fab...@zalando.de

On Tue, 9 Oct 2018 at 12:46, Fabian Wollert <fab...@zalando.de> wrote:

> Hi everyone,
>
> disclaimer: i read the contribution guide about improvement requests (i.e.
> i should actually just start a jira ticket) but i thought it would make
> sense to run this first through the mailing list here. after collecting
> some input i would then create the jira ticket.
>
> When accessing the Flink Web Dashboard (which is basically what i do
> almost every day to check some status of a job or so), I recently felt that
> the actual information given in the top portion of the start page is highly
> improvable. I created a first mock by moving html elements around and
> wanted to share this one now:
>
> [image: image.png]
>
> With the exception of the metrics (see below) none of this information
> should be new, but rather re-organized to speed up investigation and
> monitoring:
>
> - complete overview on the cluster status and health, without clicking
>   through a lot of pages.
>   - Active and stand-by Job Managers. Also their health is depicted as a
>     color (as a first suggestion: last heartbeat is inside
>     heartbeat.timeout)
>   - Current registered Task Managers
>     - the little bar on the side indicates task slot usage. i did
>       not color it since a fully utilised task manager is not necessarily
>       something bad.
>     - the color indicates the health of the task manager (as a first
>       suggestion: last heartbeat is inside heartbeat.timeout)
> - overview on some cluster metrics
>
> Some points to notice:
>
> - All data you see on the screenshot is mock, no number relates to
>   another number at all. but colors should relate to the numbers already
>   which they indicate.
> - All of this could also be done with other monitoring solutions
>   someone might have in his company, by reading out JMX metrics and then
>   plotting those in his monitoring solution (e.g. grafana). But this out of
>   the box solution would save everyone from doing it on their own and they
>   could trust the metrics shown here.
> - Some of the metrics can only be done with FLINK-7286
>   <https://issues.apache.org/jira/browse/FLINK-7286> being done. So i
>   would split the implementation of this into two parts (cluster overview and
>   metrics) and do them separately.
> - This first mock up is targeted to what we here at Zalando would like
>   to see first glance, so it fits our use case very well. We mostly use
>   long-running session clusters.
> - I'm more a Backend Guy with some Frontend expertise (but mostly in
>   React, no angular1 (Flink Web Dashboard is built with this currently)
>   experience) and not at all a designer.
>
> What do you think? I would be glad to have some feedback on this,
> especially if this makes sense in the broad community. I would no matter
> what implement this somehow, if not in the Flink Master branch, then as a
> OS project which anyone can deploy next to their flink clusters. But i
> first wanted to run it through here to see if this sparks any interest.
>
> Please also let me know if you see difficulties implementing this already,
> maybe i have overseen something.
>
> Can't wait for your input.
>
> Cheers
>
> --
>
> *Fabian Wollert*
> *Zalando SE*
>
> E-Mail: fab...@zalando.de
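
To make the health-color suggestion from the quoted mail a bit more concrete (last heartbeat is inside heartbeat.timeout), here is a rough sketch of the rule. Everything in it is an assumption for illustration only: the function name `health_color`, the color names and the millisecond timestamps are made up, and this is not existing Flink dashboard code. The timeout is passed in explicitly and would come from the cluster's heartbeat.timeout setting.

```python
import time
from typing import Optional


def health_color(last_heartbeat_ms: int,
                 heartbeat_timeout_ms: int,
                 now_ms: Optional[int] = None) -> str:
    """Hypothetical helper: map heartbeat age to a dashboard color.

    Returns "green" if the last heartbeat is no older than the timeout,
    otherwise "red". All timestamps are epoch milliseconds.
    """
    if now_ms is None:
        # Fall back to the current wall-clock time.
        now_ms = int(time.time() * 1000)
    age_ms = now_ms - last_heartbeat_ms
    return "green" if age_ms <= heartbeat_timeout_ms else "red"
```

The same check would apply to both Job Managers and Task Managers; a third "yellow" state (e.g. heartbeat older than half the timeout) might be worth discussing, but I left it out to keep the first suggestion simple.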