Hi Xintong Song

2. We could switch between a detailed mode (including cpu, task heap, task off-heap, shuffle, on-heap managed, off-heap managed) and a summary mode (including only cpu and mem), which is very easy to do in the UI design.
4. I think the key point is not pagination in the Web UI, but that the REST API will totally *break* without pagination in the current design. In my opinion, pagination is better than nothing. It is one way to keep the log API working, and it would be great if there were another way to keep it working with huge log data.

Xintong Song <tonysong...@gmail.com> wrote on Mon, Sep 30, 2019 at 7:19 PM:

> @Yadong
>
> 2. I agree that we can update the task executor ui after flip-56 is done.
> But I would suggest keeping it under discussion to come up with a proper ui
> design for task executor resources. I don't think the mentioned image from
> flip-56 is a good choice. That image is a simplified figure with cpu and
> total memory only, for the purpose of demonstrating dynamic slot
> allocation. In fact, there are 6 fields to be displayed (cpu, task heap,
> task off-heap, shuffle, on-heap managed, off-heap managed). If we display
> cpu and total memory only, then users will be confused when seeing a task
> executor with enough remaining resources but tasks cannot be deployed onto
> it (because the desired type of memory might be used up).
>
> 4. I've been using the blink webui, which already has log pagination. It's
> quite common that we need to search for some keywords (e.g., exception,
> error, warning) in a large amount of logs when diagnosing problems. I find
> it very inconvenient that I have to click into each page to search for the
> keywords, and I'd rather take the effort to find the original log files
> on the filesystem to view the log. Personally speaking, if keyword
> searching cannot be supported, I would prefer to take some time loading the
> non-paginated logs over the paginated ones. Or we may at least have a
> button on the webui for switching between the two alternatives.
>
> @Till
>
> Thanks for the inputs.
>
> Thank you~
>
> Xintong Song
>
>
>
> On Mon, Sep 30, 2019 at 5:55 PM Till Rohrmann <trohrm...@apache.org>
> wrote:
>
> > For 3.
> > At the moment the log and stdout file serving requires the
> > TaskExecutor to be running. But in some scenarios, when using an NFS, it
> > should be enough to know where the file is located. However, this
> > assumption does not hold in the general case.
> >
> > Cheers,
> > Till
> >
> > On Mon, Sep 30, 2019 at 11:43 AM Yadong Xie <vthink...@gmail.com> wrote:
> >
> > > Hi Xintong Song
> > >
> > > Thanks for your comments!
> > >
> > > 1. I think it is a good idea to align CPU and memory usage with
> > > FLIP-49 if it will be released in version 1.10.
> > > 2. We can update the task executor UI design after FLIP-56 is merged
> > > into master. Actually, the image
> > > <https://cwiki.apache.org/confluence/download/attachments/125309297/BlinkResourceTM.png?version=1&modificationDate=1566223821000&api=v2>
> > > in FLIP-56 is a good UI design, and we can follow it in the Flink web.
> > > 3. No idea about it. Maybe someone familiar with the runtime part could
> > > answer it? But it would be great to add it to the web UI in my opinion.
> > > 4. I'm not sure whether keyword searching across all the pages would
> > > cost too many resources in the job manager, but I think it would be
> > > very useful if the REST API could support it.
> > >
> > > Best,
> > > Yadong
> > >
> > > Xintong Song <tonysong...@gmail.com> wrote on Sun, Sep 29, 2019 at 8:11 PM:
> > >
> > > > Thanks for drafting the FLIP and starting this discussion, Yadong.
> > > >
> > > > I have some comments:
> > > >
> > > >    - I can see that the proposed memory and cpu usage to be displayed
> > > >    (in section 1.1) are aligned with the current ResourceProfile
> > > >    fields. However, we are working on changing the memory fields in
> > > >    1.10 with FLIP-49 [1]. I suggest we align the UI design with the
> > > >    new FLIP-49 memory fields.
> > > >    - The task executor overview design (in section 1.2) is based on
> > > >    the current slot model.
> > > >    The coming FLIP-56 [2], which is also planned for 1.10, is
> > > >    changing the model so that task executors no longer have a fixed
> > > >    number of slots, but allocated slots (which may have different
> > > >    resources) and available resources.
> > > >       - I can see that there are discussions in the google doc about
> > > >       using a different color for available resources. However, the
> > > >       resource availability for different fields can be different,
> > > >       and may not be simply displayed by a different color. E.g., a
> > > >       task executor may have two slots, where slot 1 takes (20% cpu,
> > > >       10% heap mem, 50% managed mem, etc.), slot 2 takes (10% cpu,
> > > >       35% heap mem, 0% managed mem, etc.), and the remaining
> > > >       resources in the task executor are (70% cpu, 55% heap mem, 50%
> > > >       managed mem, etc.). How do you plan to display that?
> > > >       - I would suggest having multiple bars for each task executor,
> > > >       where each bar represents one of the resource fields. In
> > > >       addition, we may have a number (or some other figure) showing
> > > >       how many slots are allocated from the task executor.
> > > >    - Is there any way we can provide access to the logs of terminated
> > > >    task executors? It happens to us a lot that a job fails because a
> > > >    task executor failed or was lost, and we have to find the logs of
> > > >    the failed task executors by manually accessing the file system. I
> > > >    think it would be helpful if we could find the logs of failed task
> > > >    executors directly in the flink webui.
> > > >    - Regarding log pagination, is there any way to provide keyword
> > > >    searching across all the pages?
> > > >
> > > > Thank you~
> > > >
> > > > Xintong Song
> > > >
> > > >
> > > > [1]
> > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-49%3A+Unified+Memory+Configuration+for+TaskExecutors
> > > > [2]
> > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-56%3A+Dynamic+Slot+Allocation
> > > >
> > > > On Fri, Sep 27, 2019 at 3:57 PM Paul Lam <paullin3...@gmail.com> wrote:
> > > >
> > > > > Filed a jira to track this [1]. Thanks a lot.
> > > > >
> > > > > [1] https://issues.apache.org/jira/browse/FLINK-14242
> > > > >
> > > > > Best,
> > > > > Paul Lam
> > > > >
> > > > > > On Sep 27, 2019, at 14:34, Yadong Xie <vthink...@gmail.com> wrote:
> > > > > >
> > > > > > Hi Paul
> > > > > > Thanks for your suggestion.
> > > > > > I think it is easy to implement. Could you create a JIRA for me?
> > > > > >
> > > > > > Paul Lam <paullin3...@gmail.com> wrote on Fri, Sep 27, 2019 at 11:11 AM:
> > > > > >
> > > > > >> Hi Yadong,
> > > > > >>
> > > > > >> Thanks a lot for summing up the Web UI efforts.
> > > > > >>
> > > > > >> I have a minor suggestion: can we provide a collapse button for
> > > > > >> the task names in the job graph visualization? For some complex
> > > > > >> jobs, especially SQL jobs, the task names are quite long, which
> > > > > >> makes the job graph hard to read.
> > > > > >>
> > > > > >> Best,
> > > > > >> Paul Lam
> > > > > >>
> > > > > >>> On Sep 27, 2019, at 10:13, Yadong Xie <vthink...@gmail.com> wrote:
> > > > > >>>
> > > > > >>> Hi all
> > > > > >>>
> > > > > >>> Flink Web UI is the main platform for most users to monitor
> > > > > >>> their jobs and clusters. We reconstructed the Flink web in
> > > > > >>> version 1.9.0, but there are still some shortcomings.
> > > > > >>>
> > > > > >>> This discussion thread aims to provide a better experience for
> > > > > >>> Flink UI users.
> > > > > >>>
> > > > > >>> Here is the design doc I drafted:
> > > > > >>>
> > > > > >>> https://docs.google.com/document/d/1tIa8yN2prWWKJI_fa1u0t6h1r6RJpp56m48pXEyh6iI/edit?usp=sharing
> > > > > >>>
> > > > > >>> The FLIP can be found at [2].
> > > > > >>>
> > > > > >>> Please keep the discussion here, in the mailing list.
> > > > > >>>
> > > > > >>> Looking forward to your opinions. Any feedback is welcome.
> > > > > >>>
> > > > > >>> [1]:
> > > > > >>> https://docs.google.com/document/d/1tIa8yN2prWWKJI_fa1u0t6h1r6RJpp56m48pXEyh6iI/edit?usp=sharing
> > > > > >>> [2]:
> > > > > >>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-75%3A+Flink+Web+UI+Improvement+Proposal