Hi Paul, for now I only plan to add the flame graph based on Java stack traces.
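
To make "based on Java stack traces" concrete, here is a minimal sketch of the idea. It is not the code from the linked commit: the class FlameGraphSketch, its Node type, and the fold/sample methods are hypothetical names used only for illustration. It samples a thread with Thread#getStackTrace(), folds the samples into a tree, and prints the name/value/children JSON shape that d3-flame-graph takes as input. The maxDepth parameter mirrors the depth = 3 limit mentioned for back-pressure sampling further down the thread.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper (not part of Flink): folds sampled stack traces into a flame graph tree. */
final class FlameGraphSketch {

    /** One frame in the tree; value counts the samples that passed through this frame. */
    static final class Node {
        final String name;
        int value;
        final Map<String, Node> children = new LinkedHashMap<>();

        Node(String name) {
            this.name = name;
        }

        /** Serializes the subtree as {"name", "value", "children"} objects. */
        String toJson() {
            StringBuilder sb = new StringBuilder();
            String escaped = name.replace("\\", "\\\\").replace("\"", "\\\"");
            sb.append("{\"name\":\"").append(escaped)
              .append("\",\"value\":").append(value)
              .append(",\"children\":[");
            boolean first = true;
            for (Node child : children.values()) {
                if (!first) {
                    sb.append(',');
                }
                sb.append(child.toJson());
                first = false;
            }
            return sb.append("]}").toString();
        }
    }

    /**
     * Folds samples into a tree. Each sample is ordered like Thread#getStackTrace():
     * index 0 is the innermost (most recent) frame. Only the top maxDepth frames are kept.
     */
    static Node fold(List<StackTraceElement[]> samples, int maxDepth) {
        Node root = new Node("root");
        for (StackTraceElement[] trace : samples) {
            root.value++;
            Node current = root;
            int kept = Math.min(trace.length, maxDepth);
            // Walk from the outermost kept frame down to the innermost one.
            for (int i = kept - 1; i >= 0; i--) {
                String frame = trace[i].getClassName() + "." + trace[i].getMethodName();
                current = current.children.computeIfAbsent(frame, Node::new);
                current.value++;
            }
        }
        return root;
    }

    /** Takes count samples of one thread, sleeping periodMillis after each sample. */
    static List<StackTraceElement[]> sample(Thread thread, int count, long periodMillis)
            throws InterruptedException {
        List<StackTraceElement[]> samples = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            samples.add(thread.getStackTrace());
            Thread.sleep(periodMillis);
        }
        return samples;
    }
}
```

For example, FlameGraphSketch.fold(FlameGraphSketch.sample(someThread, 100, 50), Integer.MAX_VALUE).toJson() would give the full-depth tree for one thread, while passing 3 as maxDepth keeps only the top three frames. Sampling Java threads this way reflects wall-clock time, so a blocked thread shows up just like a busy one.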

On Fri, Aug 2, 2019 at 9:34 AM Paul Lam <paullin3...@gmail.com> wrote:

> Hi David,
>
> Thanks for the new feature! I think the flame graph would be a useful tool
> to understand the state of job executions, and it looks good too. +1 for
> this.
>
> And a minor question: do we plan to support multiple kinds of flame
> graphs? It would be great if we had both on-CPU and off-CPU flame graphs.
>
> Best,
> Paul Lam
>
> > On Aug 2, 2019, at 04:24, David Morávek <david.mora...@gmail.com> wrote:
> >
> > Hi Till, thanks for the feedback! These endpoints are only called when the
> > vertex is selected in the UI, so there shouldn't be any heavy RPC load. For
> > back-pressure, we only sample the top 3 calls of the stack (depth = 3). For
> > the flame graph, we want to sample the whole stack trace, and we need a
> > different sampling rate (longer period, more samples). Those are the main
> > reasons to split these into two "trackers", but I may be missing something.
> >
> > I've prepared a little demo, so others can have a better idea of what I
> > have in mind.
> >
> > https://youtu.be/GUNDehj9z9o
> >
> > Please note that this is a proof of concept and I'm not a frontend person,
> > so it may look a little clumsy :)
> >
> > D.
> >
> > On Thu, Aug 1, 2019 at 11:40 AM Till Rohrmann <trohrm...@apache.org> wrote:
> >
> >> Hi David,
> >>
> >> thanks for starting this discussion. I like the idea of improving insights
> >> into Flink's execution and I believe that a flame graph could be helpful.
> >>
> >> I quickly glanced over your changes and I think they go in a good
> >> direction. One idea could be to share the `StackTraceSample` produced by
> >> the `StackTraceSampleCoordinator` between the different
> >> `StackTraceOperatorTracker` instances so that we don't send multiple
> >> requests for the same operators. That way we would slightly decrease the
> >> RPC load.
> >>
> >> Apart from that, I think the next steps would be to find a committer who
> >> could shepherd this effort and help you with merging it.
> >>
> >> Cheers,
> >> Till
> >>
> >> On Wed, Jul 31, 2019 at 7:05 PM David Morávek <d...@apache.org> wrote:
> >>
> >>> Hello,
> >>>
> >>> While looking into Flink internals, I've noticed that there is already a
> >>> mechanism for stack-trace sampling of a particular job vertex.
> >>>
> >>> I think it may be really useful to allow users to easily render a CPU
> >>> flamegraph <http://www.brendangregg.com/flamegraphs.html> in the new UI
> >>> for a selected vertex (new tab next to back pressure) of a running job.
> >>> The back pressure tab already gives a good idea of which vertex causes
> >>> trouble, but it's hard to say what's actually going on.
> >>>
> >>> I've tried to implement a basic REST endpoint
> >>> <https://github.com/dmvk/flink/commit/716231822d2fe99004895cdd0a365560479445b9>
> >>> that prepares data for the flame graph rendering and it seems to be
> >>> providing good insight.
> >>>
> >>> It should be straightforward to render data from the endpoint in the new
> >>> UI using existing JavaScript libraries such as d3-flame-graph
> >>> <https://github.com/spiermar/d3-flame-graph>.
> >>>
> >>> WDYT? Is this worth pushing forward?
> >>>
> >>> D.
> >>>
> >>
>
>
