Big +1 for this helpful feature :)
On 08/02/2019 13:54, Jark Wu wrote:

Hi David,

The demo looks charming! I think it will definitely help a lot with performance tuning. A big +1 for this.

I cc-ed Yadong, who is one of the main contributors to the new Web UI. Maybe he can give some help on the front end.

Regards,
Jark

On Fri, 2 Aug 2019 at 04:26, David Morávek <david.mora...@gmail.com> wrote:

> Hi Till, thanks for the feedback! These endpoints are only called when the
> vertex is selected in the UI, so there should not be any heavy RPC load. For
> back-pressure, we only sample the top 3 calls of the stack (depth = 3). For the
> flame graph, we want to sample the whole stack trace and we need a different
> sampling rate (longer period, more samples). Those are the main reasons to
> split these into two "trackers", but I may be missing something.
>
> I've prepared a little demo, so others can have a better idea of what I
> have in mind.
>
> https://youtu.be/GUNDehj9z9o
>
> Please note that this is a proof of concept and I'm not a frontend person, so
> it may look a little clumsy :)
>
> D.
>
> On Thu, Aug 1, 2019 at 11:40 AM Till Rohrmann <trohrm...@apache.org> wrote:
>
> > Hi David,
> >
> > thanks for starting this discussion. I like the idea of improving insights
> > into Flink's execution and I believe that a flame graph could be helpful.
> >
> > I quickly glanced over your changes and I think they go in a good
> > direction. One idea could be to share the `StackTraceSample` produced by
> > the `StackTraceSampleCoordinator` between the different
> > `StackTraceOperatorTracker` instances so that we don't send multiple requests
> > for the same operators. That way we would decrease the RPC load a bit.
> >
> > Apart from that, I think the next steps would be to find a committer who
> > could shepherd this effort and help you with merging it.
> >
> > Cheers,
> > Till
> >
> > On Wed, Jul 31, 2019 at 7:05 PM David Morávek <d...@apache.org> wrote:
> >
> > > Hello,
> > >
> > > While looking into Flink internals, I've noticed that there is already a
> > > mechanism for stack-trace sampling of a particular job vertex.
> > >
> > > I think it may be really useful to allow users to easily render a CPU
> > > flame graph <http://www.brendangregg.com/flamegraphs.html> in the new UI for a
> > > selected vertex (new tab next to back pressure) of a running job. The back
> > > pressure tab already provides a good idea of which vertex causes trouble,
> > > but it's hard to say what's actually going on.
> > >
> > > I've tried to implement a basic REST endpoint
> > > <https://github.com/dmvk/flink/commit/716231822d2fe99004895cdd0a365560479445b9>
> > > that prepares data for the flame graph rendering, and it seems to be
> > > providing good insight.
> > >
> > > It should be straightforward to render data from the endpoint in the new UI
> > > using existing JavaScript libraries
> > > <https://github.com/spiermar/d3-flame-graph>.
> > >
> > > WDYT? Is this worth pushing forward?
> > >
> > > D.
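
To make the "prepares data for the flame graph rendering" step concrete, here is a minimal sketch of how sampled stack traces might be folded into a nested tree. It is not the code from the linked commit and does not use any Flink API: the function `fold_samples`, the `root` label, and the frame names are all hypothetical. The `name`/`value`/`children` shape is assumed to match what a library such as d3-flame-graph consumes, so check that library's documentation before relying on it. Each sample is assumed to be ordered from the outermost caller to the innermost callee; Java's `Thread.getStackTrace()` returns the most recent frame first, so real samples would need to be reversed.

```python
import json


def fold_samples(samples, root_name="root"):
    """Merge stack-trace samples into a nested {name, value, children} tree.

    `samples` is a list of stack traces; each trace is a list of frame
    names ordered from outermost caller to innermost callee (assumption).
    A node's `value` counts the samples whose trace passes through it.
    """
    root = {"name": root_name, "value": 0, "children": []}
    for frames in samples:
        root["value"] += 1
        node = root
        for frame in frames:
            # Reuse the child for this frame if an earlier sample created it.
            child = next(
                (c for c in node["children"] if c["name"] == frame), None
            )
            if child is None:
                child = {"name": frame, "value": 0, "children": []}
                node["children"].append(child)
            child["value"] += 1
            node = child
    return root


if __name__ == "__main__":
    # Hypothetical frame names, for illustration only.
    demo = [
        ["Task.run", "StreamTask.invoke", "MyMapper.map"],
        ["Task.run", "StreamTask.invoke", "MyMapper.map"],
        ["Task.run", "StreamTask.invoke", "Output.collect"],
    ]
    print(json.dumps(fold_samples(demo), indent=2))
```

Truncating each trace before folding (for example keeping only a few frames) would mimic the shallow sampling described for back-pressure, while the flame graph case would pass the whole trace.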