I have discussed this quite a bit with other people, and it is not totally straightforward. One could try to measure exhaustion of the output buffer pools, but that fluctuates a lot, so it would need some work to get a stable metric from it...
If you have a profiler that you can attach to the processes, you could check whether a lot of time is spent within the "requestBufferBlocking()" method of the buffer pool...

Stephan

On Mon, Dec 7, 2015 at 9:45 AM, Gyula Fóra <gyf...@apache.org> wrote:

> Hey guys,
>
> Is there any way to monitor the backpressure in the Flink job? I find it
> hard to debug slow operators because of the backpressure mechanism so it
> would be good to get some info out of the network layer on what exactly
> caused the backpressure.
>
> For example:
>
> task1 -> task2 -> task3 -> task4
>
> I want to figure out whether task 2 or task 3 is slow.
>
> Any ideas?
>
> Thanks,
> Gyula
>
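PS: If no profiler is at hand, the check described above can be roughly approximated by sampling thread stack traces and counting how often a thread is inside "requestBufferBlocking()". The following is only a minimal sketch under some assumptions: the class name BlockingSampler and both helper methods are made up for illustration, the code has to run inside the same JVM as the tasks, and it matches on the method name alone without checking the declaring class. Taking repeated thread dumps with the JDK's jstack tool and searching them for the method name would give similar information from outside the process.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper, not part of Flink: samples stack traces of all live threads. */
public class BlockingSampler {

    /** Returns true if any frame of the given stack is in a method with the given name. */
    static boolean isInMethod(StackTraceElement[] stack, String methodName) {
        for (StackTraceElement frame : stack) {
            if (frame.getMethodName().equals(methodName)) {
                return true;
            }
        }
        return false;
    }

    /**
     * Takes `samples` snapshots of all threads, `intervalMillis` apart, and returns,
     * per thread name, the fraction of snapshots in which that thread was inside
     * a method called `methodName`. Threads sharing a name are counted together.
     */
    static Map<String, Double> sample(String methodName, int samples, long intervalMillis)
            throws InterruptedException {
        Map<String, Integer> seen = new HashMap<>();
        Map<String, Integer> hits = new HashMap<>();
        for (int i = 0; i < samples; i++) {
            for (Map.Entry<Thread, StackTraceElement[]> e
                    : Thread.getAllStackTraces().entrySet()) {
                String name = e.getKey().getName();
                seen.merge(name, 1, Integer::sum);
                if (isInMethod(e.getValue(), methodName)) {
                    hits.merge(name, 1, Integer::sum);
                }
            }
            Thread.sleep(intervalMillis);
        }
        Map<String, Double> ratio = new HashMap<>();
        for (Map.Entry<String, Integer> e : seen.entrySet()) {
            ratio.put(e.getKey(), hits.getOrDefault(e.getKey(), 0) / (double) e.getValue());
        }
        return ratio;
    }
}
```

Calling something like BlockingSampler.sample("requestBufferBlocking", 100, 50) from a debugging thread would give a ratio per thread name. A task thread with a high ratio is probably waiting for output buffers, which means it is being slowed down by something downstream. In the task1 -> task2 -> task3 -> task4 example, if the threads of task1 and task2 show a high ratio and those of task3 do not, task3 would be the likely suspect. This rests on the assumption that the thread names can be mapped back to the tasks.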