I have discussed this quite a bit with other people.

It is not totally straightforward. One could try to measure how often the
output buffer pools are exhausted, but that fluctuates a lot. It would need
some work to get a stable metric out of it...
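
To illustrate the kind of smoothing I mean, here is a minimal sketch that
turns noisy "pool exhausted / not exhausted" readings into an exponentially
weighted moving average. The class name SmoothedGauge, the alpha parameter,
and the idea of polling the pool periodically are all assumptions made for
the example; nothing like this exists in Flink as far as this thread goes.

```java
// Hypothetical helper, not part of Flink: smooths a noisy boolean reading
// (e.g. "output buffer pool had no free buffer at poll time") into a value
// between 0.0 and 1.0 using an exponentially weighted moving average.
final class SmoothedGauge {
    private final double alpha;   // weight of the newest sample, in (0, 1]
    private double value;         // current smoothed value
    private boolean initialized;  // false until the first sample arrives

    SmoothedGauge(double alpha) {
        if (alpha <= 0.0 || alpha > 1.0) {
            throw new IllegalArgumentException("alpha must be in (0, 1]");
        }
        this.alpha = alpha;
    }

    /** Feeds one reading and returns the updated smoothed value. */
    double update(boolean exhausted) {
        double sample = exhausted ? 1.0 : 0.0;
        if (!initialized) {
            // Start from the first sample instead of biasing towards zero.
            value = sample;
            initialized = true;
        } else {
            value = alpha * sample + (1.0 - alpha) * value;
        }
        return value;
    }

    /** Returns the current smoothed value (0.0 before any sample). */
    double get() {
        return value;
    }
}
```

A small alpha reacts slowly but gives a steadier number; a large alpha
follows the raw readings more closely. Which trade-off is right would have
to be found by experiment.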

If you have a profiler that you can attach to the processes, you could
check whether a lot of time is spent within the "requestBufferBlocking()"
method of the buffer pool...
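
If no profiler is at hand, a crude approximation is to sample thread stack
traces and count how often each thread is found inside that method. The
sketch below does this with Thread.getAllStackTraces(); the class name
BlockingSampler and its parameters are made up for the example, and it
assumes the code runs inside the same JVM as the tasks and that the thread
names are enough to tell the tasks apart.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Flink: a poor man's sampling profiler.
final class BlockingSampler {

    /** Returns true if any frame in the stack is a call to methodName. */
    static boolean isInMethod(StackTraceElement[] stack, String methodName) {
        for (StackTraceElement frame : stack) {
            if (frame.getMethodName().equals(methodName)) {
                return true;
            }
        }
        return false;
    }

    /**
     * Takes `samples` snapshots of all live threads, `intervalMs` apart, and
     * returns, per thread name, the fraction of snapshots in which that
     * thread was inside methodName.
     */
    static Map<String, Double> sample(String methodName, int samples, long intervalMs) {
        Map<String, Integer> seen = new HashMap<>();
        Map<String, Integer> hits = new HashMap<>();
        for (int i = 0; i < samples; i++) {
            for (Map.Entry<Thread, StackTraceElement[]> e
                    : Thread.getAllStackTraces().entrySet()) {
                String name = e.getKey().getName();
                seen.merge(name, 1, Integer::sum);
                if (isInMethod(e.getValue(), methodName)) {
                    hits.merge(name, 1, Integer::sum);
                }
            }
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException ex) {
                // Restore the interrupt flag and stop sampling early.
                Thread.currentThread().interrupt();
                break;
            }
        }
        Map<String, Double> ratios = new TreeMap<>();
        for (Map.Entry<String, Integer> e : seen.entrySet()) {
            int h = hits.getOrDefault(e.getKey(), 0);
            ratios.put(e.getKey(), h / (double) e.getValue());
        }
        return ratios;
    }
}
```

Calling BlockingSampler.sample("requestBufferBlocking", 100, 50) would then
give a rough ratio per thread. In the task1 -> task2 -> task3 -> task4
example, a thread with a high ratio is waiting for output buffers, which
would suggest that the slow operator is somewhere downstream of it.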

Stephan


On Mon, Dec 7, 2015 at 9:45 AM, Gyula Fóra <gyf...@apache.org> wrote:

> Hey guys,
>
> Is there any way to monitor the backpressure in the Flink job? I find it
> hard to debug slow operators because of the backpressure mechanism so it
> would be good to get some info out of the network layer on what exactly
> caused the backpressure.
>
> For example:
>
> task1 -> task2 -> task3 -> task4
>
> I want to figure out whether task 2 or task 3 is slow.
>
> Any ideas?
>
> Thanks,
> Gyula
>
