[ https://issues.apache.org/jira/browse/FLINK-3310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16759781#comment-16759781 ]
Barisa commented on FLINK-3310:
-------------------------------

Hi, is the backpressure operation expensive? I'm asking because we are considering polling this info once a minute and exposing it as a Prometheus metric.

Question already asked in [http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Continuous-Monitoring-of-back-pressure-tt25869.html]

I'm currently writing some code to convert the back-pressure REST API data into Prometheus-compatible output. I was just curious why back-pressure isn't already exposed as a metric in the built-in Prometheus exporter. Is it because the thread sampling is too intensive? Or too slow (particularly when running multiple jobs)? In our case we're running a single job per cluster.

Any feedback would be appreciated.

Regards,
Dave

> Add back pressure statistics to web frontend
> --------------------------------------------
>
>                 Key: FLINK-3310
>                 URL: https://issues.apache.org/jira/browse/FLINK-3310
>             Project: Flink
>          Issue Type: Improvement
>          Components: Webfrontend
>            Reporter: Ufuk Celebi
>            Assignee: Ufuk Celebi
>            Priority: Minor
>             Fix For: 1.0.0
>
>
> When a task is receiving data at a higher rate than it can process, the task
> is back pressuring preceding tasks. Currently, there is no way to tell
> whether this is the case or not. An indicator for back pressure is tasks
> being stuck in buffer requests on the network stack. This means that they
> have filled all their buffers with data, but the following tasks/network are
> not consuming them fast enough.
> A simple way to measure back pressure is to sample running tasks and report
> back pressure if they are stuck in the blocking buffers calls.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
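
For reference, the conversion described in the comment (back-pressure REST data to Prometheus-compatible output) could look roughly like the sketch below. It is only an illustration under assumptions: the response is assumed to be already-parsed JSON containing a "subtasks" list whose entries carry "subtask" and "ratio" fields, and the metric name flink_backpressure_ratio and the label names are hypothetical, not something the built-in exporter defines. It makes no HTTP calls, so it says nothing about how expensive the sampling itself is.

```python
def backpressure_to_prometheus(job_id, vertex_id, payload):
    """Render one parsed back-pressure response as Prometheus text format.

    Assumed payload shape: {"subtasks": [{"subtask": 0, "ratio": 0.5}, ...]}.
    """
    metric = "flink_backpressure_ratio"  # hypothetical metric name
    lines = [
        f"# HELP {metric} Sampled back-pressure ratio per subtask",
        f"# TYPE {metric} gauge",
    ]
    for sub in payload.get("subtasks", []):
        # One gauge sample per subtask, identified by labels.
        labels = (
            f'job_id="{job_id}",vertex_id="{vertex_id}",'
            f'subtask="{sub["subtask"]}"'
        )
        lines.append(f"{metric}{{{labels}}} {float(sub['ratio'])}")
    return "\n".join(lines) + "\n"
```

A poller running once a minute would fetch the back-pressure data for each job vertex, pass the parsed JSON to this function, and serve the concatenated text from an endpoint that Prometheus scrapes.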