Piotr Nowojski created FLINK-25414:
--------------------------------------

             Summary: Provide metrics to measure how long task has been blocked
                 Key: FLINK-25414
                 URL: https://issues.apache.org/jira/browse/FLINK-25414
             Project: Flink
          Issue Type: New Feature
          Components: Runtime / Metrics, Runtime / Task
    Affects Versions: 1.14.2
            Reporter: Piotr Nowojski

Currently, the back pressured/busy metrics tell the user whether a task is blocked/busy and what percentage of the time it is blocked/busy. However, they do not tell how long a single blocking event lasts. It can be 1 ms or 1 h, and the back pressure/busy metrics would still report 100%.

To improve this, we could provide two new metrics:
# maxSoftBackPressureDuration
# maxHardBackPressureDuration

The max would be reset to 0 periodically or on every access to the metric (via the metric reporter). Soft back pressure is when the task is back pressured in a non-blocking fashion (StreamTask detected unavailability of the output). Hard back pressure would measure the time the task is actually blocked.

Unfortunately, I don't know how to efficiently provide a similar metric for busy time without impacting max throughput.
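
To make the "max, reset on every access" idea concrete, here is a minimal sketch in plain Java. It is not Flink code: the class name MaxBlockedDurationGauge and its methods are hypothetical, timestamps are passed in explicitly as milliseconds, and it assumes one task thread that marks the start and end of blocking events and one reporter thread that reads the value. One instance could back each of the two proposed metrics.

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical helper, NOT part of Flink's API: tracks the longest single
 * blocking event and resets the maximum every time the value is read.
 * Assumes a single writer (task thread) and a single reader (reporter).
 */
final class MaxBlockedDurationGauge {
    private static final long NOT_BLOCKED = -1L;

    // Longest completed blocking event since the last read, in milliseconds.
    private final AtomicLong maxCompletedMs = new AtomicLong(0L);

    // Start timestamp of the blocking event in progress, or NOT_BLOCKED.
    private volatile long blockedSinceMs = NOT_BLOCKED;

    /** Called by the task thread when it becomes back pressured. */
    void markBlockedStart(long nowMs) {
        blockedSinceMs = nowMs;
    }

    /** Called by the task thread when the back pressure ends. */
    void markBlockedEnd(long nowMs) {
        long start = blockedSinceMs;
        if (start == NOT_BLOCKED) {
            return; // no event in progress, nothing to record
        }
        blockedSinceMs = NOT_BLOCKED;
        maxCompletedMs.accumulateAndGet(nowMs - start, Math::max);
    }

    /**
     * Called by the reporter. Returns the longest blocking event seen since
     * the previous call and resets the completed maximum to 0. An event that
     * is still in progress is included with its duration so far, so a task
     * blocked for 1h is visible before the event ends.
     */
    long getAndReset(long nowMs) {
        long max = maxCompletedMs.getAndSet(0L);
        long start = blockedSinceMs;
        if (start != NOT_BLOCKED) {
            max = Math.max(max, nowMs - start);
        }
        return max;
    }
}
```

The hot path in this sketch is one volatile write per blocking event plus one atomic update when the event ends, so the cost is paid only when the task is already blocked. That is also why the same approach does not carry over cheaply to busy time, where the bookkeeping would sit on the per-record path. A long event that spans several reads is reported in each of them with its growing duration, which is a choice of this sketch and not something the proposal specifies.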