Zhanghao Chen created FLINK-32127:
-------------------------------------
Summary: Source busy time is inaccurate in many cases
Key: FLINK-32127
URL: https://issues.apache.org/jira/browse/FLINK-32127
Project: Flink
Issue Type: Improvement
Components: Autoscaler
Reporter: Zhanghao Chen
We found that source busy time is inaccurate in many cases. The reason is that
sources are usually multi-threaded (Kafka and RocketMQ, for example): a
fetcher thread fetches data from the data source, and a consumer thread
deserializes the data, with a blocking queue in between. A source is considered
# *idle* if the consumer is blocked by fetching data from the queue
# *backpressured* if the consumer is blocked by writing data to downstream
operators
# *busy* otherwise
However, this means that if the bottleneck is on the fetcher side, the consumer
will often be blocked fetching data from the queue. The reported source idle
time is then high, even though the source is in fact busy and consumes a lot of
CPU. In some of our jobs, the source max busy time is only ~600 ms while the
source is actually reaching its limit.
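
To make the misattribution concrete, here is a minimal sketch of the accounting described above. It is not Flink's actual metric code: the class name, the method names, and the assumption that times are tracked as milliseconds per one-second window are all hypothetical and used only for illustration. The point is that only the consumer thread's time is classified, so time the fetcher thread spends working never counts as busy.

```java
// Hypothetical sketch of the idle / backpressured / busy classification.
// Not Flink's implementation; names and the 1000 ms window are assumptions.
public class SourceBusyTimeSketch {

    /** Per-window time breakdown of the consumer thread, in milliseconds. */
    static final class Breakdown {
        final long idleMs;
        final long backPressuredMs;
        final long busyMs;

        Breakdown(long idleMs, long backPressuredMs, long busyMs) {
            this.idleMs = idleMs;
            this.backPressuredMs = backPressuredMs;
            this.busyMs = busyMs;
        }
    }

    static final long WINDOW_MS = 1000;

    /**
     * Classifies one window of consumer-thread time.
     * blockedOnQueueMs: time the consumer waited on the blocking queue (idle).
     * blockedOnOutputMs: time the consumer waited on downstream operators (backpressured).
     * Everything else in the window is reported as busy.
     */
    static Breakdown classify(long blockedOnQueueMs, long blockedOnOutputMs) {
        long idle = clamp(blockedOnQueueMs, WINDOW_MS);
        long backPressured = clamp(blockedOnOutputMs, WINDOW_MS - idle);
        long busy = WINDOW_MS - idle - backPressured;
        return new Breakdown(idle, backPressured, busy);
    }

    private static long clamp(long value, long max) {
        return Math.max(0, Math.min(max, value));
    }

    public static void main(String[] args) {
        // Fetcher-bound source: the fetcher thread is saturated, so the consumer
        // waits on the queue for 400 ms of the window and is never backpressured.
        Breakdown b = classify(400, 0);
        System.out.println("idle=" + b.idleMs
                + " backPressured=" + b.backPressuredMs
                + " busy=" + b.busyMs);
        // prints "idle=400 backPressured=0 busy=600"
    }
}
```

In this sketch the source reports 600 ms of busy time although it cannot go any faster, because the saturated fetcher thread is invisible to the accounting.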
--
This message was sent by Atlassian Jira
(v8.20.10#820010)