Hi,

Which Kafka version are you using?
AFAIK, the only recent change to the Kafka connector metrics in the 1.4.x series is FLINK-8419 [1]. The 'records_lag_max' metric is a Kafka-shipped metric that is simply forwarded from the internally used Kafka client, so it should not have been affected.

Do you see other metrics under the pattern 'flink_taskmanager_job_task_operator_*'? All Kafka-shipped metrics should still follow this pattern. If not, can you find the 'records_lag_max' metric (or any other Kafka-shipped metric [2]) under the user scope 'KafkaConsumer'? A rough sketch of how that forwarding works is appended below the quoted message.

The above should provide more insight into what may be wrong here.

- Gordon

[1] https://issues.apache.org/jira/browse/FLINK-8419
[2] https://docs.confluent.io/current/kafka/monitoring.html#fetch-metrics

On 12 June 2018 at 11:47:51 PM, Julio Biason (julio.bia...@azion.com) wrote:

Hey guys,

I just updated our Flink install from 1.4.0 to 1.4.2, but our Prometheus monitoring is no longer getting the current Kafka lag.

After updating to 1.4.2 and symlinking opt/flink-metrics-prometheus-1.4.2.jar into lib/, I got the metrics back on Prometheus, but the most important one, flink_taskmanager_job_task_operator_records_lag_max, now returns -Inf.

Did I miss something?

--
Julio Biason, Software Engineer
AZION | Deliver. Accelerate. Protect.
Office: +55 51 3083 8101 | Mobile: +55 51 99907 0554
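To make the forwarding mentioned above concrete, here is a simplified sketch. It is not the actual connector code (the class and method names here are purely illustrative, and it assumes a Kafka 0.10/0.11 client where Metric#value() is still available), but it shows the idea: every metric exposed by the Kafka client is wrapped in a Flink gauge and registered under a "KafkaConsumer" group of the operator's metric group, which is the user scope referred to in the reply.

import java.util.Map;

import org.apache.flink.metrics.Gauge;
import org.apache.flink.metrics.MetricGroup;

import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.Metric;
import org.apache.kafka.common.MetricName;

public class KafkaMetricForwardingSketch {

    // Register every metric exposed by the Kafka client as a Flink gauge under a
    // "KafkaConsumer" subgroup of the operator's metric group, so that Kafka-shipped
    // metrics such as "records_lag_max" appear under the 'KafkaConsumer' user scope.
    static void forwardKafkaMetrics(MetricGroup operatorMetricGroup, KafkaConsumer<?, ?> consumer) {
        MetricGroup kafkaGroup = operatorMetricGroup.addGroup("KafkaConsumer");

        for (Map.Entry<MetricName, ? extends Metric> entry : consumer.metrics().entrySet()) {
            final Metric kafkaMetric = entry.getValue();
            // The gauge simply reads the Kafka client's current value whenever the
            // reporter polls it; Flink does not compute the lag itself.
            kafkaGroup.gauge(entry.getKey().name(), (Gauge<Double>) () -> kafkaMetric.value());
        }
    }
}

As far as I know, the Kafka client itself reports this particular max-style metric as -Infinity until it has actually sampled fetches, so seeing -Inf right after a restart is not necessarily a sign that Flink dropped or broke the metric.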