Hi,

Which Kafka version are you using?
AFAIK, the only recent change to the Kafka connector metrics in the 1.4.x series is FLINK-8419 [1]. The 'records_lag_max' metric is a Kafka-shipped metric that is simply forwarded from the internally used Kafka client, so it should not have been affected.

Do you see other metrics under the pattern 'flink_taskmanager_job_task_operator_*'? All Kafka-shipped metrics should still follow this pattern. If not, can you find the 'records_lag_max' metric (or any other Kafka-shipped metric [2]) under the user scope 'KafkaConsumer'? A rough sketch of how that forwarding works is appended below the quoted message.

The above should provide more insight into what may be wrong here.

- Gordon

[1] https://issues.apache.org/jira/browse/FLINK-8419
[2] https://docs.confluent.io/current/kafka/monitoring.html#fetch-metrics

On 12 June 2018 at 11:47:51 PM, Julio Biason (julio.bia...@azion.com) wrote:

Hey guys,

I just updated our Flink install from 1.4.0 to 1.4.2, but our Prometheus monitoring is no longer getting the current Kafka lag.

After updating to 1.4.2 and symlinking opt/flink-metrics-prometheus-1.4.2.jar into lib/, I got the metrics back on Prometheus, but the most important one, flink_taskmanager_job_task_operator_records_lag_max, now returns -Inf.

Did I miss something?

--
Julio Biason, Software Engineer
AZION | Deliver. Accelerate. Protect.
Office: +55 51 3083 8101 | Mobile: +55 51 99907 0554
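To make the forwarding mentioned above concrete, here is a simplified sketch. It is not the actual connector code (the class and method names here are purely illustrative, and it assumes a Kafka 0.10/0.11 client where Metric#value() is still available), but it shows the idea: every metric exposed by the Kafka client is wrapped in a Flink gauge and registered under a "KafkaConsumer" group of the operator's metric group, which is the user scope referred to in the reply.

import java.util.Map;

import org.apache.flink.metrics.Gauge;
import org.apache.flink.metrics.MetricGroup;

import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.Metric;
import org.apache.kafka.common.MetricName;

public class KafkaMetricForwardingSketch {

    // Register every metric exposed by the Kafka client as a Flink gauge under a
    // "KafkaConsumer" subgroup of the operator's metric group, so that Kafka-shipped
    // metrics such as "records_lag_max" appear under the 'KafkaConsumer' user scope.
    static void forwardKafkaMetrics(MetricGroup operatorMetricGroup, KafkaConsumer<?, ?> consumer) {
        MetricGroup kafkaGroup = operatorMetricGroup.addGroup("KafkaConsumer");

        for (Map.Entry<MetricName, ? extends Metric> entry : consumer.metrics().entrySet()) {
            final Metric kafkaMetric = entry.getValue();
            // The gauge simply reads the Kafka client's current value whenever the
            // reporter polls it; Flink does not compute the lag itself.
            kafkaGroup.gauge(entry.getKey().name(), (Gauge<Double>) () -> kafkaMetric.value());
        }
    }
}

As far as I know, the Kafka client itself reports this particular max-style metric as -Infinity until it has actually sampled fetches, so seeing -Inf right after a restart is not necessarily a sign that Flink dropped or broke the metric.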