Hi, (Flink version 1.14.2, Kafka version 2.6.1)
I have a Flink job that consumes from Kafka and simply sinks the data into S3. The Kafka consumer is sometimes delayed on a few partitions. The partitions are evenly assigned across the Flink subtasks.

I found a correlation between the Kafka consumer fetch latency and the consumer lag. The metric is called *flink_taskmanager_job_task_operator_KafkaSourceReader_KafkaConsumer_fetch_latency_max*. When it climbs to 2 to 3 minutes, the consumer lag increases dramatically. Once the fetch latency returns to normal, the consumer catches up on the lag very quickly.

I believe there is no back pressure in the Flink pipeline, as both the outpool usage of the Kafka consumer and the inpool usage of the S3 sink are always below 0.6.

I'm not sure whether this is a Flink-related or a Kafka-related issue. Any advice or similar experience is welcome.

Thanks a lot.

Best,
Kevin