Flink Kafka Issues

Ramya Ramamurthy Thu, 18 Jul 2019 00:36:48 -0700

Hi,

We are facing a serious production issue with Flink. Any help would be
appreciated.


We receive packets from a Kafka cluster. Every day from 22:00 UTC till
00:30 UTC this cluster sees a sudden drop in packets on a specific topic
[say "topic A"]. Although our job reads from a different topic [say "topic
B"], it drops a lot of packets during that period [as reported by the
"laterecordsDropped" metric]. At the same time, the job which reads from
"topic A" shows a high fetch rate. We also observed that one of the brokers
of this cluster had an abnormal CPU rise [which I attributed to the high
fetch rates].

We have a tumbling window of 1 min [with 10 seconds of
watermarksPeriodicBounded]. This is based on the packets' event time. Is
there any reason why my job reading from "topic B" would see a higher
number of records dropped?
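
To make the window setup concrete, here is a minimal sketch of how I
understand the late-record rule to apply with these settings. It is a plain
Python simulation, not Flink code. The names (WINDOW_MS, OUT_OF_ORDERNESS_MS,
window_end, count_late) are hypothetical, and it assumes the watermark trails
the highest event time seen so far by 10 seconds, that allowed lateness is
zero, and that the watermark advances on every record rather than
periodically.

```python
WINDOW_MS = 60_000            # 1 min tumbling window (from the setup above)
OUT_OF_ORDERNESS_MS = 10_000  # 10 s bounded out-of-orderness (from the setup above)


def window_end(ts_ms):
    """Exclusive end of the tumbling window that contains ts_ms."""
    return ts_ms - (ts_ms % WINDOW_MS) + WINDOW_MS


def count_late(event_times_ms):
    """Count records whose window has already closed when they arrive.

    Assumption: a record is late when the last timestamp of its window
    (window end - 1) is at or below the current watermark.
    """
    max_ts = None
    late = 0
    for ts in event_times_ms:
        # Watermark derived from the records seen before this one.
        watermark = float("-inf") if max_ts is None else max_ts - OUT_OF_ORDERNESS_MS
        if window_end(ts) - 1 <= watermark:
            late += 1
        max_ts = ts if max_ts is None else max(max_ts, ts)
    return late


# Event times in milliseconds, listed in arrival order.
print(count_late([0, 30_000, 65_000, 75_000, 50_000, 58_000]))  # prints 2
print(count_late([0, 30_000, 65_000, 69_000, 50_000]))          # prints 0
```

In the first call the watermark has reached 65 s by the time the 50 s and
58 s records arrive, so their window [0 s, 60 s) is already closed and both
are counted as late. In the second call the watermark is only at 59 s, so
the 50 s record is still accepted.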

The screenshot below shows the following:
"Laterecords dropped" corresponds to the job reading from "topic B".
"Fetch and Consume rates" relate to the job reading from "topic A" [which
has the downward trend in traffic at the times mentioned].

[image: image.png]

All these graphs are correlated and we are unable to isolate this problem.
There are other modules which consume from this topic, and we have no slow
records logged there, which is why we are not sure whether this issue lies
with Flink alone.

Thanks.
