Hello, There is one Flink job which consumes from Kafka topic (TOPIC_IN), simply flatMap / map data and push to another Kafka topic (TOPIC_OUT). TOPIC_IN has around 30 partitions, data is more or less sequential per partition and the job has parallelism 30. So in theory there should be 1:1 mapping between consumer and partition.
But it's often to see big lag in offsets for some partitions. So that should mean that some of consumers are slower than another (i.e. some network issues for particular broker host or anything else). So data in TOPIC_OUT partitions is distributed but not sequential at all. So when some another flink job consumes from TOPIC_OUT and uses BoundedOutOfOrdernessTimestampExtractor to generate watermarks, due to difference in data timestamps, there can be a lot of late data. Maybe something is missing of course in this setup or there is more good approach for such flatMap / map jobs. Setting big WindowedStream#allowedLateness or giving more time for BoundedOutOfOrdernessTimestampExtractor will increase memory consumption and probably will cause another issues and anyway there can be late data which is not good for later windows. One of the solution is to have some shared place, to synchronize lower timestamp between consumers and somehow slow down consumption (Thread sleep, wait, while loop with condition...). 0. Is there any good approach to handle such "Kafka <- flatMap / map -> Kafka" tasks? so data in TOPIC_OUT will be sequential as in TOPIC_IN. 1. As far as I see it should be common problem with some slow consumers for big Kafka topic with a lot of partitions, isn't it? How Flink/Kafka hadle it? 2. Does somebody know, is there any mechanism in Flink - Kafka, (backpreassure?), which can tell from child operator (some process function for example) to specific fast consumers to slow down a bit? Is something like callback possible in Flink, don't think so, but..? 3. Or is there in Flink already anything which can help to synchronize minimum timestamps between consumers and? 4. Is there any good approach to slow down consumption in Kafka consumer? There should be some problems between session timeout and poll I think or something related to that, but maybe there is already some good solution for that :) Will be glad if somebody can give some hints for any of the questions, Best, Sasha