[ https://issues.apache.org/jira/browse/FLINK-34400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17818280#comment-17818280 ]
Rui Fan commented on FLINK-34400: --------------------------------- Hi [~asardaes] , did you try enable idleness? I try watermark alignment on my Mac with kafka docker, it works well. * Don't enable idleness: The quick topic will be blocked when slow topic doesn't have data. (Because quick topic source is waiting for the slow source) * Set idleness = 5s for slow topic: The quick topic is only blocked for 5 seconds. After 5 seconds, it start consume. Here is my test code, you can change a little code to simulate your case. * Data generator to Kafka topic: ** I generate some data to quick topic and slow topic. ** After a while, I restart it and don't write any data to slow topic. (Only write data to quick topic to simulate your case.) ** https://github.com/1996fanrui/fanrui-learning/blob/1069662b5fb434928c4141bf2397149cd354489b/module-flink/src/main/java/com/dream/flink/kafka/alignment/KafkaDataGenerator.java * Consume data from kafka: ** https://github.com/1996fanrui/fanrui-learning/blob/1069662b5fb434928c4141bf2397149cd354489b/module-flink/src/main/java/com/dream/flink/kafka/alignment/KafkaAlignmentDemo.java > Kafka sources with watermark alignment sporadically stop consuming > ------------------------------------------------------------------ > > Key: FLINK-34400 > URL: https://issues.apache.org/jira/browse/FLINK-34400 > Project: Flink > Issue Type: Bug > Affects Versions: 1.18.1 > Reporter: Alexis Sarda-Espinosa > Priority: Major > Attachments: alignment_lags.png, logs.txt > > > I have 2 Kafka sources that read from different topics. I have assigned them > to the same watermark alignment group, and I have _not_ enabled idleness > explicitly in their watermark strategies. One topic remains pretty much empty > most of the time, while the other receives a few events per second all the > time. Parallelism of the active source is 2, for the other one it's 1, and > checkpoints are once every minute. > This works correctly for some time (10 - 15 minutes in my case) but then 1 of > the active sources stops consuming, which causes lag to increase. Weirdly, > after another 15 minutes or so, all the backlog is consumed at once, and then > everything stops again. > I'm attaching some logs from the Task Manager where the issue appears. You > will notice that the Kafka network client reports disconnections (a long time > after the deserializer stopped reporting that events were being consumed), > I'm not sure if this is related. -- This message was sent by Atlassian Jira (v8.20.10#820010)