Thanks for the tip! I am currently trying to implement a zookeeper-based coordinator.use it to record the current watermark and control streaming according to your first suggest.
Piotr Nowojski <pnowoj...@apache.org> 于2020年9月16日周三 下午11:56写道: > Hey, > > If you are worried about increased amount of buffered data by the > WindowOperator if watermarks/event time is not progressing uniformly across > multiple sources, then there is little you can do currently. FLIP-27 [1] > will allow us to address this problem in more generic way. What you can > currently do is one of two things: > > 1. Implement a custom throttling function/operator sitting after the > sources, that would throttle the sources. If you chain it with the source > function, it's relatively ok solution. Note, while you are blocking > execution, you will be blocking for example checkpoints from happening. So > it's better to sleep 10 ms per every record, compared to sleep 10 seconds > once every 1000 records. > 2. Throttle the sources themselves (you would need to modify or write your > custom sources). > > But in both cases you need to manually track the event time, and manually > make decision which source should be throttled and by how much. > > Best regards, Piotrek > > [1] > https://cwiki.apache.org/confluence/display/FLINK/FLIP-27%3A+Refactor+Source+Interface#FLIP27:RefactorSourceInterface-EventTimeAlignment > > śr., 16 wrz 2020 o 04:17 hao kong <h...@lemonbox.me> napisał(a): > >> Hello guys, >> >> I have a job with multiple Kafka sources. They all contain certain >> historical data. If you use the events-time window, it will cause sources >> with less data to cover more sources through water mark. >> >> >> I can think of a solution, Implement a scheduler in the source phase, But >> it is quite complicated to implement. Are ther otherbetter solutions? >> >> >> Any suggestions? >> Thanks! >> >> >>