Thanks for the tip!
     I am currently trying to implement a zookeeper-based coordinator.use
it to record the current watermark and control streaming according to your
first suggest.

Piotr Nowojski <pnowoj...@apache.org> 于2020年9月16日周三 下午11:56写道:

> Hey,
>
> If you are worried about increased amount of buffered data by the
> WindowOperator if watermarks/event time is not progressing uniformly across
> multiple sources, then there is little you can do currently. FLIP-27 [1]
> will allow us to address this problem in more generic way. What you can
> currently do is one of two things:
>
> 1. Implement a custom throttling function/operator sitting after the
> sources, that would throttle the sources. If you chain it with the source
> function, it's relatively ok solution. Note, while you are blocking
> execution, you will be blocking for example checkpoints from happening. So
> it's better to sleep 10 ms per every record, compared to sleep 10 seconds
> once every 1000 records.
> 2. Throttle the sources themselves (you would need to modify or write your
> custom sources).
>
> But in both cases you need to manually track the event time, and manually
> make decision which source should be throttled and by how much.
>
> Best regards, Piotrek
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-27%3A+Refactor+Source+Interface#FLIP27:RefactorSourceInterface-EventTimeAlignment
>
> śr., 16 wrz 2020 o 04:17 hao kong <h...@lemonbox.me> napisał(a):
>
>> Hello guys,
>>
>> I have a job with multiple Kafka sources. They all contain certain
>> historical data. If you use the events-time window, it will cause sources
>> with less data to cover more sources through water mark.
>>
>>
>> I can think of a solution, Implement a scheduler in the source phase, But
>> it is quite complicated to implement. Are ther otherbetter solutions?
>>
>>
>> Any suggestions?
>> Thanks!
>>
>>
>>

Reply via email to