Hi guys,
I have a question about how Flink generates windows.
We are using a 1-day sliding window with a 1-hour slide to count some
features of items based on event time. We have about 20 million items. We
observed that Flink only emits results at one fixed point in each hour (e.g. 1am,
2am, 3am, or 1:15am, 2:15am, 3:15am with a 15-minute offset). That means
20 million windows/records are emitted at the same moment every hour, which
overwhelms our sink, while nothing is emitted for the rest of the hour. The
pattern looks like this:
# generated windows
|
|    /\            /\
|   /  \          /  \
|__/____\________/____\___
                      time
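To illustrate why the spikes line up: as far as I understand, Flink's sliding event-time assigner aligns every window to multiples of the slide (plus the optional offset), independently of the key, so all 20 million items share the same window end times. The plain-Java sketch below re-implements that alignment arithmetic for illustration only; it is not Flink's source, and the class and method names (`SlidingWindowAlignment`, `windowStart`, `assignWindows`, `earliestEnd`) are my own.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: mirrors the alignment arithmetic of a sliding event-time
// window assigner. This is NOT Flink's source; all names here are hypothetical.
public class SlidingWindowAlignment {
    static final long MINUTE = 60_000L;
    static final long HOUR = 60 * MINUTE;
    static final long DAY = 24 * HOUR;

    // Start of the latest slide-aligned window containing `timestamp`.
    // floorMod keeps the result correct for negative timestamps as well.
    static long windowStart(long timestamp, long offset, long slide) {
        return timestamp - Math.floorMod(timestamp - offset, slide);
    }

    // All [start, start + size) windows that contain `timestamp`,
    // ordered from the latest start to the earliest start.
    static List<long[]> assignWindows(long timestamp, long size, long slide, long offset) {
        List<long[]> windows = new ArrayList<>();
        for (long start = windowStart(timestamp, offset, slide);
             start > timestamp - size;
             start -= slide) {
            windows.add(new long[] {start, start + size});
        }
        return windows;
    }

    // End of the earliest-closing window, i.e. when this event is first emitted.
    static long earliestEnd(long timestamp, long size, long slide, long offset) {
        List<long[]> windows = assignWindows(timestamp, size, slide, offset);
        return windows.get(windows.size() - 1)[1];
    }

    public static void main(String[] args) {
        long item1 = 3 * HOUR + 7 * MINUTE;   // event at 03:07
        long item2 = 3 * HOUR + 52 * MINUTE;  // event at 03:52
        // Both first fire at 04:00 (minute 240), whatever minute they arrived in.
        System.out.println(earliestEnd(item1, DAY, HOUR, 0) / MINUTE);  // prints 240
        System.out.println(earliestEnd(item2, DAY, HOUR, 0) / MINUTE);  // prints 240
        // A 15-minute offset shifts every window to hh:15 together (minute 195 = 03:15).
        System.out.println(earliestEnd(item1, DAY, HOUR, 15 * MINUTE) / MINUTE);  // prints 195
    }
}
```

Each event lands in 24 windows (1 day / 1 hour), and every one of those windows ends on the same hourly grid for every item, which matches the spikes we see.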
Is there any way to even out the number of windows/records emitted over the
hour, so that the load is evenly distributed, like this?
# generated windows
|
|
|-------------------------
|_________________________
                      time
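One idea I have been considering (an assumption on my part, not something I know Flink supports out of the box): the built-in sliding assigner takes a single offset for the whole stream, which is why our 15-minute offset only moved the spike. If instead each key got its own offset within the 1-hour slide, derived from a hash of the key, the ~20 million results would be spread over the hour, roughly 5,600 records per second instead of one burst. The plain-Java simulation below only shows the resulting distribution and uses no Flink API; `offsetForKey`, `emissionsPerMinute` and the synthetic keys `item-0`, `item-1`, ... are hypothetical. In a real job this would presumably need a custom window assigner or a process function with per-key timers.

```java
import java.util.Arrays;

// Hypothetical sketch (not a Flink feature): derive a per-key offset inside the
// slide so that each key's hourly window ends fall at a different minute.
public class StaggeredOffsets {
    static final long MINUTE = 60_000L;
    static final long HOUR = 60 * MINUTE;

    // MurmurHash3 64-bit finalizer, used to scramble String.hashCode() bits.
    static long mix(long k) {
        k ^= k >>> 33;
        k *= 0xff51afd7ed558ccdL;
        k ^= k >>> 33;
        k *= 0xc4ceb9fe1a85ec53L;
        k ^= k >>> 33;
        return k;
    }

    // Per-key offset in [0, slide): the same key always gets the same offset.
    static long offsetForKey(String key, long slide) {
        return Math.floorMod(mix(key.hashCode()), slide);
    }

    // Count how many of `numKeys` synthetic keys would emit in each minute of the hour.
    static int[] emissionsPerMinute(int numKeys) {
        int[] perMinute = new int[60];
        for (int i = 0; i < numKeys; i++) {
            long offset = offsetForKey("item-" + i, HOUR);
            perMinute[(int) (offset / MINUTE)]++;
        }
        return perMinute;
    }

    public static void main(String[] args) {
        // Instead of every key firing at hh:00, each minute gets roughly numKeys / 60.
        System.out.println(Arrays.toString(emissionsPerMinute(120_000)));
    }
}
```

The trade-off is that results for different items would no longer cover exactly the same 24-hour span, which may or may not be acceptable for our use case.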
Thanks,
Bowen