Hello Users,

I have a question that is perhaps not best solved within Flink: It has to
do with notifying a downstream application that a Flink window has
completed.

The (simplified) scenario is this:
- We have a Flink job that consumes from Kafka, does some preprocessing,
and then has a sliding window of 10 minutes and slide time of 1 minute.
- The number of keys in each slide is not fixed
- The output of the window is then output to Kafka, which is read by a
downstream application.

What I want to achieve is that the downstream application can someone know
when it has read all of the data for a single window, without waiting for
the next window to arrive.

Some options I've considered:
- Producing a second window over the window results that counts the output
size, which can then be used by the downstream application to see when it
has received the same number: This seems fragile, as there it relies on
there being no loss or duplication of data. Its also an extra window and
Kafka stream which is a tad messy.
- Somehow adding an 'end of window' element to each partitions of the Kafka
topic which can be read by the consumer: This seems a bit messy because it
mixes different types of events into the same Kafka stream, and there is no
really simple way to do this in Flink
- Package the whole window output into a single message and make this the
unit of transaction: This is possible, but the message would be quite large
then (at least 10s of mb), as the volume of this stream is quite large.
- Assume that if the consumer has no elements to read, or if the next
window has started to be read, then it has read the whole window: This
seems reasonable, and if it wasn't for the fact that my consumer on the
application end was a bit inflexible right now, it is probably the solution
I would use.

Any further/better ideas?

Thanks
Padarn

Reply via email to