Hello,

I have two data streams that I want to join using a tumbling window. Each
stream has at most one record per window. There is also a requirement to
log/save the records that don't have a companion in the other stream.
What would be the best option for my case? Would it be possible to use
Flink's Join?

I tried using a CoProcessFunction: I truncate the timestamp of each record to
the beginning of its tumbling window, and then "keyBy" the two streams using
(key, truncated-timestamp). When I receive a record from one stream, if
it is the first record of the pair, I save it to a MapState. If it is
the second record, I merge it with the first one and emit the result.
This implementation works, but
(a) I feel that it's over-complicated, and
(b) I am concerned that when one stream is slower than the other, my
cached data would build up and cause my cluster to run out of memory. Would
back-pressure kick in in this case?
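
For reference, the pairing logic described above can be modeled in plain Java
roughly as follows. This is only a sketch of the idea, not the actual Flink
job: the class WindowPairBuffer, its methods, the String payloads, and the
"left|right" merge format are all hypothetical names chosen for illustration,
and a HashMap stands in for the MapState.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Plain-Java model of the pairing logic; hypothetical names, not Flink API. */
final class WindowPairBuffer {
    /** Which input stream a record came from. */
    enum Side { LEFT, RIGHT }

    /** A buffered first record that is still waiting for its companion. */
    private static final class Pending {
        final Side side;
        final String value;
        Pending(Side side, String value) { this.side = side; this.value = value; }
    }

    private final long windowSizeMs;
    // Stands in for the MapState: at most one record per (key, windowStart).
    private final Map<String, Pending> pending = new HashMap<>();

    WindowPairBuffer(long windowSizeMs) { this.windowSizeMs = windowSizeMs; }

    /** Truncates a timestamp to the beginning of its tumbling window. */
    long windowStart(long timestampMs) {
        return timestampMs - Math.floorMod(timestampMs, windowSizeMs);
    }

    /**
     * Handles one record. Returns the merged pair if this record completes it,
     * otherwise buffers the record and returns an empty Optional.
     */
    Optional<String> onRecord(Side side, String key, long timestampMs, String value) {
        String slot = key + "@" + windowStart(timestampMs);
        Pending first = pending.get(slot);
        if (first == null || first.side == side) {
            // First record of the pair. A second record from the same side
            // violates the "one record per window" assumption; here it simply
            // replaces the buffered one.
            pending.put(slot, new Pending(side, value));
            return Optional.empty();
        }
        // Second record of the pair: merge with the first one, then fire.
        pending.remove(slot);
        String left = first.side == Side.LEFT ? first.value : value;
        String right = first.side == Side.LEFT ? value : first.value;
        return Optional.of(left + "|" + right);
    }

    /** Removes and returns every buffered record that never got a companion. */
    List<String> drainUnmatched() {
        List<String> out = new ArrayList<>();
        for (Pending p : pending.values()) {
            out.add(p.side + ":" + p.value);
        }
        pending.clear();
        return out;
    }
}
```

The buffer only shrinks when a companion arrives or when drainUnmatched() is
called, which is exactly where concern (b) comes from. When to drain is left
open in this sketch.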

Thanks and best regards,
Averell


