Hi,
while I cannot provide you with a definite answer to your question,
maybe my Stack Overflow question is interesting for you:
https://stackoverflow.com/questions/66067263/how-to-join-a-frequently-updating-stream-with-an-irregularly-updating-stream-i
Best regards,
Sören
Am 03.01.23 um 09:15 schrieb Ifat Afek (Nokia):
Hi,
We are trying to implement the following use case:
We have a stream of DataX events that arrive every 5 minutes and
require some processing. Each event holds data for a specific
non-unique ID (we keep getting updated data for each ID). There might
be up to 1,000,000 IDs.
In addition, there is a stream of DataY events for some of these IDs,
that arrive in a variable frequency. Could be after a minute and then
again after 5 hours.
We would like to join the current DataX and latest DataY events by ID
(and process only IDs that have both DataX and DataY events).
We thought of holding a state of DataY events per ID in a global
window, and then use it as a side input for filtering the DataX events
stream. The state should hold the latest (by timestamp) DataY event
that arrived.
The problem is: if we are using discardingFiredPanes(), then each
DataY event is fired only once and cannot be reused later on for
filtering. On the other hand, if we are using
accumulatingFiredPanes(), then a list of all DataY events that arrived
is fired.
Are we missing something? what is the best practice for combining two
streams, one with a variable frequency?
Thanks,
Ifat