It depends on the user case, in Shilpa's use case it is about users so
the user ids are probably know beforehand.
https://dpaste.org/cRe3G <= This is an example with out an window but
essentially Shilpa you would be reregistering the timers every time they
fire.
You would also have to ingest the user ids before hand into your
pipeline, so that if a user never has any event he still gets a
notification. So probably on startup ingest the user ids with a single
source
from the DB.
My example is pretty minimal but the idea in your case stays the same:
- key by user
- have a co-process function to init the state with the user ids
- reregister the timers every time they fire
- use `env.getConfig().setAutoWatermarkInterval(1000)` to move the event
time forward even if there is no data coming in (this is what you are
probably looking for!!)
- then collect an Optionable/CustomStruct/Null or so depending on if
data is present or not
- and then u can check whether the event was triggered because there was
data or because there wasn't data
Best regards,
Dario
On 09.05.22 15:19, Andrew Otto wrote:
This sounds similar to a non streaming problem we had at WMF. We
ingest all event data from Kafka into HDFS/Hive and partition the Hive
tables in hourly directories. If there are no events in a Kafka topic
for a given hour, we have no way of knowing if the hour has been
ingested successfully. For all we know, the upstream producer
pipeline might be broken.
We solved this by emitting artificial 'canary' events into each topic
multiple times an hour. The canary events producer uses the same code
pathways and services that (most) of our normal event producers do.
Then, when ingesting into Hive, we filter out the canary events. The
ingestion code has work to do and can mark an hour as complete, but
still end up writing no events to it.
Perhaps you could do the same? Always emit artificial events, and
filter them out in your windowing code? The window should still fire
since it will always have events, even if you don't use them?
On Mon, May 9, 2022 at 8:55 AM Shilpa Shankar <sshan...@bandwidth.com>
wrote:
Hello,
We are building a flink use case where we are consuming from a
kafka topic and performing aggregations and generating alerts
based on average, max, min thresholds. We also need to notify the
users when there are 0 events in a Tumbling Event Time Windows. We
are having trouble coming up with a solution to do the same. The
options we considered are below, please let us know if there are
other ideas we haven't looked into.
[1] Querable State : Save the keys in each of the Process Window
Functions. Query the state from an external application and alert
when a key is missing after the 20min time interval has expired.
We see Queryable state feature is being deprecated in the future.
We do not want to go down this path when we already know there is
an EOL for it.
[2] Use Processing Time Windows : Using Processing time instead
of Event time would have been an option if our downstream
applications would send out events in real time. Maintenances of
the downstream applications, delays etc would result in a lot of
data loss which is undesirable.
Flink version : 1.14.3
Thanks,
Shilpa