pro tip for debugging watermarks: They are exposed via a metric in Flink 1.2.
On Tue, Mar 7, 2017 at 1:37 PM, Bruno Aranda <brunoara...@gmail.com> wrote: > Hi Gordon, > > Many thanks for your helpful ideas. We tried yesterday the CEP approach, > but could not figure it out. The ProcessFunction one looks more promising, > and we are investigating it, though we are fighting with some issues > related to the event time, where we cannot see so far the timer triggered > at the right event time. We are using ascending timestamps, but at the > moment we see the timers fired when it is too late. Investigating more. > > Thanks, > > Bruno > > On Tue, 7 Mar 2017 at 07:49 Tzu-Li (Gordon) Tai <tzuli...@apache.org> > wrote: > >> Some more input: >> >> Right now, you can also use the `ProcessFunction` [1] available in Flink >> 1.2 to simulate state TTL. >> The `ProcessFunction` should allow you to keep device state and simulate >> the online / offline detection by registering processing timers. In the >> `onTimer` callback, you can emit the “offline” marker event downstream, and >> in the `processElement` method, you can emit the “online” marker event if >> the case is the device has sent an event after it was determined to be >> offline. >> >> [1] https://ci.apache.org/projects/flink/flink-docs- >> release-1.2/dev/stream/process_function.html >> >> >> On March 6, 2017 at 9:40:28 PM, Tzu-Li (Gordon) Tai (tzuli...@apache.org) >> wrote: >> >> Hi Bruno! >> >> The Flink CEP library also seems like an option you can look into to see >> if it can easily realize what you have in mind. >> >> Basically, the pattern you are detecting is a timeout of 5 minutes after >> the last event. Once that pattern is detected, you emit a “device offline” >> event downstream. >> With this, you can also extend the pattern output stream to detect >> whether a device has became online again. >> >> Here are some materials for you to take a look at Flink CEP: >> 1. http://flink.apache.org/news/2016/04/06/cep-monitoring.html >> 2. https://www.slideshare.net/FlinkForward/fabian-huesketill-rohrmann- >> declarative-stream-processing-with-streamsql-and-cep?qid= >> 3c13eb7d-ed39-4eae-9b74-a6c94e8b08a3&v=&b=&from_search=4 >> >> The CEP parts in the slides in 2. also provides some good examples of >> timeout detection using CEP. >> >> Hope this helps! >> >> Cheers, >> Gordon >> >> On March 4, 2017 at 1:27:51 AM, Bruno Aranda (bara...@apache.org) wrote: >> >> Hi all, >> >> We are trying to write an online/offline detector for devices that keep >> streaming data through Flink. We know how often roughly to expect events >> from those devices and we want to be able to detect when any of them stops >> (goes offline) or starts again (comes back online) sending events through >> the pipeline. For instance, if 5 minutes have passed since the last event >> of a device, we would fire an event to indicate that the device is offline. >> >> The data from the devices comes through Kafka, with their own event time. >> The devices events are in order in the partitions and each devices goes to >> a specific partition, so in theory, we should not have out of order when >> looking at one partition. >> >> We are assuming a good way to do this is by using sliding windows that >> are big enough, so we can see the relevant gap before/after the events for >> each specific device. >> >> We were wondering if there are other ideas on how to solve this. >> >> Many thanks! >> >> Bruno >> >>