pro tip for debugging watermarks: They are exposed via a metric in Flink
1.2.

On Tue, Mar 7, 2017 at 1:37 PM, Bruno Aranda <brunoara...@gmail.com> wrote:

> Hi Gordon,
>
> Many thanks for your helpful ideas. We tried yesterday the CEP approach,
> but could not figure it out. The ProcessFunction one looks more promising,
> and we are investigating it, though we are fighting with some issues
> related to the event time, where we cannot see so far the timer triggered
> at the right event time. We are using ascending timestamps, but at the
> moment we see the timers fired when it is too late. Investigating more.
>
> Thanks,
>
> Bruno
>
> On Tue, 7 Mar 2017 at 07:49 Tzu-Li (Gordon) Tai <tzuli...@apache.org>
> wrote:
>
>> Some more input:
>>
>> Right now, you can also use the `ProcessFunction` [1] available in Flink
>> 1.2 to simulate state TTL.
>> The `ProcessFunction` should allow you to keep device state and simulate
>> the online / offline detection by registering processing timers. In the
>> `onTimer` callback, you can emit the “offline” marker event downstream, and
>> in the `processElement` method, you can emit the “online” marker event if
>> the case is the device has sent an event after it was determined to be
>> offline.
>>
>> [1] https://ci.apache.org/projects/flink/flink-docs-
>> release-1.2/dev/stream/process_function.html
>>
>>
>> On March 6, 2017 at 9:40:28 PM, Tzu-Li (Gordon) Tai (tzuli...@apache.org)
>> wrote:
>>
>> Hi Bruno!
>>
>> The Flink CEP library also seems like an option you can look into to see
>> if it can easily realize what you have in mind.
>>
>> Basically, the pattern you are detecting is a timeout of 5 minutes after
>> the last event. Once that pattern is detected, you emit a “device offline”
>> event downstream.
>> With this, you can also extend the pattern output stream to detect
>> whether a device has became online again.
>>
>> Here are some materials for you to take a look at Flink CEP:
>> 1. http://flink.apache.org/news/2016/04/06/cep-monitoring.html
>> 2. https://www.slideshare.net/FlinkForward/fabian-huesketill-rohrmann-
>> declarative-stream-processing-with-streamsql-and-cep?qid=
>> 3c13eb7d-ed39-4eae-9b74-a6c94e8b08a3&v=&b=&from_search=4
>>
>> The CEP parts in the slides in 2. also provides some good examples of
>> timeout detection using CEP.
>>
>> Hope this helps!
>>
>> Cheers,
>> Gordon
>>
>> On March 4, 2017 at 1:27:51 AM, Bruno Aranda (bara...@apache.org) wrote:
>>
>> Hi all,
>>
>> We are trying to write an online/offline detector for devices that keep
>> streaming data through Flink. We know how often roughly to expect events
>> from those devices and we want to be able to detect when any of them stops
>> (goes offline) or starts again (comes back online) sending events through
>> the pipeline. For instance, if 5 minutes have passed since the last event
>> of a device, we would fire an event to indicate that the device is offline.
>>
>> The data from the devices comes through Kafka, with their own event time.
>> The devices events are in order in the partitions and each devices goes to
>> a specific partition, so in theory, we should not have out of order when
>> looking at one partition.
>>
>> We are assuming a good way to do this is by using sliding windows that
>> are big enough, so we can see the relevant gap before/after the events for
>> each specific device.
>>
>> We were wondering if there are other ideas on how to solve this.
>>
>> Many thanks!
>>
>> Bruno
>>
>>

Reply via email to