Re: Poor use case? (late arrival + IoT + windowing)

Nicolaus Weidner Mon, 17 May 2021 05:23:19 -0700

Hi,

On Sat, May 15, 2021 at 5:07 PM <invalidrecipi...@pm.me> wrote:


> First I was told that my application need only perform keyed aggregation
> of streaming IoT data on a sliding window. Flink seemed the obvious choice.
>
> Then I was told that the window size must be configurable, taking on one
> of 5 possible values, anywhere from 5-60 minutes. Oh and configuration
> changes should take effect immediately. No biggie - I just opted to perform
> aggregation against all 5 possible window durations and let the
> post-processor worry about which outputs were of interest.
>
> Now the latest requirement (and the interesting part): if the IoT devices
> lose connectivity, they will buffer many days worth of data until
> connectivity is restored at which point all of that buffered data will be
> transmitted to my application. I believe this implies that event time (as
> determined by each individual device) must now be taken into consideration
> but...
>
>
> Question 1: is Flink really the right choice for this application now?
> Assuming the memory requirements for allowing such late data wouldn't be a
> deal-breaker, is Flink even capable of tracking event time on a per
> device/key basis?
>

Tracking and using event time for a windowed aggregation is certainly
possible. You can check out [1] for an introduction and [2] for some
further information on assigning timestamps to events. Of course, the
events need to contain some timestamp of when they were produced in the
first place, which I assume to be the case.

Next, lateness: You will need to define a "watermark strategy", i.e. a
strategy for deciding when it is safe to close a certain window (see [2]).
For example, you could decide that events will probably arrive at most 30
minutes out of order, so 30 minutes after seeing an event with timestamp x,
windows with that ending timestamp can be closed. Note that key-based
watermarks are not supported currently, they are global.
In addition, you can configure "allowed lateness" for windows [3], meaning
that windows will fire again with updated results if events arrive after
the "end watermark" of the window has passed and it fired once.


> Question 2: Assuming a solution with Flink is suitable, what constructs
> would I need to leverage? Custom windows maybe? Custom triggers and
> evictors?
>

For your use case, you would probably need to allow lateness of one or even
several weeks. This is not necessarily a problem, but it will depend on the
type of aggregation you perform - whether all events need to be kept in
state or just some aggregate values. There are some tips on state size on
the bottom of the page in [3]. If you use RocksDB as state backend, state
will be kept on disk, so memory limitations shouldn't be an issue.
A custom trigger is not strictly required, but helpful: The default
EventTimeTrigger would fire for each element of a late batch, whereas you
probably want to fire only after no further event was received for some
time span.

An alternative would be to route late events to a side output (see also
[3]) and process them separately. This may be preferrable if late batches
are more of a rare case, as it won't interfere with the main streaming
logic.

Hope this helped at least a bit!
Best wishes,
Nico

[1]
https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/concepts/time/#event-time-and-watermarks
[2]
https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/dev/datastream/event-time/generating_watermarks/
[3]
https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/dev/datastream/operators/windows/#allowed-lateness
[4]
https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/ops/state/state_backends/

Re: Poor use case? (late arrival + IoT + windowing)

Reply via email to