Hello Flink community. I need help. Thus far, Flink has proven very useful
to me.

I am using it for stream processing of time-series data.

For the purposes of this question, let's say each time-series record has the
fields: name: String, value: double, and timestamp: Instant.

I named the time series: timeSeriesDataStream.

My first task was to average the time series by name within a 15-minute
tumbling event-time window.

I was able to solve this with a ProcessWindowFunction (I had to use this
approach because the watermark is not keyed). I named the resulting stream
aggregateTimeSeriesDataStream and then sent its values to a sink.
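
To make the first task concrete, here is a plain-Java sketch (no Flink APIs)
of the result I compute per window. It is not my actual job code. The class
name WindowAverageSketch, the Point type, and the method names are made up for
illustration, and I am assuming windows aligned to the epoch with no offset.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

class WindowAverageSketch {
    /** One record of the series: name, value, event timestamp (hypothetical type). */
    static final class Point {
        final String name;
        final double value;
        final Instant timestamp;

        Point(String name, double value, Instant timestamp) {
            this.name = name;
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    /** Start of the epoch-aligned tumbling window that contains ts. */
    static Instant windowStart(Instant ts, Duration size) {
        long millis = ts.toEpochMilli();
        return Instant.ofEpochMilli(millis - Math.floorMod(millis, size.toMillis()));
    }

    /** Average value per name within each tumbling window, keyed by window start. */
    static SortedMap<Instant, SortedMap<String, Double>> averageByWindow(
            List<Point> points, Duration size) {
        // window start -> name -> {sum, count}
        Map<Instant, Map<String, double[]>> acc = new TreeMap<>();
        for (Point p : points) {
            double[] sumCount = acc
                    .computeIfAbsent(windowStart(p.timestamp, size), k -> new TreeMap<>())
                    .computeIfAbsent(p.name, k -> new double[2]);
            sumCount[0] += p.value;
            sumCount[1] += 1;
        }
        SortedMap<Instant, SortedMap<String, Double>> averages = new TreeMap<>();
        acc.forEach((window, byName) -> {
            SortedMap<String, Double> perName = new TreeMap<>();
            byName.forEach((name, sumCount) -> perName.put(name, sumCount[0] / sumCount[1]));
            averages.put(window, perName);
        });
        return averages;
    }

    public static void main(String[] args) {
        Instant t0 = Instant.parse("2024-01-01T00:00:00Z");
        List<Point> points = List.of(
                new Point("a", 1.0, t0.plusSeconds(60)),
                new Point("a", 3.0, t0.plusSeconds(120)),
                new Point("a", 10.0, t0.plusSeconds(900)));
        // Prints one entry per window start, each holding the per-name averages.
        System.out.println(averageByWindow(points, Duration.ofMinutes(15)));
    }
}
```

A name that has no records in a window simply has no entry for that window in
this output.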

My next task is to backfill the per-name averages into subsequent windows.
This means that if a name does not appear in a subsequent window, then its
previous average value will be used in that window.
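
To show the output I am after, here is a plain-Java sketch (no Flink APIs) of
the backfill semantics. It is only an illustration of the requirement, not a
proposed Flink solution. The class name BackfillSketch and the method backfill
are made up, and I am assuming the per-window averages are already available
as a map keyed by window start.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.SortedMap;
import java.util.TreeMap;

class BackfillSketch {
    /**
     * Walks every window from the first to the last window start and, for each
     * name missing from a window, reuses that name's most recent average.
     */
    static SortedMap<Instant, SortedMap<String, Double>> backfill(
            SortedMap<Instant, SortedMap<String, Double>> averages, Duration size) {
        SortedMap<Instant, SortedMap<String, Double>> filled = new TreeMap<>();
        if (averages.isEmpty()) {
            return filled;
        }
        // name -> most recent average seen so far
        SortedMap<String, Double> lastKnown = new TreeMap<>();
        Instant last = averages.lastKey();
        for (Instant w = averages.firstKey(); !w.isAfter(last); w = w.plus(size)) {
            SortedMap<String, Double> seen = averages.get(w);
            if (seen != null) {
                lastKnown.putAll(seen); // fresh averages replace older ones
            }
            filled.put(w, new TreeMap<>(lastKnown)); // snapshot for this window
        }
        return filled;
    }

    public static void main(String[] args) {
        Instant t0 = Instant.parse("2024-01-01T00:00:00Z");
        Duration size = Duration.ofMinutes(15);

        SortedMap<Instant, SortedMap<String, Double>> averages = new TreeMap<>();
        averages.computeIfAbsent(t0, k -> new TreeMap<>()).put("a", 1.0);
        averages.computeIfAbsent(t0, k -> new TreeMap<>()).put("b", 2.0);
        // "b" is missing from the second window and should be backfilled.
        averages.computeIfAbsent(t0.plus(size), k -> new TreeMap<>()).put("a", 3.0);

        System.out.println(backfill(averages, size));
    }
}
```

Note that the loop also emits windows in which no name has any data at all. As
far as I understand, a Flink window with no elements never fires, so that case
is part of what I do not know how to handle.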

How do I do this?

I started by applying a map function to aggregateTimeSeriesDataStream that
changes each timestamp back 15 minutes, and named the resulting stream
backfilledDataStream.

Now I am stuck. I suspect that I should either:

1) Call timeSeriesDataStream.coGroup(backfilledDataStream) and add a
CoGroupWindowFunction to process the backfill.
2) Use "iterate" to somehow jury-rig a backfill.

I really don't know.  That's why I am asking this group for advice.

What's the common solution for this problem? I am quite sure that this is a
very common use-case.
