Ok, thanks Aljoscha. As an alternative to using Flink to maintain the schedule state, I could take the (e, t2) stream and write to a external key-value store with a bucket for each minute. Then have a separate service which polls the key-value store every minute and retrieves the current bucket, and does the final transformation.
I just thought there might be a nicer way to do it using Flink! On Thu, Jun 9, 2016 at 2:23 PM, Aljoscha Krettek <aljos...@apache.org> wrote: > Hi Josh, > I'll have to think a bit about that one. Once I have something I'll get > back to you. > > Best, > Aljoscha > > On Wed, 8 Jun 2016 at 21:47 Josh <jof...@gmail.com> wrote: > >> This is just a question about a potential use case for Flink: >> >> I have a Flink job which receives tuples with an event id and a timestamp >> (e, t) and maps them into a stream (e, t2) where t2 is a future timestamp >> (up to 1 year in the future, which indicates when to schedule a >> transformation of e). I then want to key by e and keep track of the max t2 >> for each e. Now the tricky bit: I want to periodically, say every minute >> (in event time world) take all (e, t2) where t2 occurred in the last >> minute, do a transformation and emit the result. It is important that the >> final transformation happens after t2 (preferably as soon as possible, but >> a delay of minutes is fine). >> >> Is it possible to use Flink's windowing and watermark mechanics to >> achieve this? I want to maintain a large state for the (e, t2) window, e.g. >> over a year (probably too large to fit in memory). And somehow use >> watermarks to execute the scheduled transformations. >> >> If anyone has any views on how this could be done, (or whether it's even >> possible/a good idea to do) with Flink then it would be great to hear! >> >> Thanks, >> >> Josh >> >