Ok, thanks Aljoscha.

As an alternative to using Flink to maintain the schedule state, I could
take the (e, t2) stream and write to a external key-value store with a
bucket for each minute. Then have a separate service which polls the
key-value store every minute and retrieves the current bucket, and does the
final transformation.

I just thought there might be a nicer way to do it using Flink!

On Thu, Jun 9, 2016 at 2:23 PM, Aljoscha Krettek <aljos...@apache.org>
wrote:

> Hi Josh,
> I'll have to think a bit about that one. Once I have something I'll get
> back to you.
>
> Best,
> Aljoscha
>
> On Wed, 8 Jun 2016 at 21:47 Josh <jof...@gmail.com> wrote:
>
>> This is just a question about a potential use case for Flink:
>>
>> I have a Flink job which receives tuples with an event id and a timestamp
>> (e, t) and maps them into a stream (e, t2) where t2 is a future timestamp
>> (up to 1 year in the future, which indicates when to schedule a
>> transformation of e). I then want to key by e and keep track of the max t2
>> for each e. Now the tricky bit: I want to periodically, say every minute
>> (in event time world) take all (e, t2) where t2 occurred in the last
>> minute, do a transformation and emit the result. It is important that the
>> final transformation happens after t2 (preferably as soon as possible, but
>> a delay of minutes is fine).
>>
>> Is it possible to use Flink's windowing and watermark mechanics to
>> achieve this? I want to maintain a large state for the (e, t2) window, e.g.
>> over a year (probably too large to fit in memory). And somehow use
>> watermarks to execute the scheduled transformations.
>>
>> If anyone has any views on how this could be done, (or whether it's even
>> possible/a good idea to do) with Flink then it would be great to hear!
>>
>> Thanks,
>>
>> Josh
>>
>

Reply via email to