Looking into the code in TumblingEventTimeWindows:

@Override
public Collection<TimeWindow> assignWindows(Object element, long timestamp,
        WindowAssignerContext context) {
    if (timestamp > Long.MIN_VALUE) {
        // Long.MIN_VALUE is currently assigned when no timestamp is present
        long start = TimeWindow.getWindowStartWithOffset(timestamp, offset, size);
        return Collections.singletonList(new TimeWindow(start, start + size));
    } else {
        throw new RuntimeException("Record has Long.MIN_VALUE timestamp (= no timestamp marker). "
            + "Is the time characteristic set to 'ProcessingTime', or did you forget to call "
            + "'DataStream.assignTimestampsAndWatermarks(...)'?");
    }
}

So I think I can just write my own window assigner where the offset is derived
from hashing the element using my own hash function.
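To sanity-check the idea without pulling in Flink, here is a minimal plain-Java sketch. The windowStart formula mirrors what TimeWindow.getWindowStartWithOffset does in Flink's source; the class and method names (KeyedOffsetWindows, offsetForKey) are hypothetical, and a real implementation would instead subclass WindowAssigner and compute the offset inside assignWindows from the element's key:

    import java.util.Objects;

    public class KeyedOffsetWindows {

        // Same arithmetic as Flink's TimeWindow.getWindowStartWithOffset(timestamp, offset, size):
        // snap the timestamp down to the nearest window boundary, shifted by `offset`.
        static long windowStart(long timestamp, long offset, long size) {
            return timestamp - (timestamp - offset + size) % size;
        }

        // Hypothetical per-key offset: hash the key into [0, size),
        // so each key gets its own deterministic phase within the window size.
        static long offsetForKey(String key, long size) {
            return Math.floorMod(Objects.hashCode(key), size);
        }

        public static void main(String[] args) {
            long size = 15 * 60 * 1000L; // 15-minute windows
            long ts = 1_000_000_000_000L;

            long offA = offsetForKey("ca:fe:ba:be", size);
            long offB = offsetForKey("de:ad:be:ef", size);

            // Each key's window starts stay aligned to its own offset,
            // so loads on the back-end store are spread out rather than synchronized.
            System.out.println("ca:fe:ba:be starts at " + windowStart(ts, offA, size));
            System.out.println("de:ad:be:ef starts at " + windowStart(ts, offB, size));
        }
    }

Since the offset is a pure function of the key, the assignment stays deterministic across restarts, which is what the per-key alignment below relies on.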

Good plan or bad plan?


On Sun, 10 Feb 2019 at 19:55, Stephen Connolly <
stephen.alan.conno...@gmail.com> wrote:

> I would like to process a stream of data from different customers,
> producing output say once every 15 minutes. The results will then be loaded
> into another system for storage and querying.
>
> I have been using TumblingEventTimeWindows in my prototype, but I am
> concerned that all the windows will start and stop at the same time and
> cause batch load effects on the back-end data store.
>
> What I think I would like is that the windows could have a different start
> offset for each key, (using a hash function that I would supply)
>
> Thus deterministically, key "ca:fe:ba:be" would always start based on an
> initial offset of 00:07 UTC while say key "de:ad:be:ef" would always start
> based on an initial offset of say 00:02 UTC
>
> Is this possible? Or do I just have to find some way of queuing up my
> writes using back-pressure?
>
> Thanks in advance
>
> -stephenc
>
> P.S. I can trade assistance with Flink for assistance with Maven or
> Jenkins if my questions are too wearisome!
>
