Looking into the code in TumblingEventTimeWindows:

    @Override
    public Collection<TimeWindow> assignWindows(Object element, long timestamp, WindowAssignerContext context) {
        if (timestamp > Long.MIN_VALUE) {
            // Long.MIN_VALUE is currently assigned when no timestamp is present
            long start = TimeWindow.getWindowStartWithOffset(timestamp, offset, size);
            return Collections.singletonList(new TimeWindow(start, start + size));
        } else {
            throw new RuntimeException(
                    "Record has Long.MIN_VALUE timestamp (= no timestamp marker). "
                            + "Is the time characteristic set to 'ProcessingTime', or did you forget to call "
                            + "'DataStream.assignTimestampsAndWatermarks(...)'?");
        }
    }
So I think I can just write my own assigner where the offset is derived from hashing the element's key with my hash function. Good plan or bad plan?

On Sun, 10 Feb 2019 at 19:55, Stephen Connolly <stephen.alan.conno...@gmail.com> wrote:

> I would like to process a stream of data from different customers,
> producing output say once every 15 minutes. The results will then be
> loaded into another system for storage and querying.
>
> I have been using TumblingEventTimeWindows in my prototype, but I am
> concerned that all the windows will start and stop at the same time and
> cause batch load effects on the back-end data store.
>
> What I think I would like is that the windows could have a different
> start offset for each key (using a hash function that I would supply).
>
> Thus, deterministically, key "ca:fe:ba:be" would always start based on
> an initial offset of 00:07 UTC, while say key "de:ad:be:ef" would always
> start based on an initial offset of say 00:02 UTC.
>
> Is this possible? Or do I just have to find some way of queuing up my
> writes using back-pressure?
>
> Thanks in advance
>
> -stephenc
>
> P.S. I can trade assistance with Flink for assistance with Maven or
> Jenkins if my questions are too wearisome!
>
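To make the plan concrete, here is a minimal, self-contained sketch of the arithmetic such a custom assigner's assignWindows would perform: derive a stable per-key offset from a hash, then compute the window start the same way TimeWindow.getWindowStartWithOffset does. The class name PerKeyWindowStart and the use of String.hashCode as the hash function are illustrative assumptions, not Flink API — in a real WindowAssigner you would plug in your own hash and extract the key from the element:

```java
// Sketch of the per-key offset arithmetic a custom window assigner could use.
// PerKeyWindowStart and the hashCode-based offset are illustrative, not Flink API.
public class PerKeyWindowStart {

    /** Derive a stable offset in [0, size) from the key's hash. */
    static long offsetFor(String key, long size) {
        // Math.floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod((long) key.hashCode(), size);
    }

    /** Same arithmetic as TimeWindow.getWindowStartWithOffset in Flink. */
    static long windowStart(long timestamp, long offset, long size) {
        return timestamp - (timestamp - offset + size) % size;
    }

    public static void main(String[] args) {
        long size = 15 * 60 * 1000L; // 15-minute windows, in milliseconds
        for (String key : new String[] {"ca:fe:ba:be", "de:ad:be:ef"}) {
            long offset = offsetFor(key, size);
            long start = windowStart(1_549_800_000_000L, offset, size);
            // The same key always yields the same offset, so its windows
            // deterministically tile the timeline as [start, start + size).
            System.out.println(key + " -> offset=" + offset + " start=" + start);
        }
    }
}
```

Because the offset is a pure function of the key, every window for a given key is shifted by the same amount, which staggers the window boundaries across keys and should spread out the writes to the back-end store.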