Hi, sliding windows replicate their records for each window. If you have use an incrementally aggregating function (ReduceFunction, AggregateFunction) with a sliding, the space requirement should not be an issue because each window stores a single value. However, this also means that each window performs its aggregations independently from the others. So, if you many concurrent sliding windows, pre-aggregate the records in a tumbling window can reduce the computational effort.
Best, Fabian 2017-12-12 8:10 GMT+01:00 Jinhua Luo <luajit...@gmail.com>: > Hi All, > > Given one stream source which generates 20k events/sec, and I need to > aggregate the element count using sliding window of 1 hour size. > > The problem is, the window may buffer too many elements (which may > cause a lot of block I/O because of checkpointing?), and in fact it > does not necessary to store them for one hour, because the elements > should get folded incrementally. But unlike Tumbling Window, the > sliding window would save elements for next window, right? > > So I am considering kind of workaround, should I chain two window like > below: > > .timeWindow(Time.minutes(1)) > ... > .timeWindow(Time.hours(1), Time.minutes(1)) > > Here the first window generate 1 minute aggregation units and the > second window provides the sliding output. > > Any suggestions? Thanks. >