I agree that there should be multiple alternatives the user(!) can choose from. Partial out-of-order processing works for many/most aggregates. However, if you consider Event-Pattern-Matching, global ordering is necessary (even if the performance penalty might be high).
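
As a rough illustration of the distinction above (a plain-Python sketch, not Flink code; the tuple layout, `WINDOW_SIZE`, and both function names are assumptions made up for this example): a commutative aggregate such as a windowed sum gives the same result regardless of arrival order, while a simple "A directly followed by B" pattern does not, unless the events are put back into timestamp order first.

```python
from collections import defaultdict

WINDOW_SIZE = 10  # hypothetical tumbling-window length, in timestamp units


def windowed_sums(events):
    """Sum values per tumbling window, keyed by the source-assigned timestamp."""
    sums = defaultdict(int)
    for ts, _kind, value in events:
        sums[ts // WINDOW_SIZE] += value
    return dict(sums)


def count_a_then_b(events):
    """Count how often a "B" directly follows an "A" in the order given."""
    matches = 0
    previous_kind = None
    for _ts, kind, _value in events:
        if previous_kind == "A" and kind == "B":
            matches += 1
        previous_kind = kind
    return matches


# (timestamp, kind, value) tuples in timestamp order ...
in_order = [(1, "A", 5), (2, "B", 7), (12, "A", 1), (13, "B", 2)]
# ... and the same events in a possible out-of-order arrival sequence.
arrived = [(2, "B", 7), (1, "A", 5), (13, "B", 2), (12, "A", 1)]

# The sum does not care about arrival order.
print(windowed_sums(in_order) == windowed_sums(arrived))
# The pattern count does, until the events are globally re-sorted.
print(count_a_then_b(in_order), count_a_then_b(arrived))
print(count_a_then_b(sorted(arrived)))
```

The `sorted` call stands in for the global ordering step; in a real stream it would require buffering until no earlier element can still arrive, which is where the performance penalty comes from.
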
I would also keep "system-time windows" as an alternative to "source assigned ts-windows".

It might also be interesting to consider the following paper for overlapping windows: "Resource sharing in continuous sliding-window aggregates"

> https://dl.acm.org/citation.cfm?id=1316720

-Matthias

On 06/23/2015 10:37 AM, Gyula Fóra wrote:
> Hey
>
> I think we should not block PRs unnecessarily if your suggested changes
> might touch them at some point.
>
> Also I still think we should not put everything in the DataStream because
> it will be a huge mess.
>
> Also we need to agree on the out-of-order processing, whether we want it
> the way you proposed it (which is quite costly). Another alternative
> approach there, which fits in the current windowing, is to filter out
> out-of-order events and apply a special handling operator on them. This
> would be fairly lightweight.
>
> My point is that we need to consider some alternative solutions. And we
> should not block contributions along the way.
>
> Cheers
> Gyula
>
> On Tue, Jun 23, 2015 at 9:55 AM Aljoscha Krettek <aljos...@apache.org>
> wrote:
>
>> The reason I posted this now is that we need to think about the API and
>> windowing before proceeding with the PRs of Gabor (inverse reduce) and
>> Gyula (removal of "aggregate" functions on DataStream).
>>
>> For the windowing, I think that the current model does not work for
>> out-of-order processing. Therefore, the whole windowing infrastructure
>> will basically have to be redone. This also means that any work on the
>> pre-aggregators or optimizations that we do now becomes useless.
>>
>> For the API, I proposed to restructure the interactions between all the
>> different *DataStream classes and grouping/windowing. (See the API
>> section of the doc I posted.)
>>
>> On Mon, 22 Jun 2015 at 21:56 Gyula Fóra <gyula.f...@gmail.com> wrote:
>>
>>> Hi Aljoscha,
>>>
>>> Thanks for the nice summary, this is a very good initiative.
>>>
>>> I added some comments to the respective sections (where I didn't fully
>>> agree :)).
>>> At some point I think it would be good to have a public hangout session
>>> on this, which could make for a more dynamic discussion.
>>>
>>> Cheers,
>>> Gyula
>>>
>>> Aljoscha Krettek <aljos...@apache.org> wrote (on Mon, Jun 22, 2015,
>>> 21:34):
>>>
>>>> Hi,
>>>> with people proposing changes to the streaming part I also wanted to
>>>> throw my hat into the ring. :D
>>>>
>>>> During the last few months, while I was getting acquainted with the
>>>> streaming system, I wrote down some thoughts I had about how things
>>>> could be improved. Hopefully, they are in somewhat coherent shape now,
>>>> so please have a look if you are interested in this:
>>>>
>>>> https://docs.google.com/document/d/1rSoHyhUhm2IE30o5tkR8GEetjFvMRMNxvsCfoPsW6_4/edit?usp=sharing
>>>>
>>>> This mostly covers:
>>>> - Timestamps assigned at sources
>>>> - Out-of-order processing of elements in window operators
>>>> - API design
>>>>
>>>> Please let me know what you think. Comment in the document or here on
>>>> the mailing list.
>>>>
>>>> I have a PR in the making that would introduce source timestamps and
>>>> watermarks for keeping track of them. I also hacked a proof-of-concept
>>>> of a windowing system that is able to process out-of-order elements
>>>> using a FlatMap operator. (It uses panes to perform efficient
>>>> pre-aggregations.)
>>>>
>>>> Cheers,
>>>> Aljoscha
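
For readers unfamiliar with the pane idea mentioned in the quoted proposal and in the sliding-window paper: the sketch below is one minimal way it could look in plain Python. It is not the proof-of-concept referred to above and does not use any Flink API. The function names, the choice of `gcd(window_size, slide)` as pane length, and the restriction to non-negative integer timestamps and windows starting at non-negative multiples of the slide are all assumptions made for this example.

```python
from collections import defaultdict
from math import gcd


def pane_sums(events, pane_size):
    """Pre-aggregate (timestamp, value) pairs into per-pane partial sums.

    The pane index is derived from the element's own timestamp, so the
    arrival order of the elements does not matter.
    """
    panes = defaultdict(int)
    for ts, value in events:
        panes[ts // pane_size] += value
    return panes


def sliding_window_sums(events, window_size, slide):
    """Return {window_start: sum} for sliding windows built from shared panes."""
    pane_size = gcd(window_size, slide)  # assumed pane length
    panes = pane_sums(events, pane_size)
    if not panes:
        return {}
    panes_per_window = window_size // pane_size
    last_pane_start = max(panes) * pane_size
    results = {}
    # Assumption: only windows starting at non-negative multiples of `slide`.
    for start in range(0, last_pane_start + 1, slide):
        first = start // pane_size
        # Each overlapping window reuses the same pane partial sums.
        results[start] = sum(
            panes.get(p, 0) for p in range(first, first + panes_per_window)
        )
    return results


# Elements arrive out of timestamp order: (timestamp, value).
events = [(1, 1), (7, 2), (3, 4), (12, 8), (9, 16)]
print(sliding_window_sums(events, window_size=10, slide=5))
```

Each element is added to exactly one pane, and every overlapping window is assembled from pane results instead of re-reading the raw elements. A sum is used here because it is commutative and associative; deciding when a window may be emitted (for example, based on watermarks) is left out of the sketch.
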