@Matthias: I think using the KeyedDataStream will simply result in smaller programs. May be hard for some users to make the connection to a 1-element-tumbling-window, simply because they want to use state. Not everyone is a deep into that stuff as you are ;-)
On Sun, Jun 28, 2015 at 1:13 AM, Matthias J. Sax < mj...@informatik.hu-berlin.de> wrote: > Yes. But as I said, you can get the same behavior with a > GroupedDataStream using a tumbling 1-tuple-size window. Thus, there is > no conceptual advantage in using KeyedDataStream and no disadvantage in > binding stateful operations to GroupedDataStreams. > > On 06/27/2015 06:54 PM, Márton Balassi wrote: > > @Matthias: Your point of working with a minimal number of clear concepts > is > > desirable to say the least. :) > > > > The reasoning behind the KeyedDatastream is to associate Flink persisted > > operator state with the keys of the data that produced it, so that > stateful > > computation becomes scalabe in the future. This should not be tied to the > > GroupedDataStream, especially not if we are removing the option to create > > groups without windows as proposed on the Wiki by Stephan. > > > > On Sat, Jun 27, 2015 at 4:15 PM, Matthias J. Sax < > > mj...@informatik.hu-berlin.de> wrote: > > > >> This was more a conceptual point-of-view argument. From an > >> implementation point of view, skipping the window building step is a > >> good idea if a tumbling 1-tuple-size window is detected. > >> > >> I prefer to work with a minimum number of concepts (and apply internal > >> optimization if possible) instead of using redundant concepts for > >> special cases. Of course, this is my personal point of view. > >> > >> -Matthias > >> > >> On 06/27/2015 03:47 PM, Aljoscha Krettek wrote: > >>> What do you mean by Comment 2? Using the whole window apparatus if you > >> just > >>> want to have, for example, a simple partitioned filter with partitioned > >>> state seems a bit extravagant. > >>> > >>> On Sat, 27 Jun 2015 at 15:19 Matthias J. Sax < > >> mj...@informatik.hu-berlin.de> > >>> wrote: > >>> > >>>> Nice starting point. > >>>> > >>>> Comment 1: > >>>> "Each individual stream partition delivers elements strictly in > order." > >>>> (in 'Parallel Streams, Partitions, Time, and Ordering') > >>>> > >>>> I would say "FIFO" and not "strictly in order". If data is not emitted > >>>> in-order, the stream partition will not be in-order either. > >>>> > >>>> Comment 2: > >>>> Why do we need KeyedDataStream. You can get everything done with > >>>> GroupedDataStream (using a tumbling window of size = 1 tuple). > >>>> > >>>> > >>>> -Matthias > >>>> > >>>> On 06/26/2015 07:42 PM, Stephan Ewen wrote: > >>>>> Here is a first bit of what I have been writing down. Will add more > >> over > >>>>> the next days: > >>>>> > >>>>> > >>>>> https://cwiki.apache.org/confluence/display/FLINK/Stream+Windows > >>>>> > >>>>> > >>>> > >> > https://cwiki.apache.org/confluence/display/FLINK/Parallel+Streams%2C+Partitions%2C+Time%2C+and+Ordering > >>>>> > >>>>> > >>>>> > >>>>> On Thu, Jun 25, 2015 at 6:35 PM, Paris Carbone <par...@kth.se> > wrote: > >>>>> > >>>>>> +1 for writing this down > >>>>>> > >>>>>>> On 25 Jun 2015, at 18:11, Aljoscha Krettek <aljos...@apache.org> > >>>> wrote: > >>>>>>> > >>>>>>> +1 go ahead > >>>>>>> > >>>>>>> On Thu, 25 Jun 2015 at 18:02 Stephan Ewen <se...@apache.org> > wrote: > >>>>>>> > >>>>>>>> Hey! > >>>>>>>> > >>>>>>>> This thread covers many different topics. Lets break this up into > >>>>>> separate > >>>>>>>> discussions. > >>>>>>>> > >>>>>>>> - Operator State is already driven by Gyula and Paris and > happening > >> on > >>>>>> the > >>>>>>>> above mentioned pull request and the followup discussions. > >>>>>>>> > >>>>>>>> - For windowing, this discussion has brought some results that we > >>>> should > >>>>>>>> sum up and clearly write down. > >>>>>>>> I would like to chime in to do that based on what I learned from > >> the > >>>>>>>> document and this discussion. I also got some input from Marton > >> about > >>>>>> what > >>>>>>>> he learned from mapping the Cloud DataFlow constructs to Flink. > >>>>>>>> I'll draft a Wiki page (with the help of Aljoscha, Marton) that > >> sums > >>>>>>>> this up and documents it for users (if we decide to adopt this). > >>>>>>>> Then we run this by Gyula, Matthias Sax and Kostas for feedback. > >>>>>>>> > >>>>>>>> - API style discussions should be yet another thread. This will > >>>> probably > >>>>>>>> be opened as people start to address that. > >>>>>>>> > >>>>>>>> > >>>>>>>> I'll try to get a draft of the wiki version out tomorrow noon and > >> send > >>>>>> the > >>>>>>>> link around. > >>>>>>>> > >>>>>>>> Greetings, > >>>>>>>> Stephan > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> On Thu, Jun 25, 2015 at 3:51 PM, Matthias J. Sax < > >>>>>>>> mj...@informatik.hu-berlin.de> wrote: > >>>>>>>> > >>>>>>>>> Sure. I picked this up. Using the current model for "occurrence > >> time > >>>>>>>>> semantics" does not work. > >>>>>>>>> > >>>>>>>>> I elaborated on this in the past many times (but nobody cared). > It > >> is > >>>>>>>>> important to make it clear to the user what semantics are > >> supported. > >>>>>>>>> Claiming to support "sliding windows" doesn't mean anything; > there > >>>> are > >>>>>>>>> too many different semantics out there. :) > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> On 06/25/2015 03:35 PM, Aljoscha Krettek wrote: > >>>>>>>>>> Yes, I am aware of this requirement and it would also be > supported > >>>> in > >>>>>>>> my > >>>>>>>>>> proposed model. > >>>>>>>>>> > >>>>>>>>>> The problem is, that the "custom timestamp" feature gives the > >>>>>>>> impression > >>>>>>>>>> that the elements would be windowed according to a > user-timestamp. > >>>> The > >>>>>>>>>> results, however, are wrong because of the assumption about > >> elements > >>>>>>>>>> arriving in order. (This is what I was trying to show with my > >> fancy > >>>>>>>> ASCII > >>>>>>>>>> art and result output. > >>>>>>>>>> > >>>>>>>>>> On Thu, 25 Jun 2015 at 15:26 Matthias J. Sax < > >>>>>>>>> mj...@informatik.hu-berlin.de> > >>>>>>>>>> wrote: > >>>>>>>>>> > >>>>>>>>>>> Hi Aljoscha, > >>>>>>>>>>> > >>>>>>>>>>> I like that you are pushing in this direction. However, IMHO > you > >>>>>>>>>>> misinterpreter the current approach. It does not assume that > >> tuples > >>>>>>>>>>> arrive in-order; the current approach has no notion about a > >>>>>>>>>>> pre-defined-order (for example, the order in which the event > are > >>>>>>>>>>> created). There is only the notion of "arrival-order" at the > >>>>>> operator. > >>>>>>>>>>> From this "arrival-order" perspective, the result are > correct(!). > >>>>>>>>>>> > >>>>>>>>>>> Windowing in the current approach means for example, "sum up an > >>>>>>>>>>> attribute of all events you *received* in the last 5 seconds". > >> That > >>>>>>>> is a > >>>>>>>>>>> different meaning that "sum up an attribute of all event that > >>>>>>>> *occurred* > >>>>>>>>>>> in the last 5 seconds". Both queries are valid and Flink should > >>>>>>>> support > >>>>>>>>>>> both IMHO. > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> -Matthias > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> On 06/25/2015 03:03 PM, Aljoscha Krettek wrote: > >>>>>>>>>>>> Yes, now this also processes about 3 mio Elements (Window > Size 5 > >>>>>> sec, > >>>>>>>>>>> Slide > >>>>>>>>>>>> 1 sec) but it still fluctuates a lot between 1 mio. and 5 mio. > >>>>>>>>>>>> > >>>>>>>>>>>> Performance is not my main concern, however. My concern is > that > >>>> the > >>>>>>>>>>> current > >>>>>>>>>>>> model assumes elements to arrive in order, which is simply not > >>>> true. > >>>>>>>>>>>> > >>>>>>>>>>>> In your code you have these lines for specifying the window: > >>>>>>>>>>>> .window(Time.of(1l, TimeUnit.SECONDS)) > >>>>>>>>>>>> .every(Time.of(1l, TimeUnit.SECONDS)) > >>>>>>>>>>>> > >>>>>>>>>>>> Although this semantically specifies a tumbling window of > size 1 > >>>> sec > >>>>>>>>> I'm > >>>>>>>>>>>> afraid it uses the sliding window logic internally (because of > >> the > >>>>>>>>>>>> .every()). > >>>>>>>>>>>> > >>>>>>>>>>>> In my tests I only have the first line. > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> On Thu, 25 Jun 2015 at 14:32 Gábor Gévay <gga...@gmail.com> > >>>> wrote: > >>>>>>>>>>>> > >>>>>>>>>>>>> I'm very sorry, I had a bug in the InversePreReducer. It > should > >>>> be > >>>>>>>>>>>>> fixed now. Can you please run it again? > >>>>>>>>>>>>> > >>>>>>>>>>>>> I also tried to reproduce some of your performance numbers, > but > >>>> I'm > >>>>>>>>>>>>> getting only less than 1/10th of yours. For example, in the > >>>>>> Tumbling > >>>>>>>>>>>>> case, Current/Reduce produces only ~100000 for me. Do you > have > >>>> any > >>>>>>>>>>>>> idea what I could be doing wrong? My code: > >>>>>>>>>>>>> http://pastebin.com/zbEjmGhk > >>>>>>>>>>>>> I am running it on a 2 GHz Core i7. > >>>>>>>>>>>>> > >>>>>>>>>>>>> Best regards, > >>>>>>>>>>>>> Gabor > >>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>>> 2015-06-25 12:31 GMT+02:00 Aljoscha Krettek < > >> aljos...@apache.org > >>>>> : > >>>>>>>>>>>>>> Hi, > >>>>>>>>>>>>>> I also ran the tests on top of PR 856 (inverse reducer) now. > >> The > >>>>>>>>>>> results > >>>>>>>>>>>>>> seem incorrect. When I insert a Thread.sleep(1) in the tuple > >>>>>>>> source, > >>>>>>>>>>> all > >>>>>>>>>>>>>> the previous tests reported around 3600 tuples (Size 5 sec, > >>>> Slide > >>>>>> 1 > >>>>>>>>>>> sec) > >>>>>>>>>>>>>> (Theoretically there would be 5000 tuples in 5 seconds but > >> this > >>>> is > >>>>>>>>> due > >>>>>>>>>>> to > >>>>>>>>>>>>>> overhead). These are the results for the inverse reduce > >>>>>>>> optimisation: > >>>>>>>>>>>>>> (Tuple 0,38) > >>>>>>>>>>>>>> (Tuple 0,829) > >>>>>>>>>>>>>> (Tuple 0,1625) > >>>>>>>>>>>>>> (Tuple 0,2424) > >>>>>>>>>>>>>> (Tuple 0,3190) > >>>>>>>>>>>>>> (Tuple 0,3198) > >>>>>>>>>>>>>> (Tuple 0,-339368) > >>>>>>>>>>>>>> (Tuple 0,-1315725) > >>>>>>>>>>>>>> (Tuple 0,-2932932) > >>>>>>>>>>>>>> (Tuple 0,-5082735) > >>>>>>>>>>>>>> (Tuple 0,-7743256) > >>>>>>>>>>>>>> (Tuple 0,75701046) > >>>>>>>>>>>>>> (Tuple 0,642829470) > >>>>>>>>>>>>>> (Tuple 0,2242018381) > >>>>>>>>>>>>>> (Tuple 0,5190708618) > >>>>>>>>>>>>>> (Tuple 0,10060360311) > >>>>>>>>>>>>>> (Tuple 0,-94254951) > >>>>>>>>>>>>>> (Tuple 0,-219806321293) > >>>>>>>>>>>>>> (Tuple 0,-1258895232699) > >>>>>>>>>>>>>> (Tuple 0,-4074432596329) > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> One line is one emitted window count. This is what happens > >> when > >>>> I > >>>>>>>>>>> remove > >>>>>>>>>>>>>> the Thread.sleep(1): > >>>>>>>>>>>>>> (Tuple 0,660676) > >>>>>>>>>>>>>> (Tuple 0,2553733) > >>>>>>>>>>>>>> (Tuple 0,3542696) > >>>>>>>>>>>>>> (Tuple 0,1) > >>>>>>>>>>>>>> (Tuple 0,1107035) > >>>>>>>>>>>>>> (Tuple 0,2549491) > >>>>>>>>>>>>>> (Tuple 0,4100387) > >>>>>>>>>>>>>> (Tuple 0,-8406583360092) > >>>>>>>>>>>>>> (Tuple 0,-8406582150743) > >>>>>>>>>>>>>> (Tuple 0,-8406580427190) > >>>>>>>>>>>>>> (Tuple 0,-8406580427190) > >>>>>>>>>>>>>> (Tuple 0,-8406580427190) > >>>>>>>>>>>>>> (Tuple 0,6847279255682044995) > >>>>>>>>>>>>>> (Tuple 0,6847279255682044995) > >>>>>>>>>>>>>> (Tuple 0,-5390528042713628318) > >>>>>>>>>>>>>> (Tuple 0,-5390528042711551780) > >>>>>>>>>>>>>> (Tuple 0,-5390528042711551780) > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> So at some point the pre-reducer seems to go haywire and > does > >>>> not > >>>>>>>>>>> recover > >>>>>>>>>>>>>> from it. The good thing is that it does produce results now, > >>>> where > >>>>>>>>> the > >>>>>>>>>>>>>> previous Current/Reduce would simply hang and not produce > any > >>>>>>>> output. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> On Thu, 25 Jun 2015 at 12:02 Gábor Gévay <gga...@gmail.com> > >>>>>> wrote: > >>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Hello, > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Aljoscha, can you please try the performance test of > >>>>>>>> Current/Reduce > >>>>>>>>>>>>>>> with the InversePreReducer in PR 856? (If you just call > sum, > >> it > >>>>>>>> will > >>>>>>>>>>>>>>> use an InversePreReducer.) It would be an interesting test, > >>>>>>>> because > >>>>>>>>>>>>>>> the inverse function optimization really depends on the > >> stream > >>>>>>>> being > >>>>>>>>>>>>>>> ordered, and I think it has the potential of being faster > >> then > >>>>>>>>>>>>>>> Next/Reduce. Especially if the window size is much larger > >> than > >>>>>> the > >>>>>>>>>>>>>>> slide size. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Best regards, > >>>>>>>>>>>>>>> Gabor > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> 2015-06-25 11:36 GMT+02:00 Aljoscha Krettek < > >>>> aljos...@apache.org > >>>>>>>>> : > >>>>>>>>>>>>>>>> I think I'll have to elaborate a bit so I created a > >>>>>>>>> proof-of-concept > >>>>>>>>>>>>>>>> implementation of my Ideas and ran some throughput > >>>> measurements > >>>>>>>> to > >>>>>>>>>>>>>>>> alleviate concerns about performance. > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> First, though, I want to highlight again why the current > >>>>>> approach > >>>>>>>>>>> does > >>>>>>>>>>>>>>> not > >>>>>>>>>>>>>>>> work with out-of-order elements (which, again, occur > >>>> constantly > >>>>>>>> due > >>>>>>>>>>> to > >>>>>>>>>>>>>>> the > >>>>>>>>>>>>>>>> distributed nature of the system). This is the example I > >>>> posted > >>>>>>>>>>>>> earlier: > >>>>>>>>>>>>>>>> https://gist.github.com/aljoscha/a367012646ab98208d27. > The > >>>> plan > >>>>>>>>>>> looks > >>>>>>>>>>>>>>> like > >>>>>>>>>>>>>>>> this: > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> +--+ > >>>>>>>>>>>>>>>> | | Source > >>>>>>>>>>>>>>>> +--+ > >>>>>>>>>>>>>>>> | > >>>>>>>>>>>>>>>> +-----+ > >>>>>>>>>>>>>>>> | | > >>>>>>>>>>>>>>>> | +--+ > >>>>>>>>>>>>>>>> | | | Identity Map > >>>>>>>>>>>>>>>> | +--+ > >>>>>>>>>>>>>>>> | | > >>>>>>>>>>>>>>>> +-----+ > >>>>>>>>>>>>>>>> | > >>>>>>>>>>>>>>>> +--+ > >>>>>>>>>>>>>>>> | | Window > >>>>>>>>>>>>>>>> +--+ > >>>>>>>>>>>>>>>> | > >>>>>>>>>>>>>>>> | > >>>>>>>>>>>>>>>> +--+ > >>>>>>>>>>>>>>>> | | Sink > >>>>>>>>>>>>>>>> +--+ > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> So all it does is pass the elements through an identity > map > >>>> and > >>>>>>>>> then > >>>>>>>>>>>>>>> merge > >>>>>>>>>>>>>>>> them again before the window operator. The source emits > >>>>>> ascending > >>>>>>>>>>>>>>> integers > >>>>>>>>>>>>>>>> and the window operator has a custom timestamp extractor > >> that > >>>>>>>> uses > >>>>>>>>>>> the > >>>>>>>>>>>>>>>> integer itself as the timestamp and should create windows > of > >>>>>>>> size 4 > >>>>>>>>>>>>> (that > >>>>>>>>>>>>>>>> is elements with timestamp 0-3 are one window, the next > are > >>>> the > >>>>>>>>>>>>> elements > >>>>>>>>>>>>>>>> with timestamp 4-8, and so on). Since the topology > basically > >>>>>>>>> doubles > >>>>>>>>>>>>> the > >>>>>>>>>>>>>>>> elements form the source I would expect to get these > >> windows: > >>>>>>>>>>>>>>>> Window: 0, 0, 1, 1, 2, 2, 3, 3 > >>>>>>>>>>>>>>>> Window: 4, 4, 6, 6, 7, 7, 8, 8 > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> The output is this, however: > >>>>>>>>>>>>>>>> Window: 0, 1, 2, 3, > >>>>>>>>>>>>>>>> Window: 4, 0, 1, 2, 3, 4, 5, 6, 7, > >>>>>>>>>>>>>>>> Window: 8, 9, 10, 11, > >>>>>>>>>>>>>>>> Window: 12, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, > >>>>>>>>>>>>>>>> Window: 16, 17, 18, 19, > >>>>>>>>>>>>>>>> Window: 20, 21, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, > 23, > >>>>>>>>>>>>>>>> Window: 24, 25, 26, 27, > >>>>>>>>>>>>>>>> Window: 28, 29, 30, 22, 23, 24, 25, 26, 27, 28, 29, 30, > 31, > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> The reason is that the elements simply arrive > out-of-order. > >>>>>>>> Imagine > >>>>>>>>>>>>> what > >>>>>>>>>>>>>>>> would happen if the elements actually arrived with some > >> delay > >>>>>>>> from > >>>>>>>>>>>>>>>> different operations. > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> Now, on to the performance numbers. The proof-of-concept I > >>>>>>>> created > >>>>>>>>> is > >>>>>>>>>>>>>>>> available here: > >>>>>>>>>>>>>>>> > >>>>>> https://github.com/aljoscha/flink/tree/event-time-window-fn-mock > >>>>>>>> . > >>>>>>>>>>> The > >>>>>>>>>>>>>>> basic > >>>>>>>>>>>>>>>> idea is that sources assign the current timestamp when > >>>> emitting > >>>>>>>>>>>>> elements. > >>>>>>>>>>>>>>>> They also periodically emit watermarks that tell us that > no > >>>>>>>>> elements > >>>>>>>>>>>>> with > >>>>>>>>>>>>>>>> an earlier timestamp will be emitted. The watermarks > >> propagate > >>>>>>>>>>> through > >>>>>>>>>>>>>>> the > >>>>>>>>>>>>>>>> operators. The window operator looks at the timestamp of > an > >>>>>>>> element > >>>>>>>>>>>>> and > >>>>>>>>>>>>>>>> puts it into the buffer that corresponds to that window. > >> When > >>>>>> the > >>>>>>>>>>>>> window > >>>>>>>>>>>>>>>> operator receives a watermark it will look at the > in-flight > >>>>>>>> windows > >>>>>>>>>>>>>>>> (basically the buffers) and emit those windows where the > >>>>>>>> window-end > >>>>>>>>>>> is > >>>>>>>>>>>>>>>> before the watermark. > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> For measuring throughput I did the following: The source > >> emits > >>>>>>>>> tuples > >>>>>>>>>>>>> of > >>>>>>>>>>>>>>>> the form ("tuple", 1) in an infinite loop. The window > >> operator > >>>>>>>> sums > >>>>>>>>>>> up > >>>>>>>>>>>>>>> the > >>>>>>>>>>>>>>>> tuples, thereby counting how many tuples the window > operator > >>>> can > >>>>>>>>>>>>> handle > >>>>>>>>>>>>>>> in > >>>>>>>>>>>>>>>> a given time window. There are two different > implementations > >>>> for > >>>>>>>>> the > >>>>>>>>>>>>>>>> summation: 1) simply summing up the values in a > mapWindow(), > >>>>>>>> there > >>>>>>>>>>> you > >>>>>>>>>>>>>>> get > >>>>>>>>>>>>>>>> a List of all tuples and simple iterate over it. 2) using > >>>>>> sum(1), > >>>>>>>>>>>>> which > >>>>>>>>>>>>>>> is > >>>>>>>>>>>>>>>> implemented as a reduce() (that uses the pre-reducer > >>>>>>>>> optimisations). > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> These are the performance numbers (Current is the current > >>>>>>>>>>>>> implementation, > >>>>>>>>>>>>>>>> Next is my proof-of-concept): > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> Tumbling (1 sec): > >>>>>>>>>>>>>>>> - Current/Map: 1.6 mio > >>>>>>>>>>>>>>>> - Current/Reduce: 2 mio > >>>>>>>>>>>>>>>> - Next/Map: 2.2 mio > >>>>>>>>>>>>>>>> - Next/Reduce: 4 mio > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> Sliding (5 sec, slide 1 sec): > >>>>>>>>>>>>>>>> - Current/Map: ca 3 mio (fluctuates a lot) > >>>>>>>>>>>>>>>> - Current/Reduce: No output > >>>>>>>>>>>>>>>> - Next/Map: ca 4 mio (fluctuates) > >>>>>>>>>>>>>>>> - Next/Reduce: 10 mio > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> The Next/Reduce variant can basically scale indefinitely > >> with > >>>>>>>>> window > >>>>>>>>>>>>> size > >>>>>>>>>>>>>>>> because the internal state does not rely on the number of > >>>>>>>> elements > >>>>>>>>>>>>> (it is > >>>>>>>>>>>>>>>> just the current sum). The pre-reducer for sliding > elements > >>>>>>>> cannot > >>>>>>>>>>>>> handle > >>>>>>>>>>>>>>>> the amount of tuples, it produces no output. For the two > Map > >>>>>>>>> variants > >>>>>>>>>>>>> the > >>>>>>>>>>>>>>>> performance fluctuates because they always keep all the > >>>> elements > >>>>>>>> in > >>>>>>>>>>> an > >>>>>>>>>>>>>>>> internal buffer before emission, this seems to tax the > >> garbage > >>>>>>>>>>>>> collector > >>>>>>>>>>>>>>> a > >>>>>>>>>>>>>>>> bit and leads to random pauses. > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> One thing that should be noted is that I had to disable > the > >>>>>>>>>>>>> fake-element > >>>>>>>>>>>>>>>> emission thread, otherwise the Current versions would > >>>> deadlock. > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> So, I started working on this because I thought that > >>>>>> out-of-order > >>>>>>>>>>>>>>>> processing would be necessary for correctness. And it is > >>>>>>>> certainly, > >>>>>>>>>>>>> But > >>>>>>>>>>>>>>> the > >>>>>>>>>>>>>>>> proof-of-concept also shows that performance can be > greatly > >>>>>>>>> improved. > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> On Wed, 24 Jun 2015 at 09:46 Gyula Fóra < > >> gyula.f...@gmail.com > >>>>> > >>>>>>>>>>> wrote: > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> I agree lets separate these topics from each other so we > >> can > >>>>>> get > >>>>>>>>>>>>> faster > >>>>>>>>>>>>>>>>> resolution. > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> There is already a state discussion in the thread we > >> started > >>>>>>>> with > >>>>>>>>>>>>> Paris. > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> On Wed, Jun 24, 2015 at 9:24 AM Kostas Tzoumas < > >>>>>>>>> ktzou...@apache.org > >>>>>>>>>>>> > >>>>>>>>>>>>>>>> wrote: > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> I agree with supporting out-of-order out of the box :-), > >>>> even > >>>>>>>> if > >>>>>>>>>>>>> this > >>>>>>>>>>>>>>>> means > >>>>>>>>>>>>>>>>>> a major refactoring. This is the right time to refactor > >> the > >>>>>>>>>>>>> streaming > >>>>>>>>>>>>>>>> API > >>>>>>>>>>>>>>>>>> before we pull it out of beta. I think that this is more > >>>>>>>>> important > >>>>>>>>>>>>>>> than > >>>>>>>>>>>>>>>> new > >>>>>>>>>>>>>>>>>> features in the streaming API, which can be prioritized > >> once > >>>>>>>> the > >>>>>>>>>>>>> API > >>>>>>>>>>>>>>> is > >>>>>>>>>>>>>>>> out > >>>>>>>>>>>>>>>>>> of beta (meaning, that IMO this is the right time to > stall > >>>> PRs > >>>>>>>>>>>>> until > >>>>>>>>>>>>>>> we > >>>>>>>>>>>>>>>>>> agree on the design). > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> There are three sections in the document: windowing, > >> state, > >>>>>> and > >>>>>>>>>>>>> API. > >>>>>>>>>>>>>>> How > >>>>>>>>>>>>>>>>>> convoluted are those with each other? Can we separate > the > >>>>>>>>>>>>> discussion > >>>>>>>>>>>>>>> or > >>>>>>>>>>>>>>>> do > >>>>>>>>>>>>>>>>>> we need to discuss those all together? I think part of > the > >>>>>>>>>>>>> difficulty > >>>>>>>>>>>>>>> is > >>>>>>>>>>>>>>>>>> that we are discussing three design choices at once. > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> On Tue, Jun 23, 2015 at 7:43 PM, Ted Dunning < > >>>>>>>>>>>>> ted.dunn...@gmail.com> > >>>>>>>>>>>>>>>>>> wrote: > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> Out of order is ubiquitous in the real-world. > Typically, > >>>>>> what > >>>>>>>>>>>>>>>> happens is > >>>>>>>>>>>>>>>>>>> that businesses will declare a maximum allowable delay > >> for > >>>>>>>>>>>>> delayed > >>>>>>>>>>>>>>>>>>> transactions and will commit to results when that delay > >> is > >>>>>>>>>>>>> reached. > >>>>>>>>>>>>>>>>>>> Transactions that arrive later than this cutoff are > >>>> collected > >>>>>>>>>>>>>>>> specially > >>>>>>>>>>>>>>>>>> as > >>>>>>>>>>>>>>>>>>> corrections which are reported/used when possible. > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> Clearly, ordering can also be violated during > processing, > >>>> but > >>>>>>>> if > >>>>>>>>>>>>> the > >>>>>>>>>>>>>>>> data > >>>>>>>>>>>>>>>>>>> is originally out of order the situation can't be > >> repaired > >>>> by > >>>>>>>>> any > >>>>>>>>>>>>>>>>>> protocol > >>>>>>>>>>>>>>>>>>> fixes that prevent transactions from becoming > disordered > >>>> but > >>>>>>>> has > >>>>>>>>>>>>> to > >>>>>>>>>>>>>>>>>> handled > >>>>>>>>>>>>>>>>>>> at the data level. > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> On Tue, Jun 23, 2015 at 9:29 AM, Aljoscha Krettek < > >>>>>>>>>>>>>>> aljos...@apache.org > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> wrote: > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> I also don't like big changes but sometimes they are > >>>>>>>> necessary. > >>>>>>>>>>>>>>> The > >>>>>>>>>>>>>>>>>>> reason > >>>>>>>>>>>>>>>>>>>> why I'm so adamant about out-of-order processing is > that > >>>>>>>>>>>>>>>> out-of-order > >>>>>>>>>>>>>>>>>>>> elements are not some exception that occurs once in a > >>>> while; > >>>>>>>>>>>>> they > >>>>>>>>>>>>>>>> occur > >>>>>>>>>>>>>>>>>>>> constantly in a distributed system. For example, in > >> this: > >>>>>>>>>>>>>>>>>>>> https://gist.github.com/aljoscha/a367012646ab98208d27 > >> the > >>>>>>>>>>>>>>> resulting > >>>>>>>>>>>>>>>>>>>> windows > >>>>>>>>>>>>>>>>>>>> are completely bogus because the current windowing > >> system > >>>>>>>>>>>>> assumes > >>>>>>>>>>>>>>>>>>> elements > >>>>>>>>>>>>>>>>>>>> to globally arrive in order, which is simply not true. > >>>> (The > >>>>>>>>>>>>>>> example > >>>>>>>>>>>>>>>>>> has a > >>>>>>>>>>>>>>>>>>>> source that generates increasing integers. Then these > >> pass > >>>>>>>>>>>>>>> through a > >>>>>>>>>>>>>>>>>> map > >>>>>>>>>>>>>>>>>>>> and are unioned with the original DataStream before a > >>>> window > >>>>>>>>>>>>>>>> operator.) > >>>>>>>>>>>>>>>>>>>> This simulates elements arriving from different > >> operators > >>>> at > >>>>>>>> a > >>>>>>>>>>>>>>>>>> windowing > >>>>>>>>>>>>>>>>>>>> operator. The example is also DOP=1, I imagine this to > >> get > >>>>>>>>>>>>> worse > >>>>>>>>>>>>>>>> with > >>>>>>>>>>>>>>>>>>>> higher DOP. > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> What do you mean by costly? As I said, I have a > >>>>>>>>>>>>> proof-of-concept > >>>>>>>>>>>>>>>>>>> windowing > >>>>>>>>>>>>>>>>>>>> operator that can handle out-or-order elements. This > is > >> an > >>>>>>>>>>>>> example > >>>>>>>>>>>>>>>>>> using > >>>>>>>>>>>>>>>>>>>> the current Flink API: > >>>>>>>>>>>>>>>>>>>> https://gist.github.com/aljoscha/f8dce0691732e344bbe8 > . > >>>>>>>>>>>>>>>>>>>> (It is an infinite source of tuples and a 5 second > >> window > >>>>>>>>>>>>> operator > >>>>>>>>>>>>>>>> that > >>>>>>>>>>>>>>>>>>>> counts the tuples.) The first problem is that this > code > >>>>>>>>>>>>> deadlocks > >>>>>>>>>>>>>>>>>> because > >>>>>>>>>>>>>>>>>>>> of the thread that emits fake elements. If I disable > the > >>>>>> fake > >>>>>>>>>>>>>>>> element > >>>>>>>>>>>>>>>>>>> code > >>>>>>>>>>>>>>>>>>>> it works, but the throughput using my mockup is 4 > times > >>>>>>>> higher > >>>>>>>>>>>>> . > >>>>>>>>>>>>>>> The > >>>>>>>>>>>>>>>>>> gap > >>>>>>>>>>>>>>>>>>>> widens dramatically if the window size increases. > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> So, it actually increases performance (unless I'm > >> making a > >>>>>>>>>>>>> mistake > >>>>>>>>>>>>>>>> in > >>>>>>>>>>>>>>>>>> my > >>>>>>>>>>>>>>>>>>>> explorations) and can handle elements that arrive > >>>>>>>> out-of-order > >>>>>>>>>>>>>>>> (which > >>>>>>>>>>>>>>>>>>>> happens basically always in a real-world windowing > >>>>>>>> use-cases). > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> On Tue, 23 Jun 2015 at 12:51 Stephan Ewen < > >>>> se...@apache.org > >>>>>>> > >>>>>>>>>>>>>>> wrote: > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> What I like a lot about Aljoscha's proposed design is > >>>> that > >>>>>>>> we > >>>>>>>>>>>>>>>> need no > >>>>>>>>>>>>>>>>>>>>> different code for "system time" vs. "event time". It > >>>> only > >>>>>>>>>>>>>>>> differs in > >>>>>>>>>>>>>>>>>>>> where > >>>>>>>>>>>>>>>>>>>>> the timestamps are assigned. > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> The OOP approach also gives you the semantics of > total > >>>>>>>>>>>>> ordering > >>>>>>>>>>>>>>>>>> without > >>>>>>>>>>>>>>>>>>>>> imposing merges on the streams. > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> On Tue, Jun 23, 2015 at 10:43 AM, Matthias J. Sax < > >>>>>>>>>>>>>>>>>>>>> mj...@informatik.hu-berlin.de> wrote: > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>> I agree that there should be multiple alternatives > the > >>>>>>>>>>>>> user(!) > >>>>>>>>>>>>>>>> can > >>>>>>>>>>>>>>>>>>>>>> choose from. Partial out-of-order processing works > for > >>>>>>>>>>>>>>> many/most > >>>>>>>>>>>>>>>>>>>>>> aggregates. However, if you consider > >>>>>>>>>>>>> Event-Pattern-Matching, > >>>>>>>>>>>>>>>> global > >>>>>>>>>>>>>>>>>>>>>> ordering in necessary (even if the performance > penalty > >>>>>>>>>>>>> might > >>>>>>>>>>>>>>> be > >>>>>>>>>>>>>>>>>>> high). > >>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>> I would also keep "system-time windows" as an > >>>> alternative > >>>>>>>>>>>>> to > >>>>>>>>>>>>>>>>>> "source > >>>>>>>>>>>>>>>>>>>>>> assigned ts-windows". > >>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>> It might also be interesting to consider the > following > >>>>>>>>>>>>> paper > >>>>>>>>>>>>>>> for > >>>>>>>>>>>>>>>>>>>>>> overlapping windows: "Resource sharing in continuous > >>>>>>>>>>>>>>>> sliding-window > >>>>>>>>>>>>>>>>>>>>>> aggregates" > >>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>> https://dl.acm.org/citation.cfm?id=1316720 > >>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>> -Matthias > >>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>> On 06/23/2015 10:37 AM, Gyula Fóra wrote: > >>>>>>>>>>>>>>>>>>>>>>> Hey > >>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>> I think we should not block PRs unnecessarily if > your > >>>>>>>>>>>>>>>> suggested > >>>>>>>>>>>>>>>>>>>> changes > >>>>>>>>>>>>>>>>>>>>>>> might touch them at some point. > >>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>> Also I still think we should not put everything in > >> the > >>>>>>>>>>>>>>>> Datastream > >>>>>>>>>>>>>>>>>>>>> because > >>>>>>>>>>>>>>>>>>>>>>> it will be a huge mess. > >>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>> Also we need to agree on the out of order > processing, > >>>>>>>>>>>>>>> whether > >>>>>>>>>>>>>>>> we > >>>>>>>>>>>>>>>>>>> want > >>>>>>>>>>>>>>>>>>>>> it > >>>>>>>>>>>>>>>>>>>>>>> the way you proposed it(which is quite costly). > >> Another > >>>>>>>>>>>>>>>>>> alternative > >>>>>>>>>>>>>>>>>>>>>>> approach there which fits in the current windowing > is > >>>> to > >>>>>>>>>>>>>>>> filter > >>>>>>>>>>>>>>>>>> out > >>>>>>>>>>>>>>>>>>>> if > >>>>>>>>>>>>>>>>>>>>>>> order events and apply a special handling operator > on > >>>>>>>>>>>>> them. > >>>>>>>>>>>>>>>> This > >>>>>>>>>>>>>>>>>>>> would > >>>>>>>>>>>>>>>>>>>>> be > >>>>>>>>>>>>>>>>>>>>>>> fairly lightweight. > >>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>> My point is that we need to consider some > alternative > >>>>>>>>>>>>>>>> solutions. > >>>>>>>>>>>>>>>>>>> And > >>>>>>>>>>>>>>>>>>>> we > >>>>>>>>>>>>>>>>>>>>>>> should not block contributions along the way. > >>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>> Cheers > >>>>>>>>>>>>>>>>>>>>>>> Gyula > >>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>> On Tue, Jun 23, 2015 at 9:55 AM Aljoscha Krettek < > >>>>>>>>>>>>>>>>>>>> aljos...@apache.org> > >>>>>>>>>>>>>>>>>>>>>>> wrote: > >>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>> The reason I posted this now is that we need to > >> think > >>>>>>>>>>>>> about > >>>>>>>>>>>>>>>> the > >>>>>>>>>>>>>>>>>>> API > >>>>>>>>>>>>>>>>>>>>> and > >>>>>>>>>>>>>>>>>>>>>>>> windowing before proceeding with the PRs of Gabor > >>>>>>>>>>>>> (inverse > >>>>>>>>>>>>>>>>>> reduce) > >>>>>>>>>>>>>>>>>>>> and > >>>>>>>>>>>>>>>>>>>>>>>> Gyula (removal of "aggregate" functions on > >>>> DataStream). > >>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>> For the windowing, I think that the current model > >> does > >>>>>>>>>>>>> not > >>>>>>>>>>>>>>>> work > >>>>>>>>>>>>>>>>>>> for > >>>>>>>>>>>>>>>>>>>>>>>> out-of-order processing. Therefore, the whole > >>>> windowing > >>>>>>>>>>>>>>>>>>>> infrastructure > >>>>>>>>>>>>>>>>>>>>>> will > >>>>>>>>>>>>>>>>>>>>>>>> basically have to be redone. Meaning also that any > >>>> work > >>>>>>>>>>>>> on > >>>>>>>>>>>>>>>> the > >>>>>>>>>>>>>>>>>>>>>>>> pre-aggregators or optimizations that we do now > >>>> becomes > >>>>>>>>>>>>>>>> useless. > >>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>> For the API, I proposed to restructure the > >>>> interactions > >>>>>>>>>>>>>>>> between > >>>>>>>>>>>>>>>>>>> all > >>>>>>>>>>>>>>>>>>>>> the > >>>>>>>>>>>>>>>>>>>>>>>> different *DataStream classes and > >> grouping/windowing. > >>>>>>>>>>>>> (See > >>>>>>>>>>>>>>>> API > >>>>>>>>>>>>>>>>>>>> section > >>>>>>>>>>>>>>>>>>>>>> of > >>>>>>>>>>>>>>>>>>>>>>>> the doc I posted.) > >>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>> On Mon, 22 Jun 2015 at 21:56 Gyula Fóra < > >>>>>>>>>>>>>>> gyula.f...@gmail.com > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> wrote: > >>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>> Hi Aljoscha, > >>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>> Thanks for the nice summary, this is a very good > >>>>>>>>>>>>>>> initiative. > >>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>> I added some comments to the respective sections > >>>>>>>>>>>>> (where I > >>>>>>>>>>>>>>>> didnt > >>>>>>>>>>>>>>>>>>>> fully > >>>>>>>>>>>>>>>>>>>>>>>> agree > >>>>>>>>>>>>>>>>>>>>>>>>> :).). > >>>>>>>>>>>>>>>>>>>>>>>>> At some point I think it would be good to have a > >>>> public > >>>>>>>>>>>>>>>> hangout > >>>>>>>>>>>>>>>>>>>>> session > >>>>>>>>>>>>>>>>>>>>>>>> on > >>>>>>>>>>>>>>>>>>>>>>>>> this, which could make a more dynamic discussion. > >>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>> Cheers, > >>>>>>>>>>>>>>>>>>>>>>>>> Gyula > >>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>> Aljoscha Krettek <aljos...@apache.org> ezt írta > >>>>>>>>>>>>> (időpont: > >>>>>>>>>>>>>>>>>> 2015. > >>>>>>>>>>>>>>>>>>>> jún. > >>>>>>>>>>>>>>>>>>>>>>>> 22., > >>>>>>>>>>>>>>>>>>>>>>>>> H, 21:34): > >>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>> Hi, > >>>>>>>>>>>>>>>>>>>>>>>>>> with people proposing changes to the streaming > >> part > >>>> I > >>>>>>>>>>>>>>> also > >>>>>>>>>>>>>>>>>>> wanted > >>>>>>>>>>>>>>>>>>>> to > >>>>>>>>>>>>>>>>>>>>>>>>> throw > >>>>>>>>>>>>>>>>>>>>>>>>>> my hat into the ring. :D > >>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>> During the last few months, while I was getting > >>>>>>>>>>>>>>> acquainted > >>>>>>>>>>>>>>>>>> with > >>>>>>>>>>>>>>>>>>>> the > >>>>>>>>>>>>>>>>>>>>>>>>>> streaming system, I wrote down some thoughts I > had > >>>>>>>>>>>>> about > >>>>>>>>>>>>>>>> how > >>>>>>>>>>>>>>>>>>>> things > >>>>>>>>>>>>>>>>>>>>>>>> could > >>>>>>>>>>>>>>>>>>>>>>>>>> be improved. Hopefully, they are in somewhat > >>>> coherent > >>>>>>>>>>>>>>> shape > >>>>>>>>>>>>>>>>>> now, > >>>>>>>>>>>>>>>>>>>> so > >>>>>>>>>>>>>>>>>>>>>>>>> please > >>>>>>>>>>>>>>>>>>>>>>>>>> have a look if you are interested in this: > >>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>> > >>>>>>>> > >>>>>> > >>>> > >> > https://docs.google.com/document/d/1rSoHyhUhm2IE30o5tkR8GEetjFvMRMNxvsCfoPsW6_4/edit?usp=sharing > >>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>> This mostly covers: > >>>>>>>>>>>>>>>>>>>>>>>>>> - Timestamps assigned at sources > >>>>>>>>>>>>>>>>>>>>>>>>>> - Out-of-order processing of elements in window > >>>>>>>>>>>>>>> operators > >>>>>>>>>>>>>>>>>>>>>>>>>> - API design > >>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>> Please let me know what you think. Comment in > the > >>>>>>>>>>>>>>> document > >>>>>>>>>>>>>>>> or > >>>>>>>>>>>>>>>>>>> here > >>>>>>>>>>>>>>>>>>>>> in > >>>>>>>>>>>>>>>>>>>>>>>> the > >>>>>>>>>>>>>>>>>>>>>>>>>> mailing list. > >>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>> I have a PR in the makings that would introduce > >>>> source > >>>>>>>>>>>>>>>>>>> timestamps > >>>>>>>>>>>>>>>>>>>>> and > >>>>>>>>>>>>>>>>>>>>>>>>>> watermarks for keeping track of them. I also > >> hacked > >>>> a > >>>>>>>>>>>>>>>>>>>>> proof-of-concept > >>>>>>>>>>>>>>>>>>>>>>>>> of a > >>>>>>>>>>>>>>>>>>>>>>>>>> windowing system that is able to process > >>>> out-of-order > >>>>>>>>>>>>>>>> elements > >>>>>>>>>>>>>>>>>>>>> using a > >>>>>>>>>>>>>>>>>>>>>>>>>> FlatMap operator. (It uses panes to perform > >>>> efficient > >>>>>>>>>>>>>>>>>>>>>>>> pre-aggregations.) > >>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>> Cheers, > >>>>>>>>>>>>>>>>>>>>>>>>>> Aljoscha > >>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>> > >>>>>>>>> > >>>>>>>>> > >>>>>>>> > >>>>>> > >>>>>> > >>>>> > >>>> > >>>> > >>> > >> > >> > > > >