@Radu, adding more window types and sorting should be part of another design proposal. This is interesting stuff, but I think we should keep the issues separate because things can get complicated very quickly.
On Mon, 18 Jul 2016 at 12:32 Aljoscha Krettek <aljos...@apache.org> wrote:
> Hi,
> about TimeEvictor, yes, I think there should be specific evictors for processing time and event time. Also, the current time should be retrievable from the EvictorContext.
>
> For the wiki you will need permissions. This was recently changed because there was too much spam. I gave you permission to add pages. Can you please try and check if it works?
>
> Cheers,
> Aljoscha
>
> On Fri, 15 Jul 2016 at 13:28 Vishnu Viswanath <vishnu.viswanat...@gmail.com> wrote:
>
>> Hi all,
>>
>> How do we create a FLIP page? Is there any permission setup required? I don't see any "Create" page option in the header (after logging in), as mentioned in
>>
>> https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals
>>
>> Thanks,
>> Vishnu
>>
>> On Wed, Jul 13, 2016 at 10:22 PM, Vishnu Viswanath <vishnu.viswanat...@gmail.com> wrote:
>>
>> > Hi Aljoscha,
>> >
>> > I agree, the user will know exactly whether they are creating an EventTime based evictor or a ProcessingTime based evictor by looking at the code.
>> > So do you think it will be OK to have multiple versions of TimeEvictor (one for event time and one for processing time) and also of DeltaEvictor (again 2 versions, for event time and processing time)?
>> >
>> > Please note that the existing behavior of TimeEvictor/DeltaEvictor does not consider if it is EventTime or ProcessingTime
>> > e.g., in TimeEvictor the current time is considered as the timestamp of the last element in the window
>> >
>> >     long currentTime = Iterables.getLast(elements).getTimestamp();
>> >
>> > not the highest timestamp of all elements
>> > what I am trying to achieve is something like:
>> >
>> >     long currentTime;
>> >     if (ctx.isEventTime()) {
>> >         currentTime = getMaxTimestamp(elements);
>> >     } else {
>> >         currentTime = Iterables.getLast(elements).getTimestamp();
>> >     }
>> >
>> > Similarly, in DeltaEvictor the `lastElement` is `Iterables.getLast(elements);` and I am thinking we should consider the element with max timestamp as the last element instead of just getting the last inserted element as `lastElement`
>> >
>> > Do you think it is the right thing to do or leave the behavior Evictors as is, w.r.t to choosing the last element?
>> >
>> > Thanks,
>> > Vishnu
>> >
>> > On Wed, Jul 13, 2016 at 11:07 AM, Aljoscha Krettek <aljos...@apache.org> wrote:
>> >
>> >> I still think it should be explicit in the class. For example, if you have this code:
>> >>
>> >>     input
>> >>       .keyBy()
>> >>       .window()
>> >>       .trigger(EventTimeTrigger.create())
>> >>       .evictor(TimeTrigger.create())
>> >>
>> >> the time behavior of the trigger is explicitly specified while the evictor would dynamically adapt based on internal workings that the user might not be aware of. Having the behavior explicit at the call site is very important, in my opinion.
>> >>
>> >> On Wed, 13 Jul 2016 at 16:28 Vishnu Viswanath <vishnu.viswanat...@gmail.com> wrote:
>> >>
>> >> > Hi,
>> >> >
>> >> > I was hoping to use the isEventTime method in the WindowAssigner to set that information in the EvictorContext.
>> >> > What do you think?
>> >> >
>> >> > Thanks and Regards,
>> >> > Vishnu Viswanath,
>> >> >
>> >> > On Wed, Jul 13, 2016 at 10:09 AM, Aljoscha Krettek <aljos...@apache.org> wrote:
>> >> >
>> >> > > Hi,
>> >> > > I think the way to go here is to add both an EventTimeEvictor and a ProcessingTimeEvictor. The problem is that "isEventTime" cannot really be determined. That's also the reason why there is an EventTimeTrigger and a ProcessingTimeTrigger. It was just an oversight that the TimeEvictor does not also have these two versions.
>> >> > >
>> >> > > About EvictingWindowOperator, I think you can make the two methods non-final in WindowOperator, yes.
>> >> > >
>> >> > > Cheers,
>> >> > > Aljoscha
>> >> > >
>> >> > > On Wed, 13 Jul 2016 at 14:32 Vishnu Viswanath <vishnu.viswanat...@gmail.com> wrote:
>> >> > >
>> >> > > > Hi Aljoscha,
>> >> > > >
>> >> > > > I am thinking of adding a method boolean isEventTime(); to the EvictorContext, in addition to
>> >> > > >
>> >> > > >     long getCurrentProcessingTime();
>> >> > > >     MetricGroup getMetricGroup();
>> >> > > >     long getCurrentWatermark();
>> >> > > >
>> >> > > > This method can be used so that the TimeEvictor does not have to iterate through all the elements. There will be a few changes in the existing behavior of TimeEvictor and DeltaEvictor (I have mentioned this in the design doc).
>> >> > > >
>> >> > > > Also, is there any specific reason why the open and close methods in WindowOperator are made final? Since the EvictorContext will be in the EvictingWindowOperator, I need to override open and close in EvictingWindowOperator to set the reference to the EvictorContext to null.
>> >> > > > >> >> > > > Thanks and Regards, >> >> > > > Vishnu Viswanath, >> >> > > > >> >> > > > On Fri, Jul 8, 2016 at 7:40 PM, Vishnu Viswanath < >> >> > > > vishnu.viswanat...@gmail.com> wrote: >> >> > > > >> >> > > > My thought process when asking if we can use state backend in >> window >> >> > > > > function was : can we add the elements to be evicted into some >> >> state >> >> > > and >> >> > > > > allow the evictAfter to read it from some context and remove it >> >> from >> >> > > the >> >> > > > > window? >> >> > > > > >> >> > > > > >> >> > > > > On Fri, Jul 8, 2016 at 7:30 PM, Vishnu Viswanath < >> >> > > > > vishnu.viswanat...@gmail.com> wrote: >> >> > > > > >> >> > > > >> Hi Aljoscha, >> >> > > > >> >> >> > > > >> Thanks for the explanation, and sorry for late reply was busy >> >> with >> >> > > work. >> >> > > > >> >> >> > > > >> I did think about this scenario, in fact in my previous mail I >> >> > thought >> >> > > > of >> >> > > > >> posting this question, then I understood that this problem >> will >> >> be >> >> > > > >> there which ever method we choose(Trigger looking for pattern >> or >> >> > > Window >> >> > > > >> looking for pattern). >> >> > > > >> >> >> > > > >> I do have a pretty good watermark but my concern is that it >> >> changes >> >> > > > based >> >> > > > >> on the key of these messages(I don't know if it is possible, >> >> haven't >> >> > > > >> started coding that yet. May be you could tell me). Even if >> it is >> >> > yes >> >> > > > some >> >> > > > >> of these watermarks will be long(in days), which I don't want >> the >> >> > > > trigger >> >> > > > >> to wait that long. 
>> >> > > > >> >> >> > > > >> It looks like it is not easy to have an evictAfter based on >> >> window >> >> > > > >> function(without introducing coupling), but can the current >> >> window >> >> > > apply >> >> > > > >> function be modified to allow it to change the elements in it >> - >> >> may >> >> > be >> >> > > > >> using some state backend(I don't know how excatly the >> internals >> >> of >> >> > > these >> >> > > > >> work, so this might be a wrong question) >> >> > > > >> >> >> > > > >> Thanks and Regards, >> >> > > > >> Vishnu Viswanath, >> >> > > > >> >> >> > > > >> On Fri, Jul 8, 2016 at 10:20 AM, Aljoscha Krettek < >> >> > > aljos...@apache.org> >> >> > > > >> wrote: >> >> > > > >> >> >> > > > >>> Hi Vishnu, >> >> > > > >>> how long would these patterns be? The Trigger would not have >> to >> >> > sort >> >> > > > the >> >> > > > >>> elements for every new element but just insert the new >> element >> >> into >> >> > > an >> >> > > > >>> internal data structure. Only when it sees that the >> watermark is >> >> > > past a >> >> > > > >>> certain point would it check whether the pattern matches and >> >> > actually >> >> > > > >>> Trigger. >> >> > > > >>> >> >> > > > >>> A general note regarding order and event time: I think >> relying >> >> on >> >> > > this >> >> > > > >>> for >> >> > > > >>> computation is very tricky unless the watermark is 100 % >> >> correct or >> >> > > you >> >> > > > >>> completely discard elements that arrive after the watermark, >> >> i.e. >> >> > > > >>> elements >> >> > > > >>> that would break the promise of the watermark that no >> elements >> >> with >> >> > > an >> >> > > > >>> earlier timestamp will ever arrive. The reason for this is >> that >> >> > there >> >> > > > >>> could >> >> > > > >>> always enter new elements that end up between already seen >> >> > elements. 
>> >> > > > For >> >> > > > >>> example, let's say we have this sequence of elements when the >> >> > trigger >> >> > > > >>> fires: >> >> > > > >>> >> >> > > > >>> a-b-a >> >> > > > >>> >> >> > > > >>> This is the sequence that you are looking for and you emit >> some >> >> > > result >> >> > > > >>> from >> >> > > > >>> the WindowFunction. Now, new elements arrive that fall in >> >> between >> >> > the >> >> > > > >>> elements we already have: >> >> > > > >>> >> >> > > > >>> a-d-e-b-f-g-a >> >> > > > >>> >> >> > > > >>> This is an updated, sorted view of the actual event-time >> stream >> >> and >> >> > > we >> >> > > > >>> didn't realize that the stream actually looks like this >> before. >> >> > Does >> >> > > > this >> >> > > > >>> still match the original pattern or should we now consider >> this >> >> as >> >> > > > >>> non-matching? If no, then the earlier successful match for >> a-b-a >> >> > was >> >> > > > >>> wrong >> >> > > > >>> and we should never have processed it but we didn't know at >> the >> >> > time. >> >> > > > If >> >> > > > >>> yes, then pattern matching like this can be done in the >> Trigger >> >> by >> >> > > > having >> >> > > > >>> something like pattern slots: You don't have to store all >> >> elements >> >> > in >> >> > > > the >> >> > > > >>> Trigger, you just need to store possible candidates that >> could >> >> > match >> >> > > > the >> >> > > > >>> pattern and ignore the other (in-between) elements. 
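The "pattern slots" idea in the message above can be sketched roughly as follows. This is only an illustration, not Flink code: `PatternSlots` and `offer` are hypothetical names, and the sketch assumes elements are offered in event-time order (for example, after the watermark has passed them). It keeps only the position of the next unfilled slot and ignores in-between elements.

```java
// Hypothetical helper, not part of Flink: tracks how much of a fixed pattern
// has been matched so far and ignores elements that do not fill the next slot.
final class PatternSlots {
    private final String[] pattern;
    private int nextSlot = 0;

    PatternSlots(String... pattern) {
        this.pattern = pattern.clone();
    }

    /**
     * Offers the next element in event-time order (an assumption of this sketch).
     * Returns true once every slot of the pattern has been filled.
     */
    boolean offer(String element) {
        if (nextSlot < pattern.length && pattern[nextSlot].equals(element)) {
            nextSlot++;
        }
        return nextSlot == pattern.length;
    }
}
```

With the pattern a-b-a, offering a, d, e, b, f, g and then a fills the three slots in turn, so under this reading the longer sequence still counts as a match. A trigger built this way would store only the slot index, not the elements themselves.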
>> >> > > > >>> >> >> > > > >>> Cheers, >> >> > > > >>> Aljoscha >> >> > > > >>> >> >> > > > >>> On Fri, 8 Jul 2016 at 14:10 Vishnu Viswanath < >> >> > > > >>> vishnu.viswanat...@gmail.com> >> >> > > > >>> wrote: >> >> > > > >>> >> >> > > > >>> > Hi Aljoscha, >> >> > > > >>> > >> >> > > > >>> > That is a good idea, trying to tie it back to the use case, >> >> > > > >>> > e.g., suppose trigger is looking for a pattern, a-b-a and >> >> when it >> >> > > > sees >> >> > > > >>> such >> >> > > > >>> > a pattern, it will trigger the window and it knows that now >> >> the >> >> > > > >>> Evictor is >> >> > > > >>> > going to evict the element b, and trigger updates its >> state as >> >> > a-a >> >> > > > >>> (even >> >> > > > >>> > before the window & evictor completes) and will be looking >> for >> >> > the >> >> > > > >>> rest of >> >> > > > >>> > the pattern i.e., b-a. But I can think of 1 problem here, >> >> > > > >>> > >> >> > > > >>> > - the events can arrive out of order, i.e., the trigger >> >> might >> >> > be >> >> > > > >>> seeing >> >> > > > >>> > a pattern a-a-b but actual event time is a-b-a then >> trigger >> >> > will >> >> > > > >>> have to >> >> > > > >>> > sort the elements in the window everytime it sees an >> >> element. >> >> > (I >> >> > > > was >> >> > > > >>> > planning to do this sorting in the window, which will be >> >> less >> >> > > > often >> >> > > > >>> - >> >> > > > >>> > only >> >> > > > >>> > when the trigger fires) >> >> > > > >>> > >> >> > > > >>> > Thanks and Regards, >> >> > > > >>> > Vishnu Viswanath, >> >> > > > >>> > >> >> > > > >>> > On Fri, Jul 8, 2016 at 6:04 AM, Aljoscha Krettek < >> >> > > > aljos...@apache.org> >> >> > > > >>> > wrote: >> >> > > > >>> > >> >> > > > >>> > Hi, >> >> > > > >>> > > come to think of it, the right place to put such checks >> is >> >> > > actually >> >> > > > >>> the >> >> > > > >>> > > Trigger. 
It would have to be a custom trigger that >> observes >> >> > time >> >> > > > but >> >> > > > >>> also >> >> > > > >>> > > keeps some internal state machine to decide when it has >> >> > observed >> >> > > > the >> >> > > > >>> > right >> >> > > > >>> > > pattern in the window. Then the window function would >> just >> >> have >> >> > > to >> >> > > > >>> do the >> >> > > > >>> > > processing and you have good separation of concerns. Does >> >> that >> >> > > make >> >> > > > >>> > sense? >> >> > > > >>> > > >> >> > > > >>> > > I'm ignoring time and sorting by time for now because we >> >> > probably >> >> > > > >>> need >> >> > > > >>> > > another design document for that. To me it seems like a >> >> bigger >> >> > > > thing. >> >> > > > >>> > > >> >> > > > >>> > > Cheers, >> >> > > > >>> > > Aljoscha >> >> > > > >>> > > >> >> > > > >>> > > On Thu, 7 Jul 2016 at 23:56 Vishnu Viswanath < >> >> > > > >>> > vishnu.viswanat...@gmail.com >> >> > > > >>> > > > >> >> > > > >>> > > wrote: >> >> > > > >>> > > >> >> > > > >>> > > > Hi, >> >> > > > >>> > > > >> >> > > > >>> > > > Regarding the evictAfter function, that evicts based on >> >> some >> >> > > > >>> decision >> >> > > > >>> > > made >> >> > > > >>> > > > by the window function: I think it will be nice if we >> can >> >> > come >> >> > > > up >> >> > > > >>> with >> >> > > > >>> > > > something that is LESS coupled, because I can think of >> >> > several >> >> > > > use >> >> > > > >>> > cases >> >> > > > >>> > > > that depend on it. >> >> > > > >>> > > > >> >> > > > >>> > > > Especially in the case where there are late arriving >> >> > messages. >> >> > > > Only >> >> > > > >>> > after >> >> > > > >>> > > > the window function is applied we could tell what to do >> >> with >> >> > > the >> >> > > > >>> > elements >> >> > > > >>> > > > in the window. 
>> >> > > > >>> > > > You could apply your business logic there to determine if the window function was able to do what it is supposed to do; if yes, evict those elements, else (since the elements you are looking for haven't arrived yet) wait and try again the next time the trigger fires.
>> >> > > > >>> > > >
>> >> > > > >>> > > > Thanks and Regards,
>> >> > > > >>> > > > Vishnu Viswanath,
>> >> > > > >>> > > >
>> >> > > > >>> > > >
>> >> > > > >>> > > > On Thu, Jul 7, 2016 at 9:19 AM, Radu Tudoran <radu.tudo...@huawei.com> wrote:
>> >> > > > >>> > > >
>> >> > > > >>> > > > > Hi,
>> >> > > > >>> > > > >
>> >> > > > >>> > > > > @Aljoscha - I can understand the reason why you are hesitant to introduce "slower" windows such as the ones that would maintain sorted items or windows with bindings between the different entities (evictor, trigger, window, apply function). However, I think it's possible just to create more types of windows. The existing ones (time windows, global windows ...) can remain, and we can just add some more flavors of windows where more features or more functionality are enabled (e.g., access to each element in the evictor; the possibility to delete elements or mark them for eviction in the function...)
>> >> > > > >>> > > > >
>> >> > > > >>> > > > > Regarding the specific case of sorted windows, I think the N log N complexity to sort (the worst case) is very unlikely. In fact you have almost sorted items/arrays. Moreover, if you consider that in iteration X all elements were sorted, then in iteration X+1 you will need to sort just the newly arrived elements (M). I would expect that this number M might be significantly smaller than N (the elements that already exist). Then, using an insertion sort for these new elements, you would have M * N complexity, and if M << N then the complexity is O(N). Alternatively you can use a binary search for insertion, which further reduces the cost of finding each insertion position to O(log N).
>> >> > > > >>> > > > > If M is proportional to N then you can sort M and use merge sort for combining.
>> >> > > > >>> > > > >
>> >> > > > >>> > > > >
>> >> > > > >>> > > > > Dr.
Radu Tudoran >> >> > > > >>> > > > > Research Engineer - Big Data Expert >> >> > > > >>> > > > > IT R&D Division >> >> > > > >>> > > > > >> >> > > > >>> > > > > >> >> > > > >>> > > > > HUAWEI TECHNOLOGIES Duesseldorf GmbH >> >> > > > >>> > > > > European Research Center >> >> > > > >>> > > > > Riesstrasse 25, 80992 München >> >> > > > >>> > > > > >> >> > > > >>> > > > > E-mail: radu.tudo...@huawei.com >> >> > > > >>> > > > > Mobile: +49 15209084330 >> >> > > > >>> > > > > Telephone: +49 891588344173 >> >> > > > >>> > > > > >> >> > > > >>> > > > > HUAWEI TECHNOLOGIES Duesseldorf GmbH >> >> > > > >>> > > > > Hansaallee 205, 40549 Düsseldorf, Germany, >> >> www.huawei.com >> >> > > > >>> > > > > Registered Office: Düsseldorf, Register Court >> >> Düsseldorf, >> >> > HRB >> >> > > > >>> 56063, >> >> > > > >>> > > > > Managing Director: Bo PENG, Wanzhou MENG, Lifang CHEN >> >> > > > >>> > > > > Sitz der Gesellschaft: Düsseldorf, Amtsgericht >> >> Düsseldorf, >> >> > > HRB >> >> > > > >>> 56063, >> >> > > > >>> > > > > Geschäftsführer: Bo PENG, Wanzhou MENG, Lifang CHEN >> >> > > > >>> > > > > This e-mail and its attachments contain confidential >> >> > > > information >> >> > > > >>> from >> >> > > > >>> > > > > HUAWEI, which is intended only for the person or >> entity >> >> > whose >> >> > > > >>> address >> >> > > > >>> > > is >> >> > > > >>> > > > > listed above. Any use of the information contained >> >> herein >> >> > in >> >> > > > any >> >> > > > >>> way >> >> > > > >>> > > > > (including, but not limited to, total or partial >> >> > disclosure, >> >> > > > >>> > > > reproduction, >> >> > > > >>> > > > > or dissemination) by persons other than the intended >> >> > > > >>> recipient(s) is >> >> > > > >>> > > > > prohibited. If you receive this e-mail in error, >> please >> >> > > notify >> >> > > > >>> the >> >> > > > >>> > > sender >> >> > > > >>> > > > > by phone or email immediately and delete it! 
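The binary-search insertion suggested in the message above could look roughly like this. It is a sketch only: `SortedBuffer` is a hypothetical class, not a Flink type, and it stores bare timestamps instead of window elements. Note that `Collections.binarySearch` finds the position in O(log N), but inserting into an `ArrayList` still shifts the following elements, so the per-insert cost stays linear in the worst case unless a different data structure is used.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical buffer that keeps timestamps sorted as new ones arrive.
final class SortedBuffer {
    private final List<Long> timestamps = new ArrayList<>();

    /** Inserts a timestamp at its sorted position, found by binary search. */
    void insert(long timestamp) {
        int pos = Collections.binarySearch(timestamps, Long.valueOf(timestamp));
        if (pos < 0) {
            // Not found: binarySearch returns -(insertionPoint) - 1.
            pos = -(pos + 1);
        }
        timestamps.add(pos, timestamp);
    }

    /** Read-only view of the sorted timestamps. */
    List<Long> view() {
        return Collections.unmodifiableList(timestamps);
    }
}
```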
>> >> > > > >>> > > > >
>> >> > > > >>> > > > >
>> >> > > > >>> > > > > -----Original Message-----
>> >> > > > >>> > > > > From: 吕文龙(吕文龙) [mailto:wenlong....@alibaba-inc.com]
>> >> > > > >>> > > > > Sent: Thursday, July 07, 2016 11:59 AM
>> >> > > > >>> > > > > To: dev@flink.apache.org
>> >> > > > >>> > > > > Subject: Re: [DISCUSS] Enhance Window Evictor in Flink
>> >> > > > >>> > > > >
>> >> > > > >>> > > > > Hi,
>> >> > > > >>> > > > > I think it is necessary to support sorted windows, which avoid scanning all the elements of a window when evicting elements; such a scan may cost many I/O operations, such as querying DBs to get elements from state.
>> >> > > > >>> > > > > What's more, when a window aggregation function is invertible, such as sum, which can be updated by adding or removing a single record, window results can be calculated incrementally. In this kind of case, we can dramatically improve the performance of window aggregation if the evictor can trigger an update of the window aggregation state by some mechanism.
>> >> > > > >>> > > > >
>> >> > > > >>> > > > > Best Wishes!
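To illustrate the point about invertible aggregations in the message above, here is a minimal sketch of a sum that is updated incrementally when an element is added and when one is evicted. `SlidingSum` is a hypothetical class, and the count-based eviction rule (keep the last `capacity` values) is an assumption chosen only to keep the example small.

```java
import java.util.ArrayDeque;

// Hypothetical count-based window that maintains its sum incrementally.
final class SlidingSum {
    private final ArrayDeque<Long> window = new ArrayDeque<>();
    private final int capacity;
    private long sum = 0L;

    SlidingSum(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
    }

    /** Adds a value, evicts the oldest one if the window is full, and returns the current sum. */
    long add(long value) {
        window.addLast(value);
        sum += value;
        if (window.size() > capacity) {
            // Invert the contribution of the evicted element instead of re-summing.
            sum -= window.removeFirst();
        }
        return sum;
    }
}
```

Each update touches one or two elements, regardless of how many elements the window holds.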
>> >> > > > >>> > > > > ---
>> >> > > > >>> > > > > wenlong.lwl
>> >> > > > >>> > > > >
>> >> > > > >>> > > > > -----Original Message-----
>> >> > > > >>> > > > > From: Aljoscha Krettek [mailto:aljos...@apache.org]
>> >> > > > >>> > > > > Sent: July 7, 2016 17:32
>> >> > > > >>> > > > > To: dev@flink.apache.org
>> >> > > > >>> > > > > Subject: Re: [DISCUSS] Enhance Window Evictor in Flink
>> >> > > > >>> > > > >
>> >> > > > >>> > > > > Hi,
>> >> > > > >>> > > > > regarding "sorting the window by event time": I also considered this but in the end I don't think it's necessary. Sorting is rather expensive and making decisions based on the order of elements can be tricky. An extreme example of why this can be problematic is the case where all elements in the window have the same timestamp. Now, if you decide to evict the first 5 elements based on timestamp order you basically arbitrarily evict 5 elements. I think the better solution for doing time-based eviction is to do one pass over the elements to get an overview of the timestamp distribution, then do a second pass and evict elements based on what was learned in the first pass.
>> >> > > > >>> > > > > This has complexity 2*n compared to the n*log n (plus the work of actually deciding what to evict) of the sort-based strategy.
>> >> > > > >>> > > > >
>> >> > > > >>> > > > > I might be wrong, though, and there could be a valid use-case not covered by the above idea.
>> >> > > > >>> > > > >
>> >> > > > >>> > > > > regarding Vishnu's other use case of evicting based on some decision in the WindowFunction: could this be solved by doing the check for the pattern in the evictor itself instead of in the window function? I'm very hesitant to introduce a coupling between the different components of the windowing system, i.e. assigner, trigger, evictor and window function. The reason is that using an evictor has a huge performance impact, since the system always has to keep all elements and cannot do incremental aggregation of window results, and I therefore don't want to put specific features regarding eviction into the other components.
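The two-pass, time-based eviction described in the message above could be sketched like this. The types here are hypothetical stand-ins rather than Flink classes, and the rule "evict everything older than the maximum timestamp minus `keepMillis`" is an assumption about what the first pass learns.

```java
import java.util.Iterator;

// Hypothetical stand-in for an element that carries its own timestamp.
final class Timestamped<T> {
    final T value;
    final long timestamp;

    Timestamped(T value, long timestamp) {
        this.value = value;
        this.timestamp = timestamp;
    }
}

final class TwoPassTimeEviction {
    /**
     * Pass 1 finds the highest timestamp; pass 2 removes every element whose
     * timestamp is more than keepMillis older than it. Returns the number evicted.
     */
    static <T> int evictOlderThan(Iterable<Timestamped<T>> elements, long keepMillis) {
        boolean any = false;
        long maxTimestamp = Long.MIN_VALUE;
        for (Timestamped<T> element : elements) {
            maxTimestamp = Math.max(maxTimestamp, element.timestamp);
            any = true;
        }
        if (!any) {
            return 0;
        }
        long cutoff = maxTimestamp - keepMillis;
        int evicted = 0;
        for (Iterator<Timestamped<T>> it = elements.iterator(); it.hasNext(); ) {
            if (it.next().timestamp < cutoff) {
                it.remove();
                evicted++;
            }
        }
        return evicted;
    }
}
```

Because the cutoff is derived from the maximum timestamp rather than from the last inserted element, the result does not depend on arrival order, and no sorting is needed.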
>> >> > > > >>> > > > > >> >> > > > >>> > > > > Cheers, >> >> > > > >>> > > > > Aljoscha >> >> > > > >>> > > > > >> >> > > > >>> > > > > On Thu, 7 Jul 2016 at 10:00 Radu Tudoran < >> >> > > > >>> radu.tudo...@huawei.com> >> >> > > > >>> > > > wrote: >> >> > > > >>> > > > > >> >> > > > >>> > > > > > Hi, >> >> > > > >>> > > > > > >> >> > > > >>> > > > > > I think the situation Vishnu raised is something >> that >> >> > > should >> >> > > > be >> >> > > > >>> > > > > accounted. >> >> > > > >>> > > > > > It can happen indeed that you want to condition >> what >> >> you >> >> > > > evict >> >> > > > >>> from >> >> > > > >>> > > > > > the window based on the result of the function to >> be >> >> > > applied. >> >> > > > >>> > > > > > >> >> > > > >>> > > > > > My 2 cents... >> >> > > > >>> > > > > > I would suggest adding a list for the elements of >> the >> >> > > stream >> >> > > > >>> where >> >> > > > >>> > > you >> >> > > > >>> > > > > > can MARK them to be delete. Alternatively the >> iterator >> >> > can >> >> > > be >> >> > > > >>> > > extended >> >> > > > >>> > > > > > to have a function Iterator.markForEviction(int); >> >> These >> >> > can >> >> > > > be >> >> > > > >>> made >> >> > > > >>> > > > > > available also in the apply function. Moreover, we >> can >> >> > use >> >> > > > >>> this to >> >> > > > >>> > > > > > extend the functionality such that you add MARKs >> >> either >> >> > for >> >> > > > >>> > eviction >> >> > > > >>> > > > > > after the function has finished triggering or to be >> >> > evicted >> >> > > > in >> >> > > > >>> the >> >> > > > >>> > > next >> >> > > > >>> > > > > iteration. >> >> > > > >>> > > > > > >> >> > > > >>> > > > > > >> >> > > > >>> > > > > > Dr. 
Radu Tudoran
>> >> > > > >>> > > > > >
>> >> > > > >>> > > > > > -----Original Message-----
>> >> > > > >>> > > > > > From: Vishnu Viswanath [mailto:vishnu.viswanat...@gmail.com]
>> >> > > > >>> > > > > > Sent: Thursday, July 07, 2016 1:28 AM
>> >> > > > >>> > > > > > To: Dev
>> >> > > > >>> > > > > > Subject: Re: [DISCUSS] Enhance Window Evictor in Flink
>> >> > > > >>> > > > > >
>> >> > > > >>> > > > > > Thank you Maxim and Aljoscha.
>> >> > > > >>> > > > > >
>> >> > > > >>> > > > > > Yes, the beforeEvict and afterEvict should be able to address point 3.
>> >> > > > >>> > > > > >
>> >> > > > >>> > > > > > I have one more use case in mind (which I might have to do in the later stages of the POC).
>> >> > > > >>> > > > > > What if the `evictAfter` should behave differently based on the window function?
>> >> > > > >>> > > > > >
>> >> > > > >>> > > > > > For example:
>> >> > > > >>> > > > > > I have a window that got triggered and my evict function is being called after the apply function. In such cases I should be able to decide what to evict based on the window function.
>> >> > > > >>> > > > > > e.g.,
>> >> > > > >>> > > > > > let the window have elements of type `case class Item(id: String, type: String)` and let the types be `type1` and `type2`.
>> >> > > > >>> > > > > > If window function is able to find a sequence : >> `type1 >> >> > > type2 >> >> > > > >>> > type1`, >> >> > > > >>> > > > > > then evict all elements of the type type2. >> >> > > > >>> > > > > > or if the window function is able to find a >> sequence >> >> > `type2 >> >> > > > >>> type2 >> >> > > > >>> > > > > > type1`, then evict all elements of type type1 else >> >> don't >> >> > > > evict >> >> > > > >>> any >> >> > > > >>> > > > > elements. >> >> > > > >>> > > > > > >> >> > > > >>> > > > > > Is this possible? or at least let the window >> function >> >> > > choose >> >> > > > >>> > between >> >> > > > >>> > > > > > two Evictor functions -(one for success case and >> one >> >> > > failure >> >> > > > >>> case) >> >> > > > >>> > > > > > >> >> > > > >>> > > > > > @Maxim: >> >> > > > >>> > > > > > regarding the sorted window, actually I wanted my >> >> > elements >> >> > > to >> >> > > > >>> be >> >> > > > >>> > > > > > sorted but not for the eviction but while applying >> the >> >> > > window >> >> > > > >>> > > function >> >> > > > >>> > > > > > (so thought this could be done easily). But it >> would >> >> be >> >> > > good >> >> > > > to >> >> > > > >>> > have >> >> > > > >>> > > > > > the window sorted based on EventTime. >> >> > > > >>> > > > > > >> >> > > > >>> > > > > > >> >> > > > >>> > > > > > Thanks and Regards, >> >> > > > >>> > > > > > Vishnu Viswanath, >> >> > > > >>> > > > > > >> >> > > > >>> > > > > > >> >> > > > >>> > > > > > >> >> > > > >>> > > > > > >> >> > > > >>> > > > > > On Wed, Jul 6, 2016 at 3:55 PM, Maxim < >> >> mfat...@gmail.com >> >> > > >> >> > > > >>> wrote: >> >> > > > >>> > > > > > >> >> > > > >>> > > > > > > Actually for such evictor to be useful the window >> >> > should >> >> > > be >> >> > > > >>> > sorted >> >> > > > >>> > > > > > > by some field, usually event time. What do you >> think >> >> > > about >> >> > > > >>> adding >> >> > > > >>> > > > > > > sorted window abstraction? 
>> >> > > > >>> > > > > > > >> >> > > > >>> > > > > > > On Wed, Jul 6, 2016 at 11:36 AM, Aljoscha Krettek >> >> > > > >>> > > > > > > <aljos...@apache.org> >> >> > > > >>> > > > > > > wrote: >> >> > > > >>> > > > > > > >> >> > > > >>> > > > > > > > @Maxim: That's perfect I didn't think about >> using >> >> > > > >>> > > > > > > > Iterator.remove() for that. I'll update the >> doc. >> >> What >> >> > > do >> >> > > > >>> you >> >> > > > >>> > > think >> >> > > > >>> > > > > > > > Vishnu? This should also >> >> > > > >>> > > > > > > cover >> >> > > > >>> > > > > > > > your before/after case nicely. >> >> > > > >>> > > > > > > > >> >> > > > >>> > > > > > > > @Vishnu: The steps would be these: >> >> > > > >>> > > > > > > > - Converge on a design in this discussion >> >> > > > >>> > > > > > > > - Add a Jira issue here: >> >> > > > >>> > > > > > > > https://issues.apache.org/jira/browse/FLINK >> >> > > > >>> > > > > > > > - Work on the code an create a pull request on >> >> > github >> >> > > > >>> > > > > > > > >> >> > > > >>> > > > > > > > The steps are also outlined here >> >> > > > >>> > > > > > > > http://flink.apache.org/how-to-contribute.html >> >> and >> >> > > here >> >> > > > >>> > > > > > > > http://flink.apache.org/contribute-code.html. >> >> > > > >>> > > > > > > > >> >> > > > >>> > > > > > > > - >> >> > > > >>> > > > > > > > Aljoscha >> >> > > > >>> > > > > > > > >> >> > > > >>> > > > > > > > On Wed, 6 Jul 2016 at 19:45 Maxim < >> >> mfat...@gmail.com >> >> > > >> >> > > > >>> wrote: >> >> > > > >>> > > > > > > > >> >> > > > >>> > > > > > > > > The new API forces iteration through every >> >> element >> >> > of >> >> > > > the >> >> > > > >>> > > buffer >> >> > > > >>> > > > > > > > > even >> >> > > > >>> > > > > > > if >> >> > > > >>> > > > > > > > a >> >> > > > >>> > > > > > > > > single value to be evicted. What about >> >> implementing >> >> > > > >>> > > > > > > > > Iterator.remove() method for elements? 
The API would look like:

public interface Evictor<T, W extends Window> extends Serializable {

    /**
     * Optionally evicts elements. Called before windowing function.
     *
     * @param elements The elements currently in the pane. Use
     *                 Iterator.remove to evict.
     * @param size The current number of elements in the pane.
     * @param ctx The {@link EvictorContext}
     */
    void evictBefore(Iterable<T> elements, int size, EvictorContext ctx);

    /**
     * Optionally evicts elements. Called after windowing function.
     *
     * @param elements The elements currently in the pane. Use
     *                 Iterator.remove to evict.
     * @param size The current number of elements in the pane.
     * @param ctx The {@link EvictorContext}
     */
    void evictAfter(Iterable<T> elements, int size, EvictorContext ctx);
}

Such an API allows aborting iteration at any point and evicting
elements in any order.

Thanks,

Maxim.

On Wed, Jul 6, 2016 at 9:04 AM, Vishnu Viswanath <vishnu.viswanat...@gmail.com> wrote:

Hi Aljoscha,

Thanks. Yes, the new interface seems to address points 1 and 2 of:

*1) I have a use case where I have to create a custom Evictor that
will evict elements from the window based on their value (e.g., if my
elements are of case class Item(id: Int, type: String), then evict
elements that have type="a").
I believe this is not currently possible.*
*2) This is somewhat related to 1): there should be an option to evict
elements from anywhere in the window, not only from the beginning of
the window (e.g., apply the delta function to all elements and remove
all those that don't pass). I checked the code, and the evict method
just returns the number of elements to be removed, and
processTriggerResult just skips that many elements from the beginning.*
*3) Add an option that enables the user to decide if the eviction
should happen before the apply function or after the apply function.
Currently it is before the apply function, but I have a use case where
I need to first apply the function and evict afterward.*

I would be interested in contributing to the code base.
Please let me know the steps.

Thanks and Regards,
Vishnu Viswanath

On Wed, Jul 6, 2016 at 11:49 AM, Aljoscha Krettek <aljos...@apache.org> wrote:

Hi,
as mentioned in the thread on improving the Windowing API, I also have
a design doc just for improving WindowEvictors. I had this in my head
for a while but was hesitant to publish it, but since people are asking
about this, now might be a good time to post it. Here's the doc:

https://docs.google.com/document/d/1Rr7xzlItYqvFXLyyy-Yv0vvw8f29QYAjm5i9E4A_JlU/edit?usp=sharing

Feedback/Suggestions are very welcome! Please let me know what you
think.

@Vishnu: Are you interested in contributing a solution for this to the
Flink code base? I'd be very happy to work with you on this.

Cheers,
Aljoscha

P.S. I think it would be best to keep discussions to the ML because
comments on the doc will not be visible here for everyone.
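
To make the Iterator.remove() proposal from this thread concrete, here is a minimal sketch of the value-based eviction described in point 1 (evict every element with type="a", wherever it sits in the pane). SimpleEvictor, TypeEvictor and Item are hypothetical names used only for illustration; the interface is a simplified stand-in for the proposed Evictor that leaves out Window and EvictorContext so that it compiles without Flink on the classpath.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch, not Flink code.
class ValueEvictorSketch {

    /** Mirrors the Item(id, type) case class mentioned in the thread. */
    static final class Item {
        final int id;
        final String type;

        Item(int id, String type) {
            this.id = id;
            this.type = type;
        }
    }

    /** Simplified stand-in for the proposed interface (no Window, no EvictorContext). */
    interface SimpleEvictor<T> {
        void evictBefore(Iterable<T> elements, int size);
    }

    /** Evicts every element whose type equals the configured value. */
    static final class TypeEvictor implements SimpleEvictor<Item> {
        private final String typeToEvict;

        TypeEvictor(String typeToEvict) {
            this.typeToEvict = typeToEvict;
        }

        @Override
        public void evictBefore(Iterable<Item> elements, int size) {
            for (Iterator<Item> it = elements.iterator(); it.hasNext();) {
                if (typeToEvict.equals(it.next().type)) {
                    it.remove(); // eviction from any position, not only the head
                }
            }
        }
    }

    /** Collects the ids left in the pane, in order. */
    static List<Integer> remainingIds(List<Item> pane) {
        List<Integer> ids = new ArrayList<>();
        for (Item item : pane) {
            ids.add(item.id);
        }
        return ids;
    }

    public static void main(String[] args) {
        List<Item> pane = new ArrayList<>();
        pane.add(new Item(1, "a"));
        pane.add(new Item(2, "b"));
        pane.add(new Item(3, "a"));
        pane.add(new Item(4, "c"));

        new TypeEvictor("a").evictBefore(pane, pane.size());
        System.out.println(remainingIds(pane));
    }
}
```

In this example the items with ids 2 and 4 remain. The sketch relies on the backing collection's iterator supporting remove(), which ArrayList does; whether the real window buffer would expose such an iterator is exactly the open design question in the thread.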