I think we reached consensus here so I would like to mark this FLIP as accepted. We will now process with implementing the first step, i.e. adding the new ProcessWindowFunction.
On Mon, 1 Aug 2016 at 18:08 Aljoscha Krettek <aljos...@apache.org> wrote: > Alright, that seems reasonable. I updated the doc to add the Collector to > the method signature again. > > On Mon, 1 Aug 2016 at 00:59 Stephan Ewen <se...@apache.org> wrote: > > The Collector is pretty integral in all the other functions that return > multiple elements. I honestly don't see us switching away from it, given > that it is such a core part of the API. > > The close() method has, to my best knowledge, not caused issues, yet. I > cannot recall anyone mentioning that the close() method confused them, they > accidentally called it, etc. > I am wondering whether this is more of a theoretical than practical issue. > > If we move one function away from Collector to be on the safe side for a > "maybe change in the future", while keeping Collector in all other > functions - I think that fragments the API concept wise more than it > improves anything. > > On Sun, Jul 31, 2016 at 7:10 PM, Aljoscha Krettek <aljos...@apache.org> > wrote: > > > @Stephan: For the Output, should we keep using a Collector (which > exposes) > > the close() method which should never be called by users or create a new > > Output type that only has an "output" method. Collector can also be used > > but with a close() method that doesn't do anything. In the long run, I > > thought it might be better to switch the type away from Collector. > > > > Cheers, > > Aljoscha > > > > On Wed, 20 Jul 2016 at 01:25 Maximilian Michels <m...@apache.org> wrote: > > > > > I think it looks like Beam rather than Hadoop :) > > > > > > What Stephan meant was that he wanted a dedicated output method in the > > > ProcessWindowFunction. I agree with Aljoscha that we shouldn't expose > > > the collector. > > > > > > On Tue, Jul 19, 2016 at 10:45 PM, Aljoscha Krettek < > aljos...@apache.org> > > > wrote: > > > > You mean keep the Collector? I don't like that one because it has the > > > > close() method that should never be called by the user. > > > > > > > > We can keep it, though, because all the other user function > interfaces > > > also > > > > expose it to the user. > > > > > > > > On Tue, 19 Jul 2016 at 15:22 Stephan Ewen <se...@apache.org> wrote: > > > > > > > >> I would actually make the output a separate parameter as well. > Pretty > > > much > > > >> like the old variant, only replacing the "Window" parameter by the > > > context > > > >> (which contains everything about the window). > > > >> It could also be called "WindowInvocationContext" or so. > > > >> > > > >> The current variant looks too Hadoop to me ;-) Everything done on > the > > > >> context object, and messy mocking when creating tests. > > > >> > > > >> On Mon, Jul 18, 2016 at 6:42 PM, Radu Tudoran < > > radu.tudo...@huawei.com> > > > >> wrote: > > > >> > > > >> > Hi, > > > >> > > > > >> > Sorry - I made a mistake - I was thinking of getting access to the > > > >> > collection (mist-read :) collector) of events in the window buffer > > in > > > >> > order to be able to delete/evict some of them which are not > > necessary > > > the > > > >> > last ones. > > > >> > > > > >> > > > > >> > Radu > > > >> > > > > >> > -----Original Message----- > > > >> > From: Aljoscha Krettek [mailto:aljos...@apache.org] > > > >> > Sent: Monday, July 18, 2016 5:54 PM > > > >> > To: dev@flink.apache.org > > > >> > Subject: Re: [DISCUSS] FLIP-2 Extending Window Function Metadata > > > >> > > > > >> > What about the collector? This is only used for emitting elements > to > > > the > > > >> > downstream operation. > > > >> > > > > >> > On Mon, 18 Jul 2016 at 17:52 Radu Tudoran < > radu.tudo...@huawei.com> > > > >> wrote: > > > >> > > > > >> > > Hi, > > > >> > > > > > >> > > I think it looks good and most importantly is that we can extend > > it > > > in > > > >> > > the directions discussed so far. > > > >> > > > > > >> > > One question though regarding the Collector - are we going to be > > > able > > > >> > > to delete random elements from the list if this is not exposed > as > > a > > > >> > > collection, at least to the evictor? If not, how are we going to > > > >> > > extend in the future to cover this case? > > > >> > > > > > >> > > Regarding the ordering - I also observed that there are > situations > > > >> > > where elements do not have a logical order. One example is if > you > > > have > > > >> > > high rates of the events. Nevertheless, even if now is not the > > time > > > >> > > for this, I think in the future we can imagine having also some > > data > > > >> > > structures that offer some ordering. It can save some > computation > > > >> > > efforts later in the functions for some use cases. > > > >> > > > > > >> > > > > > >> > > Best regards, > > > >> > > > > > >> > > > > > >> > > -----Original Message----- > > > >> > > From: Aljoscha Krettek [mailto:aljos...@apache.org] > > > >> > > Sent: Monday, July 18, 2016 3:45 PM > > > >> > > To: dev@flink.apache.org > > > >> > > Subject: Re: [DISCUSS] FLIP-2 Extending Window Function Metadata > > > >> > > > > > >> > > I incorporated the changes. The proposed interface of > > > >> > > ProcessWindowFunction is now this: > > > >> > > > > > >> > > public abstract class ProcessWindowFunction <IN, OUT, KEY, W > > extends > > > >> > > Window> implements Function { > > > >> > > > > > >> > > public abstract void process(KEY key, Iterable<IN> elements, > > > >> > > Context > > > >> > > ctx) throws Exception; > > > >> > > > > > >> > > public abstract class Context { > > > >> > > public abstract W window(); > > > >> > > public abstract void output(OUT value); > > > >> > > } > > > >> > > } > > > >> > > > > > >> > > I'm proposing to not expose Collector anymore because it has the > > > >> > > close() method that should not be called by users. Having the > > > output() > > > >> > > call directly on the context should work just as well. > > > >> > > > > > >> > > Also, I marked the "adding a firing reason" and "adding firing > > > >> > > counter" as future work that are only examples of stuff that can > > be > > > >> > > implemented on top of the new interface. Initially, this will > > > provide > > > >> > > exactly the same features as the old API but be extensible. I > did > > > this > > > >> > > to not make the scope of this proposal to big because Radu also > > > >> > > suggested more changes and each of them should be covered in a > > > separate > > > >> > design doc or FLIP. > > > >> > > > > > >> > > @Radu: On the different buffer types. I think this would be very > > > >> tricky. > > > >> > > Right now, people should also not rely on the fact that elements > > are > > > >> > > "FIFO". Some state backends might keep the elements in a > different > > > >> > > order and when you have merging windows/session windows the > order > > of > > > >> > > the elements will also not be preserved. > > > >> > > > > > >> > > Cheers, > > > >> > > Aljoscha > > > >> > > > > > >> > > On Wed, 13 Jul 2016 at 18:40 Radu Tudoran < > > radu.tudo...@huawei.com> > > > >> > wrote: > > > >> > > > > > >> > > > Hi, > > > >> > > > > > > >> > > > If it is to extend the Context to pass more information > between > > > the > > > >> > > > stages of processing a window (triggering -> process -> > > eviction), > > > >> > > > why not adding also a "EvictionInfo"? I think this might > > actually > > > >> > > > help with the issues discussed in the tread related to the > > > eviction > > > >> > policy. > > > >> > > > I could imagine using this parameter to pass the conditions, > > from > > > >> > > > the processing stage to the evictor, about what events to be > > > >> > eliminated. > > > >> > > > > > > >> > > > > > > >> > > > > > > >> > > > > > > >> > > > public abstract class Context { > > > >> > > > > > > >> > > > public abstract EvictionInfo evictionInfo(); > > > >> > > > > > > >> > > > ... > > > >> > > > > > > >> > > > > > > >> > > > public abstract KEY key(); > > > >> > > > > > > >> > > > public abstract W window(); > > > >> > > > > > > >> > > > public abstract int id(); > > > >> > > > > > > >> > > > public abstract FiringInfo firingInfo(); > > > >> > > > > > > >> > > > public abstract Iterable<IN> elements(); > > > >> > > > > > > >> > > > public abstract void output(OUT value); > > > >> > > > > > > >> > > > } > > > >> > > > > > > >> > > > > > > >> > > > Also on a slightly unrelated issue - how hard it would be to > > > >> > > > introduce different types of buffers for the windows. > Currently > > > the > > > >> > > > existing one is behaving (when under processing) similar with > a > > > FIFO > > > >> > > > queue (in the sense that you need to start from beginning, > from > > > the > > > >> > oldest element). > > > >> > > > How about enabling for example also LIFO behavior (start > > iterating > > > >> > > > through the list from the most recent element). As in the > source > > > >> > > > queues or stacks are not actually used, perhaps we can just > pass > > > >> > > > policies to the iterator - or have custom itrators > > > >> > > > > > > >> > > > > > > >> > > > > > > >> > > > > > > >> > > > > > > >> > > > > > > >> > > > Dr. Radu Tudoran > > > >> > > > Research Engineer - Big Data Expert > > > >> > > > IT R&D Division > > > >> > > > > > > >> > > > > > > >> > > > HUAWEI TECHNOLOGIES Duesseldorf GmbH European Research Center > > > >> > > > Riesstrasse 25, 80992 München > > > >> > > > > > > >> > > > E-mail: radu.tudo...@huawei.com > > > >> > > > Mobile: +49 15209084330 <01520%209084330> > > > >> > > > Telephone: +49 891588344173 <089%201588344173> > > > >> > > > > > > >> > > > HUAWEI TECHNOLOGIES Duesseldorf GmbH Hansaallee 205, 40549 > > > >> > > > Düsseldorf, Germany, www.huawei.com Registered > > > >> > > > Office: Düsseldorf, Register Court Düsseldorf, HRB 56063, > > Managing > > > >> > > > Director: Bo PENG, Wanzhou MENG, Lifang CHEN Sitz der > > > Gesellschaft: > > > >> > > > Düsseldorf, Amtsgericht Düsseldorf, HRB 56063, > > > >> > > > Geschäftsführer: Bo PENG, Wanzhou MENG, Lifang CHEN This > e-mail > > > and > > > >> > > > its attachments contain confidential information from HUAWEI, > > > which > > > >> > > > is intended only for the person or entity whose address is > > listed > > > >> > above. > > > >> > > > Any use of the information contained herein in any way > > (including, > > > >> > > > but not limited to, total or partial disclosure, reproduction, > > or > > > >> > > > dissemination) by persons other than the intended recipient(s) > > is > > > >> > > > prohibited. If you receive this e-mail in error, please notify > > the > > > >> > > > sender by phone or email immediately and delete it! > > > >> > > > > > > >> > > > > > > >> > > > -----Original Message----- > > > >> > > > From: Aljoscha Krettek [mailto:aljos...@apache.org] > > > >> > > > Sent: Wednesday, July 13, 2016 2:24 PM > > > >> > > > To: dev@flink.apache.org > > > >> > > > Subject: Re: [DISCUSS] FLIP-2 Extending Window Function > Metadata > > > >> > > > > > > >> > > > Sure, I also thought about this but went for the "extreme" > > > initially. > > > >> > > > If no-one objects I'll update the doc in a bit. > > > >> > > > > > > >> > > > On Wed, 13 Jul 2016 at 14:17 Stephan Ewen <se...@apache.org> > > > wrote: > > > >> > > > > > > >> > > > > Thanks for opening this. > > > >> > > > > > > > >> > > > > I see the need for having an extensible context object for > > > window > > > >> > > > > function invocations, but i think hiding every parameter in > > the > > > >> > > > > context is a bit unnatural. > > > >> > > > > > > > >> > > > > How about having a function "apply(Key, Values, > WindowContext, > > > >> > > > Collector)" > > > >> > > > > ? > > > >> > > > > It should be possible to write the straightforward use cases > > > >> > > > > without accessing the context object. > > > >> > > > > > > > >> > > > > > > > >> > > > > > > > >> > > > > On Wed, Jul 13, 2016 at 1:56 PM, Aljoscha Krettek > > > >> > > > > <aljos...@apache.org> > > > >> > > > > wrote: > > > >> > > > > > > > >> > > > > > Hi, > > > >> > > > > > this is a proposal to introduce a new interface for the > > window > > > >> > > > > > function > > > >> > > > > to > > > >> > > > > > make it more extensible for the future where we might want > > to > > > >> > > > > > provide additional information about why a window fired to > > the > > > >> > > > > > user > > > >> > > > function: > > > >> > > > > > > > > >> > > > > > > > > >> > > > > > > > > >> > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-2+Extending > > > >> > > > > +W > > > >> > > > > in > > > >> > > > > dow+Function+Metadata > > > >> > > > > > > > > >> > > > > > I'd appreciate your thoughts! > > > >> > > > > > > > > >> > > > > > Cheers, > > > >> > > > > > Aljoscha > > > >> > > > > > > > > >> > > > > > > > >> > > > > > > >> > > > > > >> > > > > >> > > > > > > >