Every vote counts. :D
On Tue, May 12, 2015 at 11:04 AM, Matthias J. Sax <mj...@informatik.hu-berlin.de> wrote:
> I like it. Not sure if my vote counts ;)
>
> On 05/12/2015 07:18 AM, Aljoscha Krettek wrote:
>> My proposal for the runtime classes (per my pull request) is this:
>>
>> StreamTask: base of streaming tasks. The task is the AbstractInvokable
>> that runs in the TaskManager and invokes stream operators.
>> OneInputStreamTask, TwoInputStreamTask and SourceStreamTask are the
>> subclasses responsible for the actual types of operations.
>>
>> StreamOperator: interface for StreamOperators such as Map, Reduce and
>> so on. OneInputOperator and TwoInputStreamOperator are the interfaces
>> for operators with one input and two inputs, respectively.
>>
>> There is also AbstractStreamOperator, which provides basic
>> implementations for methods such as setup()/open()/close(), and
>> AbstractUdfStreamOperator, which is derived from
>> AbstractStreamOperator. The latter is for operators that have user
>> code; it deals with calling the correct functions of RichUserFunctionS
>> (open()/close()/setRuntimeContext()).
>>
>> I realised that we should probably not rename all the actual operators
>> and remove the Stream prefix and suffix; that would be too big a change
>> and orthogonal to my current PR. Other people can do it if they want.
>>
>> These are just my suggestions. Please suggest other consistent naming
>> schemes if you think mine are bad.
>>
>> On Mon, May 11, 2015 at 9:40 PM, Stephan Ewen <se...@apache.org> wrote:
>>> How about separating the discussion about runtime class renaming
>>> (there seems to be consensus) from the discussion about API class
>>> renaming (no consensus yet)?
>>>
>>> To go ahead with the runtime classes, can you make a concrete suggestion
>>> for more memorable/descriptive names?
>>>
>>> For the API classes, kick off a thread, if you want, but please clearly
>>> mark in your discussion that this is about an API-breaking change
>>> to a user-facing API (that is still declared beta).
>>>
>>>
>>> On Mon, May 11, 2015 at 10:18 AM, Aljoscha Krettek <aljos...@apache.org>
>>> wrote:
>>>
>>>> Come to think of it, why do we even need SingleOutputStreamOperator?
>>>> It is just a subclass of DataStream that has almost no functionality
>>>> that couldn't be implemented in DataStream. I think it makes people
>>>> wonder why the result of a transformation is not a DataStream but this
>>>> mouthful of a class.
>>>>
>>>> And, in light of other possibilities such as MapDriver and PactDriver,
>>>> I am quite happy with calling the things StreamOperator and StreamMap.
>>>> :D
>>>>
>>>> On Sat, May 9, 2015 at 5:20 PM, Márton Balassi <balassi.mar...@gmail.com>
>>>> wrote:
>>>>> Hi,
>>>>>
>>>>> I am in favor of removing the Stream (or Streaming) suffixes and
>>>>> prefixes. I think that Gyula was also referring to those.
>>>>>
>>>>> I think the naming of the Tasks and the user-facing operators
>>>>> (SingleOutputStreamOperator and the like) is fine.
>>>>>
>>>>> As for the other bunch of Operators, we could name them Drivers to be
>>>>> mostly in line with the batch naming. By the way, most of the classes
>>>>> do not have "Operator" in their name currently - e.g. the one
>>>>> encapsulating the map functionality is called StreamMap; however, the
>>>>> base classes (StreamOperator and ChainableStreamOperator) have it in
>>>>> their name explicitly. I could go with MapDriver instead of StreamMap.
>>>>> ChainableStreamOperator will be eliminated anyway - StreamOperator
>>>>> needs a new name then: worst case scenario PactDriver. :)
>>>>>
>>>>> As for n-ary operators I agree with Gyula.
>>>>>
>>>>> On Sat, May 9, 2015 at 4:44 PM, Aljoscha Krettek <aljos...@apache.org>
>>>>> wrote:
>>>>>
>>>>>> Which name changes are you referring to? The proposed names in my
>>>>>> recent PR? Or the dropping of Stream from all the classes? For the
>>>>>> rest I was just rambling about how I don't like the names in the
>>>>>> batch API.
>>>>>> :D
>>>>>>
>>>>>> On Fri, May 8, 2015 at 12:31 PM, Gyula Fóra <gyula.f...@gmail.com>
>>>>>> wrote:
>>>>>>> Generally I am in favor of making these name changes. My only
>>>>>>> concern is regarding the one-input and multiple-input operators.
>>>>>>>
>>>>>>> There is a general problem with the n-ary operators regarding type
>>>>>>> safety; that's why we now have SingleInput and Co (two-input)
>>>>>>> operators. I think we should keep these.
>>>>>>>
>>>>>>> On Fri, May 8, 2015 at 11:38 AM, Aljoscha Krettek <aljos...@apache.org>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>> since I'm currently reworking the Stream operators I thought it's a
>>>>>>>> good time to talk about the naming of some classes. We have some
>>>>>>>> legacy problems with lots of Operators, OperatorBases, TwoInput,
>>>>>>>> OneInput, Unary, Binary, etc. And maybe we can break things in
>>>>>>>> streaming to have more consistent and future-proof naming.
>>>>>>>>
>>>>>>>> In streaming, there are:
>>>>>>>> - Tasks, these are an AbstractInvokable and contain the main loop of a
>>>>>>>> streaming vertex. They read from the inputs and forward data to the
>>>>>>>> operator implementation.
>>>>>>>>
>>>>>>>> - Operators, these are invoked by a Task and are responsible for the
>>>>>>>> actual logic of the operator. Think Map, Join, Reduce and so on.
>>>>>>>> These are responsible for calling the user-defined function.
>>>>>>>>
>>>>>>>> - Operators (again, I know), these are user-facing classes (some
>>>>>>>> derived from DataStream, some not). There is for example
>>>>>>>> SingleOutputStreamOperator, for the result of a DataStream
>>>>>>>> transformation that has a single output. There are also
>>>>>>>> TemporalOperator and its derived classes StreamCrossOperator and
>>>>>>>> StreamJoinOperator.
>>>>>>>> The actual operator inside a task (the ones I
>>>>>>>> mentioned before that are responsible for the user logic) that
>>>>>>>> executes a temporal join is called CoStreamWindow (with a
>>>>>>>> JoinWindowFunction).
>>>>>>>>
>>>>>>>> As I currently have it in my PR, there are two Task classes, one for
>>>>>>>> single-input and one for two-input operators. There are also the
>>>>>>>> corresponding operator interfaces for unary and binary operators (see
>>>>>>>> what I did there ... :D).
>>>>>>>>
>>>>>>>> What should we call all these classes (concepts)? Also, I'm heavily in
>>>>>>>> favour of dropping all the Stream (or Streaming) prefixes and
>>>>>>>> suffixes from the class names. I know I'm in streaming because the
>>>>>>>> package is named streaming. And we should not restrain ourselves
>>>>>>>> because the batch API also has things called operator.
>>>>>>>>
>>>>>>>> Also, the concept of one-input, two-input tasks and operators is not
>>>>>>>> very scalable. Maybe we should have a single interface for operators
>>>>>>>> that has a receiveElement(int, element) method that tells the
>>>>>>>> operator from which input an element came. Then we can scale this to
>>>>>>>> n-ary operators. This would of course have the overhead of always
>>>>>>>> sending along the number of the input instead of encoding the input
>>>>>>>> number in the method name, such as receiveElement1() and
>>>>>>>> receiveElement2().
>>>>>>>>
>>>>>>>> Any thoughts? :D (I know I'm writing the long annoying emails today
>>>>>>>> but I think it is important we discuss these things before being
>>>>>>>> stuck with them.)
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> Aljoscha
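
For reference, the trade-off raised in the first message of the thread
(receiveElement1()/receiveElement2() versus a single
receiveElement(int, element)) and Gyula's type-safety concern can be
sketched roughly as below. This is only an illustration, not the code from
the PR: the method names come from the thread, but the interface and class
names (TwoInputOperator, NaryOperator, TypedPair, UntypedPair) are
hypothetical, and the element type of the n-ary variant is assumed to be
Object.

```java
import java.util.ArrayList;
import java.util.List;

class OperatorInputSketch {

    /** Two-input style: the input number is encoded in the method name. */
    interface TwoInputOperator<IN1, IN2> {
        void receiveElement1(IN1 element);
        void receiveElement2(IN2 element);
    }

    /**
     * N-ary style (hypothetical): the input index is passed along with every
     * element, and the element type is no longer known at compile time.
     */
    interface NaryOperator {
        void receiveElement(int inputIndex, Object element);
    }

    /** Typed variant: the compiler checks the element type of each input. */
    static class TypedPair implements TwoInputOperator<String, Integer> {
        final List<String> out = new ArrayList<>();

        @Override
        public void receiveElement1(String element) {
            out.add("s:" + element);
        }

        @Override
        public void receiveElement2(Integer element) {
            out.add("i:" + (element + 1));
        }
    }

    /** Untyped variant: dispatch on the index and cast at runtime. */
    static class UntypedPair implements NaryOperator {
        final List<String> out = new ArrayList<>();

        @Override
        public void receiveElement(int inputIndex, Object element) {
            switch (inputIndex) {
                case 0: {
                    String s = (String) element;   // may fail at runtime
                    out.add("s:" + s);
                    break;
                }
                case 1: {
                    Integer i = (Integer) element; // may fail at runtime
                    out.add("i:" + (i + 1));
                    break;
                }
                default:
                    throw new IllegalArgumentException("unknown input " + inputIndex);
            }
        }
    }

    public static void main(String[] args) {
        TypedPair typed = new TypedPair();
        typed.receiveElement1("a");
        typed.receiveElement2(1);
        System.out.println(typed.out);

        UntypedPair untyped = new UntypedPair();
        untyped.receiveElement(0, "a");
        untyped.receiveElement(1, 1);
        System.out.println(untyped.out);

        try {
            // Wrong element type for input 0: only detected when it runs.
            untyped.receiveElement(0, 42);
        } catch (ClassCastException e) {
            System.out.println("runtime type error on input 0");
        }
    }
}
```

In this sketch the n-ary interface extends to any number of inputs without
new method names, but a mismatched element type only surfaces as a
ClassCastException at runtime, whereas the two-input interface rejects it
at compile time.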