Generally I am in favor of making these name changes. My only concern is regarding the one-input and multiple-input operators.
There is a general problem with n-ary operators regarding type safety; that's why we now have SingleInput and Co (two-input) operators. I think we should keep these.

On Fri, May 8, 2015 at 11:38 AM, Aljoscha Krettek <aljos...@apache.org> wrote:
> Hi,
> since I'm currently reworking the Stream operators I thought it's a
> good time to talk about the naming of some classes. We have some
> legacy problems with lots of Operators, OperatorBases, TwoInput,
> OneInput, Unary, Binary, etc. And maybe we can break things in
> streaming to have more consistent and future-proof naming.
>
> In streaming, there are:
> - Tasks, these are an AbstractInvokable and contain the main loop of a
> streaming vertex. They read from the inputs and forward data to the
> operator implementation.
>
> - Operators, these are invoked by a Task and are responsible for the
> actual logic of the operator. Think Map, Join, Reduce and so on. These
> are responsible for calling the user-defined function.
>
> - Operators (again, I know), these are user facing classes (some
> derived from DataStream, some not). There is for example
> SingleOutputStreamOperator, for the result of a DataStream
> transformation that has a single output. There are also
> TemporalOperator and its derived classes StreamCrossOperator and
> StreamJoinOperator. The actual operator inside a task (the ones I
> mentioned before that are responsible for the user logic) that
> executes a temporal join is called CoStreamWindow (with a
> JoinWindowFunction).
>
> As I currently have it in my PR, there are two Task classes, one for
> single input, and one for two-input operators. There are also the
> corresponding operator interfaces for unary and binary operators (see
> what I did there ... :D).
>
> What should we call all these classes (concepts)? Also I'm heavily in
> favour of dropping all the Stream (or Streaming) prefixes and suffixes
> from the class names. I know I'm in streaming because the package is
> named streaming. And we should not restrain ourselves because the
> batch API also has things called operator.
>
> Also, the concept of one-input, two-input tasks and operators is not
> very scalable. Maybe we should have a single interface for operators
> that has a receiveElement(int, element) method that tells the operator
> from which input an element came. Then we can scale this to n-ary
> operators. This would of course have the overhead of always sending
> along the number of the input instead of encoding the input number in
> the method name, such as receiveElement1() and receiveElement2().
>
> Any thoughts? :D (I know I'm writing the long annoying emails today
> but I think it is important we discuss these things before being stuck
> with them.)
>
> Cheers,
> Aljoscha
>
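To make the type-safety concern concrete, here is a minimal sketch. The interfaces TwoInputOperator and NAryOperator and the classes implementing them are hypothetical illustrations, not the actual Flink classes. With one method per input (processElement1/processElement2), the compiler checks each input's element type separately; a single receiveElement(int, element) method has to accept a common supertype (Object here), so the operator must cast and a wrongly typed element only fails at runtime.

```java
import java.util.ArrayList;
import java.util.List;

public class OperatorInputSketch {

    // Hypothetical two-input interface: one method per input, so each
    // input keeps its own static element type.
    interface TwoInputOperator<IN1, IN2, OUT> {
        void processElement1(IN1 element, List<OUT> out);
        void processElement2(IN2 element, List<OUT> out);
    }

    // Hypothetical n-ary interface as proposed: the input index is a
    // parameter, so all inputs must share one element type (Object).
    interface NAryOperator<OUT> {
        void receiveElement(int input, Object element, List<OUT> out);
    }

    // Typed variant: input 1 carries Strings, input 2 carries Integers.
    static class TypedCombiner implements TwoInputOperator<String, Integer, String> {
        public void processElement1(String s, List<String> out) { out.add("left:" + s); }
        public void processElement2(Integer i, List<String> out) { out.add("right:" + (i + 1)); }
    }

    // Same logic on the n-ary interface: the operator dispatches on the
    // input index and casts, so type errors surface only at runtime.
    static class UntypedCombiner implements NAryOperator<String> {
        public void receiveElement(int input, Object element, List<String> out) {
            switch (input) {
                case 0: out.add("left:" + (String) element); break;
                case 1: out.add("right:" + ((Integer) element + 1)); break;
                default: throw new IllegalArgumentException("no input " + input);
            }
        }
    }

    static List<String> run() {
        List<String> out = new ArrayList<>();
        TypedCombiner typed = new TypedCombiner();
        typed.processElement1("a", out);
        typed.processElement2(41, out);
        // typed.processElement2("oops", out); // would not compile

        UntypedCombiner untyped = new UntypedCombiner();
        untyped.receiveElement(0, "a", out);
        untyped.receiveElement(1, 41, out);
        try {
            untyped.receiveElement(1, "oops", out); // compiles, fails at runtime
        } catch (ClassCastException e) {
            out.add("ClassCastException");
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

The n-ary form does scale to any number of inputs, as the proposal says, but the price is that a mismatch between an input and its element type moves from a compile error to a ClassCastException inside the operator.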