Come to think of it, why do we even need SingleOutputStreamOperator? It is just a subclass of DataStream that has almost no functionality that couldn't be implemented in DataStream. I think it makes people wonder why the result of a transformation is not a DataStream but this mouthful of a class.
And, in light of other possibilities such as MapDriver and PactDriver, I am quite happy with calling the things StreamOperator and StreamMap. :D

On Sat, May 9, 2015 at 5:20 PM, Márton Balassi <balassi.mar...@gmail.com> wrote:
> Hi,
>
> I am in favor of removing the Stream (or Streaming) suffixes and prefixes.
> I think that Gyula was also referring to those.
>
> I think the naming of the Tasks, and user facing operators
> (SingleOutputStreamOperator and alike) are fine.
>
> As for the other bunch of Operators we could name them Drivers to be mostly
> in line with the batch naming. By the way, most of the classes do not have
> "Operator" in their name currently - e.g. the one encapsulating the map
> functionality is called StreamMap, however the base classes (StreamOperator
> and ChainableStreamOperator) have it in their name explicitly. I could go
> with MapDriver instead of StreamMap, ChainableStreamOperator will be
> eliminated anyway - StreamOperator needs a new name then: worst case
> scenario PactDriver. :)
>
> As for n-ary operators I agree with Gyula.
>
> On Sat, May 9, 2015 at 4:44 PM, Aljoscha Krettek <aljos...@apache.org>
> wrote:
>
>> Which name changes are you referring to? The proposed names in my
>> recent PR? Or the dropping of Stream from all the classes? For the
>> rest I was just rambling about how I don't like the names in the batch
>> API. :D
>>
>> On Fri, May 8, 2015 at 12:31 PM, Gyula Fóra <gyula.f...@gmail.com> wrote:
>> > Generally I am in favor of making these name changes. My only concern is
>> > regarding the one-input and multiple-input operators.
>> >
>> > There is a general problem with the n-ary operators regarding type
>> > safety, that's why we now have SingleInput and Co (two-input) operators.
>> > I think we should keep these.
>> >
>> > On Fri, May 8, 2015 at 11:38 AM, Aljoscha Krettek <aljos...@apache.org>
>> > wrote:
>> >
>> >> Hi,
>> >> since I'm currently reworking the Stream operators I thought it's a
>> >> good time to talk about the naming of some classes. We have some
>> >> legacy problems with lots of Operators, OperatorBases, TwoInput,
>> >> OneInput, Unary, Binary, etc. And maybe we can break things in
>> >> streaming to have more consistent and future-proof naming.
>> >>
>> >> In streaming, there are:
>> >> - Tasks, these are an AbstractInvokable and contain the main loop of a
>> >> streaming vertex. They read from the inputs and forward data to the
>> >> operator implementation.
>> >>
>> >> - Operators, these are invoked by a Task and are responsible for the
>> >> actual logic of the operator. Think Map, Join, Reduce and so on. These
>> >> are responsible for calling the user-defined function.
>> >>
>> >> - Operators (again, I know), these are user facing classes (some
>> >> derived from DataStream, some not). There is for example
>> >> SingleOutputStreamOperator, for the result of a DataStream
>> >> transformation that has a single output. There are also
>> >> TemporalOperator and its derived classes StreamCrossOperator and
>> >> StreamJoinOperator. The actual operator inside a task (the ones I
>> >> mentioned before that are responsible for the user logic) that
>> >> executes a temporal join is called CoStreamWindow (with a
>> >> JoinWindowFunction).
>> >>
>> >> As I currently have it in my PR, there are two Task classes, one for
>> >> single input, and one for two-input operators. There are also the
>> >> corresponding operator interfaces for unary and binary operators (see
>> >> what I did there ... :D).
>> >>
>> >> What should we call all these classes (concepts)? Also I'm heavily in
>> >> favour of dropping all the Stream (or Streaming) prefixes and suffixes
>> >> from the class names. I know I'm in streaming because the package is
>> >> named streaming.
>> >> And we should not restrain ourselves because the
>> >> batch API also has things called operator.
>> >>
>> >> Also, the concept of one-input, two-input tasks and operators is not
>> >> very scalable. Maybe we should have a single interface for operators
>> >> that has a receiveElement(int, element) method that tells the operator
>> >> from which input an element came. Then we can scale this to n-ary
>> >> operators. This would of course have the overhead of always sending
>> >> along the number of the input instead of encoding the input number in
>> >> the method name, such as receiveElement1() and receiveElement2().
>> >>
>> >> Any thoughts? :D (I know I'm writing the long annoying emails today
>> >> but I think it is important we discuss these things before being stuck
>> >> with them.)
>> >>
>> >> Cheers,
>> >> Aljoscha
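
To make the trade-off between the two-input interfaces and the proposed receiveElement(int, element) concrete, here is a rough Java sketch. Only the method names receiveElement, receiveElement1 and receiveElement2 come from the thread; the interface and class names (TwoInputOperator, NaryOperator, TypedConcat, NaryConcat, OperatorNamingSketch) are hypothetical and are not the classes from the PR. It assumes that an n-ary interface has to pass elements as Object, which is one way the type-safety concern could show up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical: one method per input, so each input type stays in the signature.
interface TwoInputOperator<IN1, IN2> {
    void receiveElement1(IN1 element);
    void receiveElement2(IN2 element);
}

// Hypothetical: one method for any number of inputs. The input is identified
// by an index, and the element type degrades to Object.
interface NaryOperator {
    void receiveElement(int inputIndex, Object element);
}

// Two-input variant: a wrong element type is rejected by the compiler.
final class TypedConcat implements TwoInputOperator<String, Integer> {
    private final Consumer<String> out;

    TypedConcat(Consumer<String> out) {
        this.out = out;
    }

    @Override
    public void receiveElement1(String element) {
        out.accept("name=" + element);
    }

    @Override
    public void receiveElement2(Integer element) {
        out.accept("count=" + (element + 1));
    }
}

// N-ary variant: the operator dispatches on the index and casts, so a wrong
// element type only fails at runtime with a ClassCastException.
final class NaryConcat implements NaryOperator {
    private final Consumer<String> out;

    NaryConcat(Consumer<String> out) {
        this.out = out;
    }

    @Override
    public void receiveElement(int inputIndex, Object element) {
        switch (inputIndex) {
            case 0:
                out.accept("name=" + (String) element);
                break;
            case 1:
                out.accept("count=" + (((Integer) element) + 1));
                break;
            default:
                throw new IllegalArgumentException("unknown input " + inputIndex);
        }
    }
}

final class OperatorNamingSketch {
    // Feeds one element per input through the typed two-input operator.
    static List<String> runTyped() {
        List<String> results = new ArrayList<>();
        TypedConcat op = new TypedConcat(results::add);
        op.receiveElement1("a");
        op.receiveElement2(1);
        return results;
    }

    // Feeds the same elements through the index-based operator.
    static List<String> runNary() {
        List<String> results = new ArrayList<>();
        NaryConcat op = new NaryConcat(results::add);
        op.receiveElement(0, "a");
        op.receiveElement(1, 1);
        return results;
    }

    public static void main(String[] args) {
        System.out.println(runTyped());
        System.out.println(runNary());
    }
}
```

Both variants produce the same results for well-typed input. The difference is where a mistake surfaces: passing an Integer to input 0 does not compile against TypedConcat, while NaryConcat accepts the call and fails only when the cast runs.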