Generally I am in favor of making these name changes. My only concern is
about the one-input and multiple-input operators.

There is a general problem with n-ary operators regarding type safety;
that's why we now have SingleInput and Co (two-input) operators. I think we
should keep these.
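
To make the type-safety concern concrete, here is a rough sketch in Java. All
interface and class names below (NaryOperator, TwoInputOperator, NaryConcat,
TypedConcat, TypeSafetyDemo) are made up for illustration and are not the
classes from the PR. It only shows that a single receiveElement(int, element)
method has to accept Object, so a wrongly wired input fails at runtime, while
separate typed methods let the compiler catch it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical n-ary interface in the spirit of receiveElement(int, element).
// All inputs share one parameter, so its type can only be Object.
interface NaryOperator {
    void receiveElement(int inputIndex, Object element);
}

// Hypothetical two-input interface: each input has its own type parameter.
interface TwoInputOperator<IN1, IN2> {
    void receiveElement1(IN1 element);
    void receiveElement2(IN2 element);
}

// n-ary variant: expects Strings on input 0 and Integers on input 1,
// but nothing enforces that expectation at compile time.
class NaryConcat implements NaryOperator {
    final List<String> out = new ArrayList<>();

    @Override
    public void receiveElement(int inputIndex, Object element) {
        if (inputIndex == 0) {
            out.add("name:" + (String) element);   // cast may fail at runtime
        } else {
            out.add("count:" + (Integer) element); // cast may fail at runtime
        }
    }
}

// Two-input variant: the element types are part of the method signatures.
class TypedConcat implements TwoInputOperator<String, Integer> {
    final List<String> out = new ArrayList<>();

    @Override
    public void receiveElement1(String element) {
        out.add("name:" + element);
    }

    @Override
    public void receiveElement2(Integer element) {
        out.add("count:" + element);
    }
}

class TypeSafetyDemo {
    // Sends an Integer to input 0 of the n-ary operator. This compiles,
    // and the mistake only surfaces as a ClassCastException at runtime.
    static boolean naryFailsAtRuntime() {
        NaryOperator op = new NaryConcat();
        try {
            op.receiveElement(0, 42);
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    // Correct use of the typed operator. The same mistake,
    // op.receiveElement1(42), would be rejected by the compiler.
    static List<String> typedRun() {
        TypedConcat op = new TypedConcat();
        op.receiveElement1("a");
        op.receiveElement2(1);
        return op.out;
    }
}
```

The index-based interface scales to any number of inputs, but every
implementation ends up with casts like the ones in NaryConcat. That is the
trade-off I would like us to keep in mind.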

On Fri, May 8, 2015 at 11:38 AM, Aljoscha Krettek <aljos...@apache.org>
wrote:

> Hi,
> since I'm currently reworking the Stream operators I thought it's a
> good time to talk about the naming of some classes. We have some
> legacy problems with lots of Operators, OperatorBases, TwoInput,
> OneInput, Unary, Binary, etc. And maybe we can break things in
> streaming to have more consistent and future-proof naming.
>
> In streaming, there are:
> - Tasks, these are an AbstractInvokable and contain the main loop of a
> streaming vertex. They read from the inputs and forward data to the
> operator implementation.
>
> - Operators, these are invoked by a Task and are responsible for the
> actual logic of the operator. Think Map, Join, Reduce and so on. These
> are responsible for calling the user-defined function.
>
> - Operators (again, I know), these are user facing classes (some
> derived from DataStream, some not). There is for example
> SingleOutputStreamOperator, for the result of a DataStream
> transformation that has a single output. There are also
> TemporalOperator and its derived classes StreamCrossOperator and
> StreamJoinOperator. The actual operator inside a task (the ones I
> mentioned before that are responsible for the user logic) that
> executes a temporal join is called CoStreamWindow (with a
> JoinWindowFunction).
>
> As I currently have it in my PR, there are two Task classes, one for
> single input, and one for two-input operators. There are also the
> corresponding operator interfaces for unary and binary operators (see
> what I did there ... :D).
>
> What should we call all these classes (concepts)? Also I'm heavily in
> favour of dropping all the Stream (or Streaming) prefixes and suffixes
> from the class names. I know I'm in streaming because the package is
> named streaming. And we should not restrain ourselves because the
> batch API also has things called operator.
>
> Also, the concept of one-input, two-input tasks and operators is not
> very scalable. Maybe we should have a single interface for operators
> that has a receiveElement(int, element) method that tells the operator
> from which input an element came. Then we can scale this to n-ary
> operators. This would of course have the overhead of always sending
> along the number of the input instead of encoding the input number in
> the method name, such as receiveElement1() and receiveElement2().
>
> Any thoughts? :D (I know I'm writing the long annoying emails today
> but I think it is important we discuss these things before being stuck
> with them.)
>
> Cheers,
> Aljoscha
>
