[DISCUSS] Naming and Functionality of Stream Operators and Tasks

Aljoscha Krettek Fri, 08 May 2015 02:39:01 -0700

Hi,
since I'm currently reworking the Stream operators I thought it's a
good time to talk about the naming of some classes. We have some
legacy problems with lots of Operators, OperatorBases, TwoInput,
OneInput, Unary, Binary, etc. And maybe we can break things in
streaming to have more consistent and future-proof naming.


In streaming, there are:
- Tasks, these are an AbstractInvokabe and contain the main loop of a
streaming vertex. They read from the inputs and forward data to the
operator implementation.

- Operators, these are invoked by a Task and are responsible for the
actual logic of the operator. Think Map, Join, Reduce and so on. These
are responsible for calling the user-defined function.

- Operators (again, I know), these are user facing classes (some
derived from DataStream, some not). There is for example
SingleOutputStreamOperator, for the result of a DataStream
transformation that has a single output. There are also
TemporalOperator and its derived classes StreamCrossOperator and
StreamJoinOperator. The actual operator inside a task (the ones I
mentioned before that are responsible for the user logic) that
executes a temporal join is called CoStreamWindow (with a
JoinWindowFunction).

As I currently have it in my PR, there are two Task classes, one for
single input, and one for two-input operators. There are also the
corresponding operator interfaces for unary and binary operators (see
what I did there ... :D).

What should we call all these classes (concepts). Also I'm heavily in
favour of dropping all the Stream (or Streaming) prefixes and suffixes
from the class names. I know I'm in streaming because the package is
named streaming. And we should not restrain ourselves because the
batch API also has things called operator.

Also, the concept of one-input, two-input tasks and operators is not
very scalable, Maybe we should have a single interface for operators
that has a receiveElement(int, element) method that tells the operator
from which input an element came. Then we can scale this to n-ary
operators. This would of course have the overhead of always sending
along the number of the input instead of encoding the input number in
the method name, such as receiveElement1() and receiveElement2().

Any thoughts? :D (I know I'm writing the long annoying emails today
but I think it is important we discuss these things before being stuck
with them.)

Cheers,
Aljoscha

[DISCUSS] Naming and Functionality of Stream Operators and Tasks

Reply via email to