Yes, these renamings make sense. Note, though, that partitionBy() is not
yet in master for streaming.

On Mon, Jun 1, 2015 at 4:10 PM, Márton Balassi <balassi.mar...@gmail.com> wrote:
> Looking at the DataSet and DataStream APIs, Aljoscha and I have concluded
> that a few methods provide the same functionality but are named
> differently. These are the following:
>
>    1. rebalance (batch) / distribute (streaming): Rebalances the data sent
>    to the downstream operators, distributing it evenly.
>    2. partitionByHash, partitionCustom (batch) / partitionBy (streaming):
>    Partitioning has just recently been exposed in the streaming API and is not
>    as refined as the batch one. The streaming partitionBy is actually
>    partitionByHash.
>    3. union (batch) / merge, connect (streaming): The streaming merge does
>    a union of two streams with the same type. Connect is conceptually
>    different: it provides a way of sharing state between two streams with
>    potentially different types without mapping them to a common type and then
>    merging them. This saves latency and an ugly mapping. The former advantage
>    can be offset by proper operator chaining, but the ugly mapping would
>    remain if we did not have connect.
>
> To consolidate the naming I would suggest the following:
>
>    1. Rename streaming distribute to rebalance.
>    2. Rename streaming partitionBy to partitionByHash and file a JIRA for
>    custom partitioning support for streaming.
>    3. Rename streaming merge to union, leave streaming connect as it is.
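
To make the difference behind proposals 1 and 2 concrete, here is a minimal
plain-Java sketch of the two distribution strategies. It is only an
illustration, not Flink code: the class PartitioningSketch and its methods
are hypothetical names, lists stand in for streams and channels, and the
round-robin and hashCode-modulo schemes are assumptions about how rebalance
and partitionByHash might behave, not a description of the actual
implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; none of these names come from Flink.
public class PartitioningSketch {

    // "rebalance": assign records to channels round-robin, so every
    // channel receives (almost) the same number of records.
    public static <T> List<List<T>> rebalance(List<T> records, int numChannels) {
        List<List<T>> channels = emptyChannels(numChannels);
        for (int i = 0; i < records.size(); i++) {
            channels.get(i % numChannels).add(records.get(i));
        }
        return channels;
    }

    // "partitionByHash": pick the channel from the record's hash, so
    // equal records always land on the same channel. The load may be skewed.
    public static <T> List<List<T>> partitionByHash(List<T> records, int numChannels) {
        List<List<T>> channels = emptyChannels(numChannels);
        for (T record : records) {
            // floorMod keeps the index non-negative for negative hash codes.
            int channel = Math.floorMod(record.hashCode(), numChannels);
            channels.get(channel).add(record);
        }
        return channels;
    }

    private static <T> List<List<T>> emptyChannels(int numChannels) {
        List<List<T>> channels = new ArrayList<>();
        for (int i = 0; i < numChannels; i++) {
            channels.add(new ArrayList<>());
        }
        return channels;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("a", "a", "b", "b", "c", "c");
        System.out.println("rebalance:       " + rebalance(words, 2));
        System.out.println("partitionByHash: " + partitionByHash(words, 2));
    }
}
```

In this sketch rebalance evens out the load regardless of content, while
partitionByHash keeps equal records together at the cost of possible skew,
which is why the two probably deserve distinct names in both APIs.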
