Yes, these renamings make sense. Note, though, that partitionBy() is not yet in master for streaming.

On Mon, Jun 1, 2015 at 4:10 PM, Márton Balassi <balassi.mar...@gmail.com> wrote:

> Looking at the DataSet and DataStream APIs we have come to the conclusion
> with Aljoscha that there are a few methods that although providing the same
> functionality are named differently. These are the following:
>
>    1. rebalance (batch) / distribute (streaming): Rebalances the data sent
>    to the downstream operators thus equally distributing it.
>    2. partitionByHash, partitionCustom (batch) / partitionBy (streaming):
>    Partitioning has just recently been exposed in the streaming API and is not
>    as refined as the batch one. The streaming partitionBy is actually
>    partitionByHash.
>    3. Union (batch) / merge, connect (streaming): The streaming merge does
>    a union of two streams with the same type. Connect is conceptually
>    different, it provides a way of sharing state between two streams with
>    potentially different types without mapping them to a common type and then
>    merging them. This saves latency and an ugly mapping. The former advantage
>    can be offset by proper operator chaining, the second one would remain if
>    we did not have connect.
>
> To consolidate the naming I would suggest the following:
>
>    1. Rename streaming distribute to rebalance.
>    2. Rename streaming partitionBy to partitionByHash and file JIRA for
>    custom partitioning support for streaming.
>    3. Rename streaming merge to union, leave streaming connect as it is.
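
For anyone following the thread who wants the semantics behind the proposed names spelled out, here is a rough sketch in plain Python. It is only a toy model on lists, not Flink code and not how either API is implemented. The function names borrow the proposed names, `connect_map` and its `state` dictionary are hypothetical helpers made up for illustration, and real streams interleave their elements rather than arriving one stream after the other.

```python
def rebalance(records, parallelism):
    """Round-robin: spread records evenly over the downstream channels."""
    channels = [[] for _ in range(parallelism)]
    for i, record in enumerate(records):
        channels[i % parallelism].append(record)
    return channels


def partition_by_hash(records, key, parallelism):
    """Hash partitioning: records with equal keys go to the same channel."""
    channels = [[] for _ in range(parallelism)]
    for record in records:
        channels[hash(key(record)) % parallelism].append(record)
    return channels


def union(first, second):
    """Union (streaming 'merge'): both inputs must carry the same type."""
    return list(first) + list(second)


def connect_map(first, second, on_first, on_second):
    """Connect: inputs of different types, handled by two functions that
    share state, with no mapping to a common type beforehand."""
    state = {}  # hypothetical shared state, visible to both functions
    out = [on_first(state, x) for x in first]
    out += [on_second(state, y) for y in second]
    return out


def remember(state, pair):
    """Handle the (id, name) input: store the name under its id."""
    state[pair[0]] = pair[1]
    return pair[1]


def lookup(state, key):
    """Handle the id input: resolve an id against the shared state."""
    return state.get(key, "?")
```

In this model, `rebalance(range(6), 3)` returns `[[0, 3], [1, 4], [2, 5]]`, whereas `partition_by_hash` guarantees only that equal keys share a channel, not that the channels are evenly filled. `connect_map([(1, "a"), (2, "b")], [2, 1, 3], remember, lookup)` shows the point made about connect: the second input reads state written by the first even though the two inputs have different types.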