+1 for the changes proposed by Marton (before the release)

On Mon, Jun 1, 2015 at 4:32 PM, Aljoscha Krettek <aljos...@apache.org> wrote:
> Yes, these renamings make sense. partitionBy() is not yet in master for
> streaming, though.
>
> On Mon, Jun 1, 2015 at 4:10 PM, Márton Balassi <balassi.mar...@gmail.com>
> wrote:
>
> > Looking at the DataSet and DataStream APIs, Aljoscha and I have come to
> > the conclusion that a few methods provide the same functionality but are
> > named differently. These are the following:
> >
> > 1. rebalance (batch) / distribute (streaming): Rebalances the data sent
> >    to the downstream operators, distributing it equally among them.
> > 2. partitionByHash, partitionCustom (batch) / partitionBy (streaming):
> >    Partitioning has only recently been exposed in the streaming API and
> >    is not as refined as in the batch API. The streaming partitionBy is
> >    actually partitionByHash.
> > 3. union (batch) / merge, connect (streaming): The streaming merge
> >    performs a union of two streams of the same type. Connect is
> >    conceptually different: it provides a way of sharing state between two
> >    streams of potentially different types without first mapping them to a
> >    common type and then merging them. This saves latency and an ugly
> >    mapping. The latency advantage could also be achieved with proper
> >    operator chaining, but without connect the ugly mapping would remain.
> >
> > To consolidate the naming, I would suggest the following:
> >
> > 1. Rename streaming distribute to rebalance.
> > 2. Rename streaming partitionBy to partitionByHash and file a JIRA for
> >    custom partitioning support in streaming.
> > 3. Rename streaming merge to union, and leave streaming connect as it is.
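The difference between the three operations being renamed can be illustrated with a toy model in plain Python. This is not Flink code: the functions rebalance, partition_by_hash and union below are hypothetical stand-ins that only model which parallel downstream instance (channel) each record ends up on. The sketch assumes round-robin for rebalance (one way to spread records evenly) and uses Python's built-in hash() as a simplification of whatever hash function the real API applies to keys.

```python
def rebalance(records, parallelism):
    """Toy model: round-robin, record i goes to channel i % parallelism."""
    channels = [[] for _ in range(parallelism)]
    for i, record in enumerate(records):
        channels[i % parallelism].append(record)
    return channels


def partition_by_hash(records, key, parallelism):
    """Toy model: records with equal keys always land on the same channel.

    Uses Python's hash() purely for illustration (an assumption, not the
    hash function any real framework uses).
    """
    channels = [[] for _ in range(parallelism)]
    for record in records:
        # Python's % with a positive modulus is never negative.
        channels[hash(key(record)) % parallelism].append(record)
    return channels


def union(*streams):
    """Toy model: concatenate streams of the same element type into one."""
    result = []
    for stream in streams:
        result.extend(stream)
    return result


if __name__ == "__main__":
    print(rebalance(list(range(10)), 3))
    records = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5)]
    print(partition_by_hash(records, key=lambda r: r[0], parallelism=2))
    print(union([1, 2], [3]))
```

Rebalance balances load regardless of content, while hash partitioning gives up even balance in exchange for co-locating all records that share a key. Connect is not modeled here, since its point is keeping two differently typed inputs separate while sharing state, rather than redistributing records.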