Thanks for bringing up this point! +1 for the renaming. @Marton: Is this a "complete" list, i.e., did you go through both APIs or might there be more methods that are semantically identical but named differently?
2015-06-01 17:31 GMT+02:00 Gyula Fóra <gyf...@apache.org>:

> +1 for the changes proposed by Marton (before the release)
>
> On Mon, Jun 1, 2015 at 16:32, Aljoscha Krettek <aljos...@apache.org> wrote:
>
> > Yes, these renamings make sense. The partitionBy() is not yet in the
> > master for streaming, though.
> >
> > On Mon, Jun 1, 2015 at 4:10 PM, Márton Balassi <balassi.mar...@gmail.com>
> > wrote:
> >
> > > Looking at the DataSet and DataStream APIs, Aljoscha and I have come to
> > > the conclusion that there are a few methods that provide the same
> > > functionality but are named differently. These are the following:
> > >
> > > 1. rebalance (batch) / distribute (streaming): Rebalances the data sent
> > >    to the downstream operators, thus distributing it equally.
> > > 2. partitionByHash, partitionCustom (batch) / partitionBy (streaming):
> > >    Partitioning has only recently been exposed in the streaming API and
> > >    is not as refined as the batch one. The streaming partitionBy is
> > >    actually partitionByHash.
> > > 3. union (batch) / merge, connect (streaming): The streaming merge
> > >    performs a union of two streams of the same type. Connect is
> > >    conceptually different: it provides a way of sharing state between
> > >    two streams of potentially different types without first mapping them
> > >    to a common type and then merging them. This saves latency and an ugly
> > >    mapping. Proper operator chaining could make up for the latency
> > >    difference, but the second benefit (no extra mapping) would be lost if
> > >    we did not have connect.
> > >
> > > To consolidate the naming I would suggest the following:
> > >
> > > 1. Rename streaming distribute to rebalance.
> > > 2. Rename streaming partitionBy to partitionByHash and file a JIRA issue
> > >    for custom partitioning support in streaming.
> > > 3. Rename streaming merge to union, and leave streaming connect as it is.
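For anyone less familiar with the distinctions drawn above, here is a minimal Python sketch of the semantics as described in this thread. It does not use Flink at all: `rebalance`, `partition_by_hash`, `union`, `connect` and `CountingCoMap` are hypothetical stand-ins, each "channel" is just a list standing for one parallel downstream instance, and the real operators' signatures and ordering guarantees may differ.

```python
# Hypothetical sketch of the semantics discussed in this thread; NOT Flink code.
# Each "channel" list stands for one parallel downstream operator instance.
from itertools import zip_longest
from typing import Callable, Hashable, Iterable, List, TypeVar

T = TypeVar("T")


def rebalance(records: Iterable[T], parallelism: int) -> List[List[T]]:
    """Round-robin: the i-th record goes to channel i % parallelism."""
    channels: List[List[T]] = [[] for _ in range(parallelism)]
    for i, record in enumerate(records):
        channels[i % parallelism].append(record)
    return channels


def partition_by_hash(records: Iterable[T], key: Callable[[T], Hashable],
                      parallelism: int) -> List[List[T]]:
    """Hash partitioning: records with equal keys always land on the same channel."""
    channels: List[List[T]] = [[] for _ in range(parallelism)]
    for record in records:
        channels[hash(key(record)) % parallelism].append(record)
    return channels


def union(*streams: List[T]) -> List[T]:
    """Union of streams sharing one element type (this sketch just concatenates)."""
    return [record for stream in streams for record in stream]


class CountingCoMap:
    """Hypothetical co-map: two inputs of different types, one shared piece of state."""

    def __init__(self) -> None:
        self.seen = 0  # state visible to both map1 and map2

    def map1(self, value: int) -> str:
        self.seen += 1
        return f"int:{value}"

    def map2(self, value: str) -> str:
        self.seen += 1
        return f"str:{value}"


_MISSING = object()  # sentinel marking an exhausted input


def connect(first: Iterable[int], second: Iterable[str],
            co_map: CountingCoMap) -> List[str]:
    """Feed both inputs into one operator; real streams interleave arbitrarily, this alternates."""
    out: List[str] = []
    for a, b in zip_longest(first, second, fillvalue=_MISSING):
        if a is not _MISSING:
            out.append(co_map.map1(a))
        if b is not _MISSING:
            out.append(co_map.map2(b))
    return out
```

Note that `union` assumes a common element type, whereas with `connect` the `map1` and `map2` functions keep their own input types while updating `seen` together, which is the "no ugly mapping" point made above.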