+1 Good list and choices, Marton!
On Mon, Jun 1, 2015 at 5:45 PM, Fabian Hueske <fhue...@gmail.com> wrote:

> Thanks for bringing up this point!
>
> +1 for the renaming.
> @Marton: Is this a "complete" list, i.e., did you go through both APIs or
> might there be more methods that are semantically identical but named
> differently?
>
> 2015-06-01 17:31 GMT+02:00 Gyula Fóra <gyf...@apache.org>:
>
> > +1 for the changes proposed by Marton (before the release)
> >
> > Aljoscha Krettek <aljos...@apache.org> wrote (on Mon, Jun 1, 2015,
> > 16:32):
> >
> > > Yes, these renamings make sense. The partitionBy() is not yet in the
> > > master for streaming, though.
> > >
> > > On Mon, Jun 1, 2015 at 4:10 PM, Márton Balassi
> > > <balassi.mar...@gmail.com> wrote:
> > >
> > > > Looking at the DataSet and DataStream APIs we have come to the
> > > > conclusion with Aljoscha that there are a few methods that although
> > > > providing the same functionality are named differently. These are
> > > > the following:
> > > >
> > > >    1. rebalance (batch) / distribute (streaming): Rebalances the
> > > >    data sent to the downstream operators thus equally distributing
> > > >    it.
> > > >    2. partitionByHash, partitionCustom (batch) / partitionBy
> > > >    (streaming): Partitioning has just recently been exposed in the
> > > >    streaming API and is not as refined as the batch one. The
> > > >    streaming partitionBy is actually partitionByHash.
> > > >    3. Union (batch) / merge, connect (streaming): The streaming
> > > >    merge does a union of two streams with the same type. Connect is
> > > >    conceptually different, it provides a way of sharing state
> > > >    between two streams with potentially different types without
> > > >    mapping them to a common type and then merging them. This saves
> > > >    latency and an ugly mapping. The former advantage can be offset
> > > >    by proper operator chaining, the second one would remain if we
> > > >    did not have connect.
> > > >
> > > > To consolidate the naming I would suggest the following:
> > > >
> > > >    1. Rename streaming distribute to rebalance.
> > > >    2. Rename streaming partitionBy to partitionByHash and file JIRA
> > > >    for custom partitioning support for streaming.
> > > >    3. Rename streaming merge to union, leave streaming connect as
> > > >    it is.
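
For anyone skimming the thread, here is a rough sketch of what the three names under discussion mean semantically. It is plain Python, not Flink code, and the function names and signatures are hypothetical stand-ins chosen to mirror the proposed method names. It assumes rebalance hands records out round-robin and that hash partitioning takes the key's hash modulo the number of downstream channels. The union here simply concatenates, which is a simplification: a real streaming union interleaves records as they arrive.

```python
def rebalance(records, parallelism):
    """Spread records evenly over downstream channels (assumed round-robin)."""
    channels = [[] for _ in range(parallelism)]
    for i, record in enumerate(records):
        channels[i % parallelism].append(record)
    return channels


def partition_by_hash(records, parallelism, key):
    """Send every record with the same key to the same downstream channel."""
    channels = [[] for _ in range(parallelism)]
    for record in records:
        # Python's % always yields a non-negative index for a positive divisor.
        channels[hash(key(record)) % parallelism].append(record)
    return channels


def union(*streams):
    """Combine streams of the same element type into one (simplified: concatenation)."""
    merged = []
    for stream in streams:
        merged.extend(stream)
    return merged


if __name__ == "__main__":
    # Integer keys are used because hashing small ints is stable across runs.
    events = [(1, "a"), (2, "b"), (1, "c"), (3, "d")]
    print(rebalance(events, 2))
    print(partition_by_hash(events, 2, key=lambda r: r[0]))
    print(union([1, 2], [3]))
```

The point of the sketch is only that rebalance ignores record contents, while partitionByHash is driven by the key, which is why the old streaming name partitionBy was ambiguous next to the batch partitionCustom.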