Great proposal! We should use consistent naming for the two APIs.

Peter

2015-06-01 21:11 GMT+02:00 Márton Balassi <balassi.mar...@gmail.com>:

> @Fabian: I hope that this is the complete list, correct me if I am wrong. :)
>
> I am opening a small PR with the changes on top of Aljoscha's one that
> exposes the streaming partitioning then.
>
> On Mon, Jun 1, 2015 at 6:01 PM, Stephan Ewen <se...@apache.org> wrote:
>
> > +1
> >
> > Good list and choices, Marton!
> >
> > On Mon, Jun 1, 2015 at 5:45 PM, Fabian Hueske <fhue...@gmail.com> wrote:
> >
> > > Thanks for bringing up this point!
> > >
> > > +1 for the renaming.
> > > @Marton: Is this a "complete" list, i.e., did you go through both APIs
> > > or might there be more methods that are semantically identical but
> > > named differently?
> > >
> > > 2015-06-01 17:31 GMT+02:00 Gyula Fóra <gyf...@apache.org>:
> > >
> > > > +1 for the changes proposed by Marton (before the release)
> > > >
> > > > Aljoscha Krettek <aljos...@apache.org> wrote (on Mon, Jun 1, 2015,
> > > > 16:32):
> > > >
> > > > > Yes, these renamings make sense. The partitionBy() is not yet in
> > > > > the master for streaming, though.
> > > > >
> > > > > On Mon, Jun 1, 2015 at 4:10 PM, Márton Balassi
> > > > > <balassi.mar...@gmail.com> wrote:
> > > > >
> > > > > > Looking at the DataSet and DataStream APIs we have come to the
> > > > > > conclusion with Aljoscha that there are a few methods that
> > > > > > although providing the same functionality are named differently.
> > > > > > These are the following:
> > > > > >
> > > > > > 1. rebalance (batch) / distribute (streaming): Rebalances the
> > > > > >    data sent to the downstream operators thus equally
> > > > > >    distributing it.
> > > > > > 2. partitionByHash, partitionCustom (batch) / partitionBy
> > > > > >    (streaming): Partitioning has just recently been exposed in
> > > > > >    the streaming API and is not as refined as the batch one.
> > > > > >    The streaming partitionBy is actually partitionByHash.
> > > > > > 3. Union (batch) / merge, connect (streaming): The streaming
> > > > > >    merge does a union of two streams with the same type. Connect
> > > > > >    is conceptually different, it provides a way of sharing state
> > > > > >    between two streams with potentially different types without
> > > > > >    mapping them to a common type and then merging them. This
> > > > > >    saves latency and an ugly mapping. The former advantage can
> > > > > >    be offset by proper operator chaining, the second one would
> > > > > >    remain if we did not have connect.
> > > > > >
> > > > > > To consolidate the naming I would suggest the following:
> > > > > >
> > > > > > 1. Rename streaming distribute to rebalance.
> > > > > > 2. Rename streaming partitionBy to partitionByHash and file a
> > > > > >    JIRA for custom partitioning support for streaming.
> > > > > > 3. Rename streaming merge to union, leave streaming connect as
> > > > > >    it is.
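
For readers unfamiliar with the three operations being renamed, here is a rough sketch of what each one means semantically. It is plain Python over lists, not the Flink API: the function names rebalance, partition_by_hash and union only mirror the proposed names, and the num_partitions and key_fn parameters are assumptions made for illustration. Real streams are unbounded and a streaming union does not guarantee any interleaving order, so the concatenation below is a simplification. connect is left out because it is not being renamed.

```python
def rebalance(records, num_partitions):
    """Round-robin distribution: spreads records evenly, ignoring their content."""
    partitions = [[] for _ in range(num_partitions)]
    for i, record in enumerate(records):
        partitions[i % num_partitions].append(record)
    return partitions


def partition_by_hash(records, key_fn, num_partitions):
    """Hash partitioning: records with the same key land in the same partition."""
    partitions = [[] for _ in range(num_partitions)]
    for record in records:
        index = hash(key_fn(record)) % num_partitions
        partitions[index].append(record)
    return partitions


def union(first, second):
    """Union of two inputs of the same type (modelled here as concatenation)."""
    return list(first) + list(second)


if __name__ == "__main__":
    print(rebalance(list(range(7)), 3))
    print(partition_by_hash([("a", 1), ("b", 2), ("a", 3)], lambda r: r[0], 4))
    print(union([1, 2], [3]))
```

The point of the renaming is that these semantics are the same in batch and streaming, so the method names should be too.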