Great proposal! We should use consistent naming for the two APIs.

Peter

2015-06-01 21:11 GMT+02:00 Márton Balassi <balassi.mar...@gmail.com>:

> @Fabian: I hope that this is the complete list; correct me if I am wrong. :)
>
> In that case, I am opening a small PR with the changes on top of Aljoscha's
> PR that exposes streaming partitioning.
>
> On Mon, Jun 1, 2015 at 6:01 PM, Stephan Ewen <se...@apache.org> wrote:
>
> > +1
> >
> > Good list and choices, Marton!
> >
> > On Mon, Jun 1, 2015 at 5:45 PM, Fabian Hueske <fhue...@gmail.com> wrote:
> >
> > > Thanks for bringing up this point!
> > >
> > > +1 for the renaming.
> > > @Marton: Is this a "complete" list, i.e., did you go through both APIs,
> > > or might there be more methods that are semantically identical but named
> > > differently?
> > >
> > > 2015-06-01 17:31 GMT+02:00 Gyula Fóra <gyf...@apache.org>:
> > >
> > > > +1 for the changes proposed by Marton (before the release)
> > > >
> > > > Aljoscha Krettek <aljos...@apache.org> wrote (Mon, 1 Jun 2015, 16:32):
> > > >
> > > > > Yes, these renamings make sense. Streaming partitionBy() is not in
> > > > > master yet, though.
> > > > >
> > > > > On Mon, Jun 1, 2015 at 4:10 PM, Márton Balassi <balassi.mar...@gmail.com> wrote:
> > > > > > Looking at the DataSet and DataStream APIs, Aljoscha and I have come
> > > > > > to the conclusion that there are a few methods that provide the same
> > > > > > functionality but are named differently. These are the following:
> > > > > >
> > > > > >    1. rebalance (batch) / distribute (streaming): Rebalances the data
> > > > > >    sent to the downstream operators, thus distributing it evenly.
> > > > > >    2. partitionByHash, partitionCustom (batch) / partitionBy (streaming):
> > > > > >    Partitioning has only recently been exposed in the streaming API and
> > > > > >    is not as refined as in the batch API. The streaming partitionBy
> > > > > >    actually performs hash partitioning, i.e., it is the counterpart of
> > > > > >    batch partitionByHash.
> > > > > >    3. union (batch) / merge, connect (streaming): The streaming merge
> > > > > >    performs a union of two streams of the same type. Connect is
> > > > > >    conceptually different: it provides a way of sharing state between
> > > > > >    two streams of potentially different types without first mapping
> > > > > >    them to a common type and then merging them. This saves latency and
> > > > > >    an ugly mapping. Without connect, the latency cost could be offset
> > > > > >    by proper operator chaining, but the ugly mapping would remain.
> > > > > >
> > > > > > To consolidate the naming, I would suggest the following:
> > > > > >
> > > > > >    1. Rename streaming distribute to rebalance.
> > > > > >    2. Rename streaming partitionBy to partitionByHash, and file a JIRA
> > > > > >    issue for custom partitioning support in streaming.
> > > > > >    3. Rename streaming merge to union, and leave streaming connect as
> > > > > >    it is.
> > > > >
> > > >
> > >
> >
>
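
For readers less familiar with the operators discussed in the thread, their semantics can be sketched as a toy, single-process Python model. This is a minimal sketch under stated assumptions, not Flink code: rebalance is assumed to be round-robin, Python's built-in hash() stands in for whatever hash function the framework actually uses, and every name below (rebalance, partition_by_hash, union, CoMapFunction, connect_and_map, SequenceTagger) is a hypothetical stand-in rather than part of the DataSet or DataStream API.

```python
# Toy, single-process model of the operators discussed in the thread.
# All names here are hypothetical stand-ins, not Flink's DataSet/DataStream API.
from itertools import chain, zip_longest
from typing import Callable, Generic, Hashable, Iterable, List, TypeVar

T = TypeVar("T")
A = TypeVar("A")
B = TypeVar("B")
R = TypeVar("R")


def rebalance(records: Iterable[T], parallelism: int) -> List[List[T]]:
    """Spread records evenly over downstream channels (assumed round-robin)."""
    channels: List[List[T]] = [[] for _ in range(parallelism)]
    for i, record in enumerate(records):
        channels[i % parallelism].append(record)
    return channels


def partition_by_hash(records: Iterable[T], key: Callable[[T], Hashable],
                      parallelism: int) -> List[List[T]]:
    """Route all records with the same key to the same channel."""
    channels: List[List[T]] = [[] for _ in range(parallelism)]
    for record in records:
        # Built-in hash() is a stand-in for the framework's own hash function.
        channels[hash(key(record)) % parallelism].append(record)
    return channels


def union(*streams: Iterable[T]) -> List[T]:
    """Combine inputs that all share one element type into a single stream."""
    return list(chain(*streams))


class CoMapFunction(Generic[A, B, R]):
    """One method per input; both methods can share instance state."""

    def map1(self, value: A) -> R:
        raise NotImplementedError

    def map2(self, value: B) -> R:
        raise NotImplementedError


_DONE = object()  # sentinel marking an exhausted input


def connect_and_map(first: Iterable[A], second: Iterable[B],
                    fn: CoMapFunction[A, B, R]) -> List[R]:
    """Apply fn.map1 to the first input and fn.map2 to the second.

    Inputs are interleaved to mimic concurrent arrival; they keep their own
    types and are never mapped to a common type first.
    """
    out: List[R] = []
    for a, b in zip_longest(first, second, fillvalue=_DONE):
        if a is not _DONE:
            out.append(fn.map1(a))
        if b is not _DONE:
            out.append(fn.map2(b))
    return out


class SequenceTagger(CoMapFunction[int, str, str]):
    """Numbers every element across both inputs via one shared counter."""

    def __init__(self) -> None:
        self.seen = 0  # state shared by map1 and map2

    def map1(self, value: int) -> str:
        self.seen += 1
        return f"{self.seen}:int:{value}"

    def map2(self, value: str) -> str:
        self.seen += 1
        return f"{self.seen}:str:{value}"


print(rebalance(range(7), 3))  # [[0, 3, 6], [1, 4], [2, 5]]
print(connect_and_map([10, 20], ["a"], SequenceTagger()))
# ['1:int:10', '2:str:a', '3:int:20']
```

In this model, connect_and_map feeds int and str records to the same SequenceTagger instance without ever converting them to a common type; the shared counter plays the role of the state that connect lets two streams share. With merge/union, both inputs would first have to be mapped to one type.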
