Re: [DISCUSS] Consolidate method naming between the batch and streaming API

Gyula Fóra Mon, 01 Jun 2015 08:31:59 -0700

+1 for the changes proposed by Marton (before the release)

Aljoscha Krettek <[email protected]> wrote (on Mon, Jun 1, 2015, 16:32):


> Yes, these renamings make sense. partitionBy() is not yet in master for
> streaming, though.
>
> On Mon, Jun 1, 2015 at 4:10 PM, Márton Balassi <[email protected]>
> wrote:
> > Looking at the DataSet and DataStream APIs, we have come to the conclusion
> > with Aljoscha that there are a few methods that, although providing the
> > same functionality, are named differently. These are the following:
> >
> >    1. rebalance (batch) / distribute (streaming): Rebalances the data
> >    sent to the downstream operators, thus distributing it equally.
> >    2. partitionByHash, partitionCustom (batch) / partitionBy (streaming):
> >    Partitioning has just recently been exposed in the streaming API and
> >    is not as refined as the batch one. The streaming partitionBy is
> >    actually partitionByHash.
> >    3. union (batch) / merge, connect (streaming): The streaming merge
> >    does a union of two streams with the same type. Connect is conceptually
> >    different: it provides a way of sharing state between two streams with
> >    potentially different types without mapping them to a common type and
> >    then merging them. This saves latency and an ugly mapping. The latency
> >    advantage can be offset by proper operator chaining, but the ugly
> >    mapping would remain if we did not have connect.
> >
> > To consolidate the naming I would suggest the following:
> >
> >    1. Rename streaming distribute to rebalance.
> >    2. Rename streaming partitionBy to partitionByHash and file a JIRA for
> >    custom partitioning support for streaming.
> >    3. Rename streaming merge to union, leave streaming connect as it is.
>
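
For readers less familiar with the operators discussed above, here is a minimal sketch in plain Python (not the Flink API) of the semantics behind the three names the proposal settles on. The function names mirror the proposed method names; the signatures, the list-of-channels representation, and the use of Python's built-in hash() are assumptions made purely for illustration.

```python
# Illustrative model only: these are NOT Flink calls. Each "channel" is a
# plain list standing in for the input of one downstream operator instance.

def rebalance(records, num_channels):
    """Round-robin distribution: records are spread evenly over channels."""
    channels = [[] for _ in range(num_channels)]
    for i, record in enumerate(records):
        channels[i % num_channels].append(record)
    return channels


def partition_by_hash(records, num_channels, key):
    """Hash partitioning: records with the same key land on the same channel."""
    channels = [[] for _ in range(num_channels)]
    for record in records:
        # Python's % always yields a non-negative index for a positive modulus.
        channels[hash(key(record)) % num_channels].append(record)
    return channels


def union(*streams):
    """Union (the streaming 'merge'): combine streams of the same type."""
    merged = []
    for stream in streams:
        merged.extend(stream)
    return merged
```

In this model, rebalance ignores record contents entirely, while partition_by_hash guarantees that equal keys share a channel, which is the distinction the renaming makes visible. Connect is deliberately left out, since it is conceptually different from union and keeps its name.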
