Hi Guys,

Thanks so much for your quick replies; they are very much appreciated!

Thanks
Tianji

On Wed, Mar 1, 2017 at 2:53 PM, Matthias J. Sax <matth...@confluent.io>
wrote:

> It should be:
>
> groupBy -> always triggers repartitioning
> groupByKey -> may trigger repartitioning
>
> And there will not be two repartitioning topics. The repartitioning is
> done by the groupBy/groupByKey operation; thus, in the aggregation step
> we know that the data is correctly partitioned and there will be no
> second repartitioning topic.
>
>
>
> -Matthias
>
> On 3/1/17 11:25 AM, Michael Noll wrote:
> > FYI: The difference between `groupBy` (may trigger repartitioning) and
> > `groupByKey` (does not trigger repartitioning) also applies to:
> >
> > - `map` vs. `mapValues`
> > - `flatMap` vs. `flatMapValues`
> >
> >
> >
> > On Wed, Mar 1, 2017 at 8:15 PM, Damian Guy <damian....@gmail.com> wrote:
> >
> >> If you use stream.groupByKey(), there will be no repartitioning as
> >> long as there have been no key-changing operations preceding it, e.g.,
> >> map, selectKey, flatMap, transform. If you use stream.groupBy(...), we
> >> treat it as a key-changing operation, hence we need to repartition the
> >> data.
> >>
> >> On Wed, 1 Mar 2017 at 18:59 Tianji Li <skyah...@gmail.com> wrote:
> >>
> >>> Hi there,
> >>>
> >>> I wonder if it makes sense to provide an option to disable automatic
> >>> repartitioning when doing groupBy.
> >>>
> >>> I understand that with https://issues.apache.org/jira/browse/KAFKA-3561,
> >>> an internal repartition topic will be automatically created and synced
> >>> to the brokers, which is useful when the aggregation keys are not the
> >>> ones used when ingesting the raw data.
> >>>
> >>> However, in my case, I have carefully partitioned the data when
> >>> ingesting my raw topics. If I do groupBy followed by aggregation, there
> >>> will be TWO changelog topics, one for groupBy and another for
> >>> aggregation.
> >>>
> >>> Does it make sense to make the groupBy one configurable?
> >>>
> >>> Thanks
> >>> Tianji
> >>>
> >>
> >
> >
> >
>
>
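
A rough way to see the rules Matthias, Damian, and Michael describe is to track whether any key-changing operation has happened since the data was last (re)partitioned. The sketch below is a hypothetical model only: the method `countRepartitions` and the operation strings are illustrative names, not the Kafka Streams API and not its actual implementation. It counts how many repartition steps a linear chain of operations would need under the rules stated in this thread: groupBy always repartitions, groupByKey repartitions only after a key-changing operation (map, flatMap, selectKey, transform), value-only operations (mapValues, flatMapValues) keep the partitioning, and the aggregation after a grouping never adds a second repartition.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the repartitioning rules discussed in this thread.
// NOT Kafka Streams' real implementation; operation names are plain strings.
public class RepartitionSketch {

    public static int countRepartitions(List<String> ops) {
        boolean keyChanged = false; // has the key changed since the last (re)partitioning?
        int repartitions = 0;
        for (String op : ops) {
            switch (op) {
                case "map":
                case "flatMap":
                case "selectKey":
                case "transform":
                    // Key-changing: records with the same new key may sit in different partitions.
                    keyChanged = true;
                    break;
                case "mapValues":
                case "flatMapValues":
                    // Value-only: the key, and therefore the partitioning, is preserved.
                    break;
                case "groupBy":
                    // Treated as key-changing, so it always repartitions.
                    repartitions++;
                    keyChanged = false;
                    break;
                case "groupByKey":
                    // Repartitions only if an earlier operation changed the key.
                    if (keyChanged) {
                        repartitions++;
                        keyChanged = false;
                    }
                    break;
                case "aggregate":
                    // Grouped data is already correctly partitioned: no second repartition.
                    break;
                default:
                    throw new IllegalArgumentException("unknown op: " + op);
            }
        }
        return repartitions;
    }

    public static void main(String[] args) {
        // Raw topic already partitioned by the aggregation key.
        System.out.println(countRepartitions(Arrays.asList("groupByKey", "aggregate")));              // 0
        System.out.println(countRepartitions(Arrays.asList("groupBy", "aggregate")));                 // 1
        System.out.println(countRepartitions(Arrays.asList("map", "groupByKey", "aggregate")));       // 1
        System.out.println(countRepartitions(Arrays.asList("mapValues", "groupByKey", "aggregate"))); // 0
    }
}
```

In this model, Tianji's case (input already partitioned by the aggregation key) corresponds to the first chain: using groupByKey instead of groupBy avoids the repartition step entirely. The aggregation's state store still has its own changelog topic, which is separate from repartitioning.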
