Hi guys,

Thanks so much for your quick replies, much appreciated!
Thanks
Tianji

On Wed, Mar 1, 2017 at 2:53 PM, Matthias J. Sax <matth...@confluent.io> wrote:

> It should be:
>
> groupBy -> always trigger repartitioning
> groupByKey -> maybe trigger repartitioning
>
> And there will not be two repartitioning topics. The repartitioning will
> be done by the groupBy/groupByKey operation, and thus, in the
> aggregation step we know that data is correctly partitioned and there
> will be no second repartitioning topic.
>
> -Matthias
>
> On 3/1/17 11:25 AM, Michael Noll wrote:
> > FYI: The difference between `groupBy` (may trigger re-partitioning) vs.
> > `groupByKey` (does not trigger re-partitioning) also applies to:
> >
> > - `map` vs. `mapValues`
> > - `flatMap` vs. `flatMapValues`
> >
> > On Wed, Mar 1, 2017 at 8:15 PM, Damian Guy <damian....@gmail.com> wrote:
> >
> >> If you use stream.groupByKey() then there will be no repartitioning as long
> >> as there have been no key-changing operations preceding it, i.e., map,
> >> selectKey, flatMap, transform. If you use stream.groupBy(...) then we see
> >> it as a key-changing operation, hence we need to repartition the data.
> >>
> >> On Wed, 1 Mar 2017 at 18:59 Tianji Li <skyah...@gmail.com> wrote:
> >>
> >>> Hi there,
> >>>
> >>> I wonder if it makes sense to give the option to disable auto
> >>> repartitioning while doing groupBy.
> >>>
> >>> I understand that with https://issues.apache.org/jira/browse/KAFKA-3561,
> >>> an internal topic for repartitioning will be automatically created and
> >>> synced to brokers, which is useful when aggregation keys are not the ones
> >>> used when ingesting raw data.
> >>>
> >>> However, in my case, I have carefully partitioned the data when ingesting
> >>> my raw topics. If I do groupBy followed by aggregation, there will be TWO
> >>> changelog topics, one for groupBy and another for aggregation.
> >>>
> >>> Does it make sense to make the groupBy one configurable?
> >>>
> >>> Thanks
> >>> Tianji
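The rule Damian and Matthias describe can be summarized as a small Java sketch. This is a hypothetical toy model, not the Kafka Streams API: the `RepartitionModel` class, the `Op` enum, and `repartitionTopics` are invented names. The model only tracks whether an earlier step may have changed the record key (map, flatMap, selectKey, transform) and counts how many repartition topics the grouping steps would need, assuming a simple linear pipeline.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical toy model of the repartitioning rule discussed in this thread.
 * It is NOT the Kafka Streams API; it only tracks whether an earlier step may
 * have changed the record key.
 */
public class RepartitionModel {

    /** Simplified stand-ins for the DSL operations mentioned in the thread. */
    enum Op {
        MAP, MAP_VALUES, FLAT_MAP, FLAT_MAP_VALUES, SELECT_KEY, TRANSFORM,
        GROUP_BY, GROUP_BY_KEY, AGGREGATE
    }

    /** Non-grouping operations that may change the key (per Damian's list). */
    static boolean changesKey(Op op) {
        switch (op) {
            case MAP:
            case FLAT_MAP:
            case SELECT_KEY:
            case TRANSFORM:
                return true;
            default:
                return false; // mapValues, flatMapValues, aggregate keep the key
        }
    }

    /** Counts the repartition topics a linear pipeline needs under this model. */
    static int repartitionTopics(List<Op> pipeline) {
        boolean keyChanged = false;
        int topics = 0;
        for (Op op : pipeline) {
            if (op == Op.GROUP_BY) {
                // groupBy is always treated as key-changing: repartition.
                topics++;
                keyChanged = false; // data is now partitioned by the new key
            } else if (op == Op.GROUP_BY_KEY) {
                // groupByKey repartitions only after a key-changing step.
                if (keyChanged) {
                    topics++;
                    keyChanged = false;
                }
            } else if (changesKey(op)) {
                keyChanged = true;
            }
            // AGGREGATE matches no branch above, so it never adds a repartition topic.
        }
        return topics;
    }

    public static void main(String[] args) {
        // Input already partitioned by the aggregation key.
        System.out.println(repartitionTopics(Arrays.asList(Op.MAP_VALUES, Op.GROUP_BY_KEY, Op.AGGREGATE))); // prints 0
        // groupBy followed by an aggregation: exactly one repartition topic.
        System.out.println(repartitionTopics(Arrays.asList(Op.GROUP_BY, Op.AGGREGATE))); // prints 1
        // A key-changing map before groupByKey forces a repartition.
        System.out.println(repartitionTopics(Arrays.asList(Op.MAP, Op.GROUP_BY_KEY, Op.AGGREGATE))); // prints 1
    }
}
```

Under this model, input that is already partitioned by the aggregation key only needs `groupByKey` (with `mapValues`/`flatMapValues` instead of `map`/`flatMap`) to avoid repartitioning, and the aggregation step never adds a second repartition topic. The aggregation's state store may still be backed by its own changelog topic, which is a different kind of internal topic that this sketch does not count.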