If you use stream.groupByKey(), there will be no repartitioning as long as
no key-changing operation precedes it, i.e., map, selectKey, flatMap, or
transform. If you use stream.groupBy(...), we treat it as a key-changing
operation, so the data has to be repartitioned.
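
To make the rule concrete, here is a rough sketch in plain Java. It is a toy
model, not the Kafka Streams implementation: SketchStream and its methods are
hypothetical stand-ins that only track whether the key may have changed.
mapValues and filter are added here as examples of operations that keep the
key; they are not part of the list above.

```java
public class RepartitionSketch {

    /** Hypothetical stand-in for a stream; it only remembers whether the key may have changed. */
    static final class SketchStream {
        private final boolean keyMayHaveChanged;

        private SketchStream(boolean keyMayHaveChanged) {
            this.keyMayHaveChanged = keyMayHaveChanged;
        }

        /** A stream read straight from a topic: keys are still the producer's keys. */
        static SketchStream fromTopic() {
            return new SketchStream(false);
        }

        // Key-changing operations: after any of these, the key can no longer be trusted.
        SketchStream map()       { return new SketchStream(true); }
        SketchStream selectKey() { return new SketchStream(true); }
        SketchStream flatMap()   { return new SketchStream(true); }
        SketchStream transform() { return new SketchStream(true); }

        // Key-preserving operations (assumed examples): the flag is carried over unchanged.
        SketchStream mapValues() { return new SketchStream(keyMayHaveChanged); }
        SketchStream filter()    { return new SketchStream(keyMayHaveChanged); }

        /** groupByKey: repartition only if a key-changing operation came before it. */
        boolean groupByKeyNeedsRepartition() {
            return keyMayHaveChanged;
        }

        /** groupBy(...): treated as key-changing itself, so it always repartitions. */
        boolean groupByNeedsRepartition() {
            return true;
        }
    }

    public static void main(String[] args) {
        SketchStream source = SketchStream.fromTopic();

        System.out.println("groupByKey on source: "
                + source.groupByKeyNeedsRepartition());
        System.out.println("groupByKey after mapValues: "
                + source.mapValues().groupByKeyNeedsRepartition());
        System.out.println("groupByKey after selectKey: "
                + source.selectKey().groupByKeyNeedsRepartition());
        System.out.println("groupBy on source: "
                + source.groupByNeedsRepartition());
    }
}
```

In this model, data that is already partitioned correctly at ingestion takes
the groupByKey path and needs no extra repartition step, provided nothing
before it touches the key.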

On Wed, 1 Mar 2017 at 18:59 Tianji Li <skyah...@gmail.com> wrote:

> Hi there,
>
> I wonder if it makes sense to give the option to disable auto
> repartitioning while doing groupBy.
>
> I understand with https://issues.apache.org/jira/browse/KAFKA-3561,
> an internal topic for repartition will be automatically created and synced
> to brokers, which is useful when aggregation keys are not the ones used
> when ingesting raw data.
>
> However, in my case, I have carefully partitioned the data when ingesting
> my raw topics. If I do groupBy followed by aggregation, there will be TWO
> changelog topics, one for groupBy and another for aggregation.
>
> Does it make sense to make the groupBy one configurable?
>
> Thanks
> Tianji
>
