Re: Using Custom Partitioner in Streams

Sameer Kumar Tue, 19 Dec 2017 03:49:33 -0800

Could you please clarify: if I choose to use only the low-level Processor
API, what triggers repartitioning? I am not using it in conjunction with the
DSL; I plan to use only the Processor API.
Apart from sink processors, are there any conditions under which
repartitioning occurs?


-Sameer.

On Tue, Dec 19, 2017 at 11:06 AM, Sameer Kumar <sam.kum.w...@gmail.com>
wrote:

> I understand it now. Even if we were able to attach a custom partitioner,
> the data would still travel from the Streams nodes to the broker on the
> join topic, so the network transfer would still be there.
>
> -Sameer.
>
> On Tue, Dec 19, 2017 at 1:17 AM, Matthias J. Sax <matth...@confluent.io>
> wrote:
>
>> > need to map the keys, modify them
>> > and then do a join.
>>
>> This will always trigger repartitioning. There is no API atm to tell
>> Kafka Streams that partitioning is preserved.
>>
>> Custom partitioner won't help for your case as far as I understand it.
>>
>>
>> -Matthias
>>
>> On 12/17/17 9:48 PM, Sameer Kumar wrote:
>> > Actually, I am doing the join after the map. I need to map the keys,
>> > modify them and then do a join.
>> >
>> > I was thinking of always passing a partition key that determines how
>> > the data is partitioned.
>> > The step-by-step flow is:
>> > 1. Data is already partitioned by userid.
>> > 2. I do a map to join impressions tied to a user with view
>> > notifications.
>> > 3. I count valid impressions across different aggregations (i.e. across
>> > different dimension groups).
>> >
>> > Thanks,
>> > -Sameer.
>> >
>> > On Mon, Dec 18, 2017 at 1:37 AM, Matthias J. Sax <matth...@confluent.io>
>> > wrote:
>> >
>> >> Two comments:
>> >>
>> >> 1) As long as you don't do an aggregation/join after a map(), there
>> >> will be no repartitioning. Streams does repartitioning lazily, i.e.,
>> >> only if it's required. As long as you only chain filter/map etc., no
>> >> repartitioning will be done.
>> >>
>> >> 2) Can't you use mapValues() instead of map()? If you use map() only to
>> >> read the key and modify just the value (-> "data is still local"), a
>> >> custom partitioner won't help. Also, we are improving this in the
>> >> upcoming version 1.1, which allows read access to the key in
>> >> mapValues() (cf. KIP-149 for details).
>> >>
>> >> Hope this helps.
>> >>
>> >>
>> >> -Matthias
>> >>
>> >> On 12/17/17 8:20 AM, Sameer Kumar wrote:
>> >>> I have multiple map and filter phases in my application DAG, and
>> >>> though I am generating different keys at different points, the data
>> >>> is still local. Repartitioning here adds unnecessary network
>> >>> shuffling for me, and I want to minimize it.
>> >>>
>> >>> -Sameer.
>> >>>
>> >>> On Friday, December 15, 2017, Matthias J. Sax <matth...@confluent.io>
>> >>> wrote:
>> >>>
>> >>>> It's not recommended to write a custom partitioner because it's
>> >>>> pretty difficult to write a correct one. There are many dependencies
>> >>>> and you need deep knowledge of Kafka Streams internals to get it
>> >>>> right. Otherwise, your custom partitioner breaks Kafka Streams.
>> >>>>
>> >>>> That is the reason why it's not documented...
>> >>>>
>> >>>> Not sure, though, what you are trying to achieve in the first place.
>> >>>> What do you mean by
>> >>>>
>> >>>>> I want to make sure that during map phase, the keys
>> >>>>> produced adhere to the customized partitioner.
>> >>>>
>> >>>> Maybe you can achieve what you want differently.
>> >>>>
>> >>>>
>> >>>> -Matthias
>> >>>>
>> >>>> On 12/15/17 1:19 AM, Sameer Kumar wrote:
>> >>>>> Hi,
>> >>>>>
>> >>>>> I want to use a custom partitioner in Streams, but I couldn't find
>> >>>>> it in the documentation. I want to make sure that during the map
>> >>>>> phase, the keys produced adhere to the customized partitioner.
>> >>>>>
>> >>>>> -Sameer.
>> >>>>>
>> >>>>
>> >>>>
>> >>>
>> >>
>> >>
>> >
>>
>>
>
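The "lazy" repartitioning Matthias describes in the quoted thread can be pictured as a flag that key-changing operations set and that a downstream join or aggregation checks. The following is only a toy model of that idea under stated assumptions, not Kafka Streams code: the class LazyRepartitionSketch and its methods are hypothetical and merely mimic the names of DSL operations.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model (hypothetical, not part of Kafka Streams): records which steps
// a chain of operations would need, inserting a "repartition" step only
// when a join follows an operation that may have changed the key.
public class LazyRepartitionSketch {
    private final List<String> plan = new ArrayList<>();
    private boolean keyMayHaveChanged = false;

    // filter() never touches the key, so it does not set the flag.
    public LazyRepartitionSketch filter() {
        plan.add("filter");
        return this;
    }

    // mapValues() can only change the value, so the partitioning is preserved.
    public LazyRepartitionSketch mapValues() {
        plan.add("mapValues");
        return this;
    }

    // map() may return a new key, so the flag is set, but nothing is
    // repartitioned yet.
    public LazyRepartitionSketch map() {
        plan.add("map");
        keyMayHaveChanged = true;
        return this;
    }

    // A join needs records with equal keys in the same partition, so this is
    // the point where a pending key change forces a repartition step.
    public LazyRepartitionSketch join() {
        if (keyMayHaveChanged) {
            plan.add("repartition");
            keyMayHaveChanged = false;
        }
        plan.add("join");
        return this;
    }

    public List<String> plan() {
        return plan;
    }

    public static void main(String[] args) {
        // Value-only changes: prints [filter, mapValues, join]
        System.out.println(new LazyRepartitionSketch().filter().mapValues().join().plan());
        // Key change before the join: prints [filter, map, repartition, join]
        System.out.println(new LazyRepartitionSketch().filter().map().join().plan());
        // Key change with no join or aggregation afterwards: prints [map, filter]
        System.out.println(new LazyRepartitionSketch().map().filter().plan());
    }
}
```

In this model, as in Matthias's description, map() alone costs nothing extra; only the combination of a possible key change and a later join adds the shuffle.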

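To see why a changed key matters for the join discussed in this thread, it may help to recall that records are assigned to partitions by hashing the key modulo the partition count. The sketch below illustrates that with a stand-in hash. It is an assumption-laden illustration: Kafka's default partitioner hashes the serialized key bytes with murmur2, whereas this code uses Arrays.hashCode, so the partition numbers it prints will not match a real cluster. The class name, method names, and sample keys are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch of hash-based partition assignment. The hash function
// is a stand-in (Arrays.hashCode), NOT the murmur2 hash Kafka actually uses.
public class PartitionSketch {

    // Maps a key to a partition in the range [0, numPartitions).
    public static int partitionFor(String key, int numPartitions) {
        byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
        int hash = Arrays.hashCode(keyBytes);
        // Clear the sign bit so the modulo result is never negative.
        return (hash & 0x7fffffff) % numPartitions;
    }

    // True if a record re-keyed from oldKey to newKey would still belong to
    // the partition it currently sits in.
    public static boolean staysLocal(String oldKey, String newKey, int numPartitions) {
        return partitionFor(oldKey, numPartitions) == partitionFor(newKey, numPartitions);
    }

    public static void main(String[] args) {
        int numPartitions = 8;           // assumed partition count
        String userId = "user-42";       // made-up sample key

        // mapValues()-style change: the key is untouched, so the record is
        // guaranteed to belong where it already is.
        System.out.println("value-only change stays local: "
                + staysLocal(userId, userId, numPartitions));

        // map()-style change: the new key is hashed independently of the old
        // one, so there is no guarantee it lands on the same partition.
        String newKey = userId + "|campaign-7";
        System.out.println("old partition: " + partitionFor(userId, numPartitions)
                + ", new partition: " + partitionFor(newKey, numPartitions));
    }
}
```

Because the framework cannot know whether a user-supplied map() happened to keep records in place, it has to assume they may have moved, which is why the join after map() goes through a topic on the broker.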