Could you please clarify, if i just choose to use low level processor api, what directs it to do re partitioning. I am not using them in conjunction with DSLs, I plan to use only them. Apart from sink processors, are there any conditions for re partitioning to occur.
-Sameer. On Tue, Dec 19, 2017 at 11:06 AM, Sameer Kumar <sam.kum.w...@gmail.com> wrote: > I understand it now, even if we are able to attach custom partitioning. > The data shall still travel from stream nodes to broker on join topic, so > travel to network will still be there. > > -Sameer. > > On Tue, Dec 19, 2017 at 1:17 AM, Matthias J. Sax <matth...@confluent.io> > wrote: > >> > need to map the keys, modify them >> >> and then do a join. >> >> This will always trigger a rebalance. There is no API atm to tell KS >> that partitioning is preserved. >> >> Custom partitioner won't help for your case as far as I understand it. >> >> >> -Matthias >> >> On 12/17/17 9:48 PM, Sameer Kumar wrote: >> > Actually, I am doing joining after map. I need to map the keys, modify >> them >> > and then do a join. >> > >> > I was thinking of using always passing a partition key based on which >> > partition happens. >> > Step by step flow is:- >> > 1. Data is already partitoned by do userid. >> > 2. I do a map to joins impressions tied to a user with view >> notifications. >> > 3. I count valid impressions across different aggregations(i.e. across >> diff >> > dimension groups). >> > >> > Thanks, >> > -Sameer. >> > >> > On Mon, Dec 18, 2017 at 1:37 AM, Matthias J. Sax <matth...@confluent.io >> > >> > wrote: >> > >> >> Two comments: >> >> >> >> 1) As long, as you don't do an aggregation/join after a map(), there >> >> will be not repartitioning. Streams does repartitioning "lazy", ie, >> only >> >> if it's required. As long as you only chain filter/map etc, no >> >> repartitioning will be done. >> >> >> >> 2) Can't you use mapValue() instead of map()? If you use map() to only >> >> read the key but only modify the value (-> "data is still local") a >> >> custom partitioner won't help. Also, we are improving this in upcoming >> >> version 1.1 and allows read access to a key in mapValue() (cf. KIP-149 >> >> for details). >> >> >> >> Hope this helps. >> >> >> >> >> >> -Matthias >> >> >> >> On 12/17/17 8:20 AM, Sameer Kumar wrote: >> >>> I have multiple map and filter phases in my application dag and >> though I >> >> am >> >>> generating different keys at different points, the data is still >> local. >> >>> Re-partitioning for me here is adding unnecessary network shuffling, I >> >> want >> >>> to minimize it. >> >>> >> >>> -Sameer. >> >>> >> >>> On Friday, December 15, 2017, Matthias J. Sax <matth...@confluent.io> >> >> wrote: >> >>> >> >>>> It's not recommended to write a custom partitioner because it's >> pretty >> >>>> difficult to write a correct one. There are many dependencies and you >> >>>> need deep knowledge of Kafka Streams internals to get it write. >> >>>> Otherwise, your custom partitioner breaks Kafka Streams. >> >>>> >> >>>> That is the reason why it's not documented... >> >>>> >> >>>> Not sure so, what you try to achieve in the first place. What do you >> >>>> mean by >> >>>> >> >>>>> I want to make sure that during map phase, the keys >> >>>>>> produced adhere to the customized partitioner. >> >>>> >> >>>> Maybe you achieve what you want differently. >> >>>> >> >>>> >> >>>> -Matthias >> >>>> >> >>>> On 12/15/17 1:19 AM, Sameer Kumar wrote: >> >>>>> Hi, >> >>>>> >> >>>>> I want to use the custom partitioner in streams, I couldnt find the >> >> same >> >>>> in >> >>>>> the documentation. I want to make sure that during map phase, the >> keys >> >>>>> produced adhere to the customized partitioner. >> >>>>> >> >>>>> -Sameer. >> >>>>> >> >>>> >> >>>> >> >>> >> >> >> >> >> > >> >> >