Thanks.

Tim Ward

-----Original Message-----
From: Matthias J. Sax <matth...@confluent.io>
Sent: 13 August 2019 08:23
To: users@kafka.apache.org
Subject: Re: How do I tell Kafka Streams not to repartition?

Atm, it's not possible to tell Kafka Streams that repartitioning is not
necessary after a key-changing operation at DSL level. I personally think it
would be a good improvement to add this functionality. It's not the first
time somebody asked for it. Feel free to create a JIRA (and maybe even
contribute :) -- note, that we would need a KIP for this).

The only alternative you have currently, is to not use
`groupByKey().aggregate()`, but `transformValues()` (or similar) and
implement the aggregation manually.

-Matthias

On 8/12/19 1:25 AM, Tim Ward wrote:
> I'm using groupByKey, and it causes repartitioning.
>
> I suppose I could aggregate by parent ID, if the data structure into which I
> aggregate by parent ID is itself a map from child ID to what I'm really
> wanting to aggregate - is that what you had in mind? - I think it would work!
>
> Give or take a problem I've discovered with persistence following a crash in
> the middle of aggregation, which I'll post separately.
>
> Tim Ward
>
> -----Original Message-----
> From: Boyang Chen <reluctanthero...@gmail.com>
> Sent: 09 August 2019 23:31
> To: users@kafka.apache.org
> Subject: Re: How do I tell Kafka Streams not to repartition?
>
> In case I'm not making myself clear, any operation that changes the record
> key will result in repartition. Since you don't want that, you shall choose
> to call groupByKey afterwards and aggregation will happen on `parent id`
> level.
>
> On Fri, Aug 9, 2019 at 3:27 PM Boyang Chen <reluctanthero...@gmail.com>
> wrote:
>
>> Hey Tim,
>>
>> I think the functionality you need is groupByKey() which avoids
>> repartitioning, feel free to check it out here:
>> https://docs.confluent.io/current/streams/developer-guide/dsl-api.html#aggregating.
>> Recommend you to read the whole thing but feel free just to search
>> `groupByKey`.
>>
>> On Fri, Aug 9, 2019 at 7:14 AM Tim Ward <tim.w...@origamienergy.com>
>> wrote:
>>
>>> I've got an input topic which is keyed by "parent ID". Each message
>>> contains multiple items of data, each for a different "child ID".
>>>
>>> To process these items separately I flatMapValues() the stream to make a
>>> new stream of the inner items of data, keyed by "child ID".
>>>
>>> Now, because I've changed the key, Kafka Streams thinks a repartition is
>>> needed. But in fact it isn't, because all the inner items for a particular
>>> "child ID" will be contained within messages keyed with the same "parent
>>> ID".
>>>
>>> How do I tell Kafka Streams that there is no need to repartition in this
>>> case, because all the data that should remain together in the same instance
>>> of the application will do so without repartitioning? (I appreciate that
>>> Streams can't know about the parent-child relationship unless I *do* tell
>>> it in some way.)
>>>
>>> Tim Ward
>>>
>>> This email is from Origami Energy Limited. The contents of this email and
>>> any attachment are confidential to the intended recipient(s). If you are
>>> not an intended recipient: (i) do not use, disclose, distribute, copy or
>>> publish this email or its contents; (ii) please contact Origami Energy
>>> Limited immediately; and then (iii) delete this email. For more
>>> information, our privacy policy is available here:
>>> https://origamienergy.com/privacy-policy/. Origami Energy Limited
>>> (company number 8619644) is a company registered in England with its
>>> registered office at Ashcombe Court, Woolsack Way, Godalming, GU7 1LQ.
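Tim's reasoning in the original question (every item for a given child ID arrives inside messages with the same parent ID, so it already sits in the partition that parent ID maps to) can be checked with a small model. The sketch below is plain Java and does not use the Kafka Streams API: `Item`, `ParentMessage` and all helper methods are hypothetical, and `partitionFor()` is a simplified stand-in for Kafka's default partitioner, which hashes the serialized key with murmur2 rather than using `String.hashCode()`. `localTotals()` plays the role of the hand-written aggregation Matthias suggests, which in a real topology would typically keep its running totals in a state store attached to a `transformValues()` step, so no repartition topic is involved.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java model of the scenario in the thread; NOT the Kafka Streams API.
// Item, ParentMessage and all methods below are hypothetical illustrations.
final class CoPartitioningSketch {

    record Item(String childId, long value) {}

    record ParentMessage(String parentId, List<Item> items) {}

    // Simplified stand-in for Kafka's default partitioner, which really
    // applies murmur2 to the serialized key bytes rather than hashCode().
    static int partitionFor(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    // Reference result: sum per child ID across all messages, ignoring partitions.
    static Map<String, Long> globalTotals(List<ParentMessage> messages) {
        Map<String, Long> totals = new HashMap<>();
        for (ParentMessage m : messages) {
            for (Item i : m.items()) {
                totals.merge(i.childId(), i.value(), Long::sum);
            }
        }
        return totals;
    }

    // One local "store" per partition: each partition sums the child-keyed
    // items of the parent-keyed messages it owns, with no repartition step.
    static List<Map<String, Long>> localTotals(List<ParentMessage> messages, int numPartitions) {
        List<Map<String, Long>> stores = new ArrayList<>();
        for (int p = 0; p < numPartitions; p++) {
            stores.add(new HashMap<>());
        }
        for (ParentMessage m : messages) {
            Map<String, Long> store = stores.get(partitionFor(m.parentId(), numPartitions));
            for (Item i : m.items()) {
                store.merge(i.childId(), i.value(), Long::sum);
            }
        }
        return stores;
    }

    // True only if no child ID was aggregated in more than one partition
    // and the union of the local stores equals the global result.
    static boolean localMatchesGlobal(List<ParentMessage> messages, int numPartitions) {
        Map<String, Long> merged = new HashMap<>();
        for (Map<String, Long> store : localTotals(messages, numPartitions)) {
            for (Map.Entry<String, Long> e : store.entrySet()) {
                if (merged.put(e.getKey(), e.getValue()) != null) {
                    return false; // the same child ID was split across partitions
                }
            }
        }
        return merged.equals(globalTotals(messages));
    }

    public static void main(String[] args) {
        // Every child ID appears under exactly one parent ID, as in the question.
        List<ParentMessage> ok = List.of(
                new ParentMessage("p1", List.of(new Item("c1", 5), new Item("c2", 3))),
                new ParentMessage("p2", List.of(new Item("c3", 7))),
                new ParentMessage("p1", List.of(new Item("c1", 2))));
        System.out.println(new TreeMap<>(globalTotals(ok))); // {c1=7, c2=3, c3=7}
        System.out.println(localMatchesGlobal(ok, 4));       // true

        // Counter-example: c1 also appears under p2. With 4 partitions, p1 and
        // p2 map to different partitions here, so c1's total gets split.
        List<ParentMessage> broken = List.of(
                new ParentMessage("p1", List.of(new Item("c1", 1))),
                new ParentMessage("p2", List.of(new Item("c1", 2))));
        System.out.println(localMatchesGlobal(broken, 4));   // false
    }
}
```

The counter-example in `main()` is the important caveat: the moment one child ID can appear under two different parent IDs, local per-partition aggregation silently produces two partial totals for that child, which is exactly what the automatic repartition after a key change guards against. Skipping it is only safe when the parent-child relationship genuinely holds.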
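Tim's alternative keeps the record key as the parent ID, so a `groupByKey().aggregate()` on the original stream has no key change to react to, and moves the child ID into the aggregate value instead. The sketch below models that fold in plain Java without any Kafka dependency. It assumes a hypothetical `Item` type and a simple per-child sum as the "thing really being aggregated"; `aggregate()` takes its arguments in the same order as Kafka Streams' `Aggregator#apply(key, value, aggregate)`, and `run()` stands in for the per-key state that the DSL would keep, but nothing here is wired into an actual topology.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java model of aggregating by the unchanged parent-ID key, with the
// aggregate itself being a map from child ID to a per-child sum.
// Item and the methods below are hypothetical; NOT the Kafka Streams API.
final class ParentKeyedAggregation {

    record Item(String childId, long value) {}

    // Argument order follows Kafka Streams' Aggregator#apply(key, value, aggregate).
    // Folds one parent-keyed message into that parent's childId -> sum map.
    static Map<String, Long> aggregate(String parentId, List<Item> message, Map<String, Long> agg) {
        Map<String, Long> next = new HashMap<>(agg); // copy rather than mutate the previous aggregate
        for (Item i : message) {
            next.merge(i.childId(), i.value(), Long::sum);
        }
        return next;
    }

    // Stands in for the per-key state that groupByKey().aggregate() keeps:
    // one childId -> sum map per parent ID, starting from an empty map.
    static Map<String, Map<String, Long>> run(List<Map.Entry<String, List<Item>>> stream) {
        Map<String, Map<String, Long>> table = new HashMap<>();
        for (Map.Entry<String, List<Item>> rec : stream) {
            Map<String, Long> current = table.getOrDefault(rec.getKey(), Map.of());
            table.put(rec.getKey(), aggregate(rec.getKey(), rec.getValue(), current));
        }
        return table;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, List<Item>>> stream = List.of(
                Map.entry("p1", List.of(new Item("c1", 5), new Item("c2", 3))),
                Map.entry("p2", List.of(new Item("c3", 7))),
                Map.entry("p1", List.of(new Item("c1", 2))));
        Map<String, Map<String, Long>> table = run(stream);
        System.out.println(new TreeMap<>(table.get("p1"))); // {c1=7, c2=3}
        System.out.println(new TreeMap<>(table.get("p2"))); // {c3=7}
    }
}
```

The trade-off of this shape is that each parent's whole child map is stored as a single value (and in Kafka Streams would need a serde for the map type), so parents with many children produce large aggregates, and any per-child output has to be flattened out of the map afterwards.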