Hi Sophie, Thank you for your reply, very insightful. Looking forward hearing others opinion as well on this.
Kind regards, Levani > On Nov 6, 2019, at 1:30 AM, Sophie Blee-Goldman <sop...@confluent.io> wrote: > >> Personally, I think Matthias’s concern is valid, but on the other hand > Kafka Streams has already >> optimizer in place which alters topology independently from user > > I agree (with you) and think this is a good way to put it -- we currently > auto-repartition for the user so > that they don't have to walk through their entire topology and reason about > when and where to place a > `.through` (or the new `.repartition`), so why suddenly force this onto the > user? How certain are we that > users will always get this right? It's easy to imagine that during > development, you write your new app with > correctly placed repartitions in order to use this new feature. During the > course of development you end up > tweaking the topology, but don't remember to review or move the > repartitioning since you're used to Streams > doing this for you. If you use only single-partition topics for testing, > you might not even notice your app is > spitting out incorrect results! > > Anyways, I feel pretty strongly that it would be weird to introduce a new > feature and say that to use it, you can't take > advantage of this other feature anymore. Also, is it possible our > optimization framework could ever include an > optimized repartitioning strategy that is better than what a user could > achieve by manually inserting repartitions? > Do we expect users to have a deep understanding of the best way to > repartition their particular topology, or is it > likely they will end up over-repartitioning either due to missed > optimizations or unnecessary extra repartitions? > I think many users would prefer to just say "if there *is* a repartition > required at this point in the topology, it should > have N partitions" > > As to the idea of adding `numberOfPartitions` to Grouped rather than > adding a new parameter to groupBy, that does seem more in line with the > current syntax so +1 from me > > On Tue, Nov 5, 2019 at 2:07 PM Levani Kokhreidze <levani.co...@gmail.com> > wrote: > >> Hello all, >> >> While https://github.com/apache/kafka/pull/7170 < >> https://github.com/apache/kafka/pull/7170> is under review and it’s >> almost done, I want to resurrect discussion about this KIP to address >> couple of concerns raised by Matthias and John. >> >> As a reminder, idea of the KIP-221 was to allow DSL users control over >> repartitioning and parallelism of sub-topologies by: >> 1) Introducing new KStream#repartition operation which is done in >> https://github.com/apache/kafka/pull/7170 < >> https://github.com/apache/kafka/pull/7170> >> 2) Add new KStream#groupBy(Repartitioned) operation, which is planned to >> be separate PR. >> >> While all agree about general implementation and idea behind >> https://github.com/apache/kafka/pull/7170 < >> https://github.com/apache/kafka/pull/7170> PR, introducing new >> KStream#groupBy(Repartitioned) method overload raised some questions during >> the review. >> Matthias raised concern that there can be cases when user uses >> `KStream#groupBy(Repartitioned)` operation, but actual repartitioning may >> not required, thus configuration passed via `Repartitioned` would never be >> applied (Matthias, please correct me if I misinterpreted your comment). >> So instead, if user wants to control parallelism of sub-topologies, he or >> she should always use `KStream#repartition` operation before groupBy. Full >> comment can be seen here: >> https://github.com/apache/kafka/pull/7170#issuecomment-519303125 < >> https://github.com/apache/kafka/pull/7170#issuecomment-519303125> >> >> On the same topic, John pointed out that, from API design perspective, we >> shouldn’t intertwine configuration classes of different operators between >> one another. So instead of introducing new `KStream#groupBy(Repartitioned)` >> for specifying number of partitions for internal topic, we should update >> existing `Grouped` class with `numberOfPartitions` field. >> >> Personally, I think Matthias’s concern is valid, but on the other hand >> Kafka Streams has already optimizer in place which alters topology >> independently from user. So maybe it makes sense if Kafka Streams, >> internally would optimize topology in the best way possible, even if in >> some cases this means ignoring some operator configurations passed by the >> user. Also, I agree with John about API design semantics. If we go through >> with the changes for `KStream#groupBy` operation, it makes more sense to >> add `numberOfPartitions` field to `Grouped` class instead of introducing >> new `KStream#groupBy(Repartitioned)` method overload. >> >> I would really appreciate communities feedback on this. >> >> Kind regards, >> Levani >> >> >> >>> On Oct 17, 2019, at 12:57 AM, Sophie Blee-Goldman <sop...@confluent.io> >> wrote: >>> >>> Hey Levani, >>> >>> I think people are busy with the upcoming 2.4 release, and don't have >> much >>> spare time at the >>> moment. It's kind of a difficult time to get attention on things, but >> feel >>> free to pick up something else >>> to work on in the meantime until things have calmed down a bit! >>> >>> Cheers, >>> Sophie >>> >>> >>> On Wed, Oct 16, 2019 at 11:26 AM Levani Kokhreidze < >> levani.co...@gmail.com <mailto:levani.co...@gmail.com>> >>> wrote: >>> >>>> Hello all, >>>> >>>> Sorry for bringing this thread again, but I would like to get some >>>> attention on this PR: https://github.com/apache/kafka/pull/7170 < >> https://github.com/apache/kafka/pull/7170> < >>>> https://github.com/apache/kafka/pull/7170 < >> https://github.com/apache/kafka/pull/7170>> >>>> It's been a while now and I would love to move on to other KIPs as well. >>>> Please let me know if you have any concerns. >>>> >>>> Regards, >>>> Levani >>>> >>>> >>>>> On Jul 26, 2019, at 11:25 AM, Levani Kokhreidze < >> levani.co...@gmail.com <mailto:levani.co...@gmail.com>> >>>> wrote: >>>>> >>>>> Hi all, >>>>> >>>>> Here’s voting thread for this KIP: >>>> https://www.mail-archive.com/dev@kafka.apache.org/msg99680.html < >> https://www.mail-archive.com/dev@kafka.apache.org/msg99680.html> < >>>> https://www.mail-archive.com/dev@kafka.apache.org/msg99680.html < >> https://www.mail-archive.com/dev@kafka.apache.org/msg99680.html>> >>>>> >>>>> Regards, >>>>> Levani >>>>> >>>>>> On Jul 24, 2019, at 11:15 PM, Levani Kokhreidze < >> levani.co...@gmail.com <mailto:levani.co...@gmail.com> >>>> <mailto:levani.co...@gmail.com <mailto:levani.co...@gmail.com>>> wrote: >>>>>> >>>>>> Hi Matthias, >>>>>> >>>>>> Thanks for the suggestion. I Don’t have strong opinion on that one. >>>>>> Agree that avoiding unnecessary method overloads is a good idea. >>>>>> >>>>>> Updated KIP >>>>>> >>>>>> Regards, >>>>>> Levani >>>>>> >>>>>> >>>>>>> On Jul 24, 2019, at 8:50 PM, Matthias J. Sax <matth...@confluent.io >> <mailto:matth...@confluent.io> >>>> <mailto:matth...@confluent.io <mailto:matth...@confluent.io>>> wrote: >>>>>>> >>>>>>> One question: >>>>>>> >>>>>>> Why do we add >>>>>>> >>>>>>>> Repartitioned#with(final String name, final int numberOfPartitions) >>>>>>> >>>>>>> It seems that `#with(String name)`, `#numberOfPartitions(int)` in >>>>>>> combination with `withName()` and `withNumberOfPartitions()` should >> be >>>>>>> sufficient. Users can chain the method calls. >>>>>>> >>>>>>> (I think it's valuable to keep the number of overload small if >>>> possible.) >>>>>>> >>>>>>> Otherwise LGTM. >>>>>>> >>>>>>> >>>>>>> -Matthias >>>>>>> >>>>>>> >>>>>>> On 7/23/19 2:18 PM, Levani Kokhreidze wrote: >>>>>>>> Hello, >>>>>>>> >>>>>>>> Thanks all for your feedback. >>>>>>>> I started voting procedure for this KIP. If there’re any other >>>> concerns about this KIP, please let me know. >>>>>>>> >>>>>>>> Regards, >>>>>>>> Levani >>>>>>>> >>>>>>>>> On Jul 20, 2019, at 8:39 PM, Levani Kokhreidze < >>>> levani.co...@gmail.com <mailto:levani.co...@gmail.com> <mailto: >> levani.co...@gmail.com <mailto:levani.co...@gmail.com>>> wrote: >>>>>>>>> >>>>>>>>> Hi Matthias, >>>>>>>>> >>>>>>>>> Thanks for the suggestion, makes sense. >>>>>>>>> I’ve updated KIP ( >>>> >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint >> < >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint >>> >>>> < >>>> >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint >> < >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint >>>> >>>> < >>>> >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221:+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint >> < >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221:+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint >>> >>>> < >>>> >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221:+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint >> < >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221:+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint >>> >>>>>> ). >>>>>>>>> >>>>>>>>> Regards, >>>>>>>>> Levani >>>>>>>>> >>>>>>>>> >>>>>>>>>> On Jul 20, 2019, at 3:53 AM, Matthias J. Sax < >> matth...@confluent.io <mailto:matth...@confluent.io> >>>> <mailto:matth...@confluent.io <mailto:matth...@confluent.io>> <mailto: >> matth...@confluent.io <mailto:matth...@confluent.io> <mailto: >>>> matth...@confluent.io <mailto:matth...@confluent.io>>>> wrote: >>>>>>>>>> >>>>>>>>>> Thanks for driving the KIP. >>>>>>>>>> >>>>>>>>>> I agree that users need to be able to specify a partitioning >>>> strategy. >>>>>>>>>> >>>>>>>>>> Sophie raises a fair point about topic configs and producer >>>> configs. My >>>>>>>>>> take is, that consider `Repartitioned` as an "extension" to >>>> `Produced`, >>>>>>>>>> that adds topic configuration, is a good way to think about it and >>>> helps >>>>>>>>>> to keep the API "clean". >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> With regard to method names. I would prefer to avoid >> abbreviations. >>>> Can >>>>>>>>>> we rename: >>>>>>>>>> >>>>>>>>>> `withNumOfPartitions` -> `withNumberOfPartitions` >>>>>>>>>> >>>>>>>>>> Furthermore, it might be good to add some more `static` methods: >>>>>>>>>> >>>>>>>>>> - Repartitioned.with(Serde<K>, Serde<V>) >>>>>>>>>> - Repartitioned.withNumberOfPartitions(int) >>>>>>>>>> - Repartitioned.streamPartitioner(StreamPartitioner) >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> -Matthias >>>>>>>>>> >>>>>>>>>> On 7/19/19 3:33 PM, Levani Kokhreidze wrote: >>>>>>>>>>> Totally agree. I think in KStream interface it makes sense to >> have >>>> some duplicate configurations between operators in order to keep API >> simple >>>> and usable. >>>>>>>>>>> Also, as more surface API has, harder it is to have proper >>>> backward compatibility. >>>>>>>>>>> While initial idea of keeping topic level configs separate was >>>> exciting, having Repartitioned class encapsulate some producer level >>>> configs makes API more readable. >>>>>>>>>>> >>>>>>>>>>> Regards, >>>>>>>>>>> Levani >>>>>>>>>>> >>>>>>>>>>>> On Jul 20, 2019, at 1:15 AM, Sophie Blee-Goldman < >>>> sop...@confluent.io <mailto:sop...@confluent.io> <mailto: >> sop...@confluent.io <mailto:sop...@confluent.io>> <mailto: >>>> sop...@confluent.io <mailto:sop...@confluent.io> <mailto: >> sop...@confluent.io <mailto:sop...@confluent.io>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>> I think that is a good point about trying to keep producer level >>>>>>>>>>>> configurations and (repartition) topic level considerations >>>> separate. >>>>>>>>>>>> Number of partitions is definitely purely a topic level >>>> configuration. But >>>>>>>>>>>> on some level, serdes and partitioners are just as much a topic >>>>>>>>>>>> configuration as a producer one. You could have two producers >>>> configured >>>>>>>>>>>> with different serdes and/or partitioners, but if they are >>>> writing to the >>>>>>>>>>>> same topic the result would be very difficult to part. So in a >>>> sense, these >>>>>>>>>>>> are configurations of topics in Streams, not just producers. >>>>>>>>>>>> >>>>>>>>>>>> Another way to think of it: while the Streams API is not always >>>> true to >>>>>>>>>>>> this, ideally all the relevant configs for an operator are >>>> wrapped into a >>>>>>>>>>>> single object (in this case, Repartitioned). We could instead >>>> split out the >>>>>>>>>>>> fields in common with Produced into a separate parameter to keep >>>> topic and >>>>>>>>>>>> producer level configurations separate, but this increases the >>>> API surface >>>>>>>>>>>> area by a lot. It's much more straightforward to just say "this >> is >>>>>>>>>>>> everything that this particular operator needs" without worrying >>>> about what >>>>>>>>>>>> exactly you're specifying. >>>>>>>>>>>> >>>>>>>>>>>> I suppose you could alternatively make Produced a field of >>>> Repartitioned, >>>>>>>>>>>> but I don't think we do this kind of composition elsewhere in >>>> Streams at >>>>>>>>>>>> the moment >>>>>>>>>>>> >>>>>>>>>>>> On Fri, Jul 19, 2019 at 1:45 PM Levani Kokhreidze < >>>> levani.co...@gmail.com <mailto:levani.co...@gmail.com> <mailto: >> levani.co...@gmail.com <mailto:levani.co...@gmail.com>> <mailto: >>>> levani.co...@gmail.com <mailto:levani.co...@gmail.com> <mailto: >> levani.co...@gmail.com <mailto:levani.co...@gmail.com>>>> >>>>>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Hi Bill, >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks a lot for the feedback. >>>>>>>>>>>>> Yes, that makes sense. I’ve updated KIP with >>>> `Repartitioned#partitioner` >>>>>>>>>>>>> configuration. >>>>>>>>>>>>> In the beginning, I wanted to introduce a class for topic level >>>>>>>>>>>>> configuration and keep topic level and producer level >>>> configurations (such >>>>>>>>>>>>> as Produced) separately (see my second email in this thread). >>>>>>>>>>>>> But while looking at the semantics of KStream interface, I >>>> couldn’t really >>>>>>>>>>>>> figure out good operation name for Topic level configuration >>>> class and just >>>>>>>>>>>>> introducing `Topic` config class was kinda breaking the >>>> semantics. >>>>>>>>>>>>> So I think having Repartitioned class which encapsulates topic >>>> and >>>>>>>>>>>>> producer level configurations for internal topics is viable >>>> thing to do. >>>>>>>>>>>>> >>>>>>>>>>>>> Regards, >>>>>>>>>>>>> Levani >>>>>>>>>>>>> >>>>>>>>>>>>>> On Jul 19, 2019, at 7:47 PM, Bill Bejeck <bbej...@gmail.com >> <mailto:bbej...@gmail.com> >>>> <mailto:bbej...@gmail.com <mailto:bbej...@gmail.com>> <mailto: >> bbej...@gmail.com <mailto:bbej...@gmail.com> <mailto: >>>> bbej...@gmail.com <mailto:bbej...@gmail.com>>>> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>> Hi Lavani, >>>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks for resurrecting this KIP. >>>>>>>>>>>>>> >>>>>>>>>>>>>> I'm also a +1 for adding a partition option. In addition to >>>> the reason >>>>>>>>>>>>>> provided by John, my reasoning is: >>>>>>>>>>>>>> >>>>>>>>>>>>>> 1. Users may want to use something other than hash-based >>>> partitioning >>>>>>>>>>>>>> 2. Users may wish to partition on something different than the >>>> key >>>>>>>>>>>>>> without having to change the key. For example: >>>>>>>>>>>>>> 1. A combination of fields in the value in conjunction with >>>> the key >>>>>>>>>>>>>> 2. Something other than the key >>>>>>>>>>>>>> 3. We allow users to specify a partitioner on Produced hence >> in >>>>>>>>>>>>>> KStream.to and KStream.through, so it makes sense for API >>>> consistency. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Just my 2 cents. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>> Bill >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Fri, Jul 19, 2019 at 5:46 AM Levani Kokhreidze < >>>>>>>>>>>>> levani.co...@gmail.com <mailto:levani.co...@gmail.com> >> <mailto:levani.co...@gmail.com <mailto:levani.co...@gmail.com>> <mailto: >>>> levani.co...@gmail.com <mailto:levani.co...@gmail.com> <mailto: >> levani.co...@gmail.com <mailto:levani.co...@gmail.com>>>> >>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Hi John, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> In my mind it makes sense. >>>>>>>>>>>>>>> If we add partitioner configuration to Repartitioned class, >>>> with the >>>>>>>>>>>>>>> combination of specifying number of partitions for internal >>>> topics, user >>>>>>>>>>>>>>> will have opportunity to ensure co-partitioning before join >>>> operation. >>>>>>>>>>>>>>> I think this can be quite powerful feature. >>>>>>>>>>>>>>> Wondering what others think about this? >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Regards, >>>>>>>>>>>>>>> Levani >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Jul 18, 2019, at 1:20 AM, John Roesler < >> j...@confluent.io <mailto:j...@confluent.io> >>>> <mailto:j...@confluent.io <mailto:j...@confluent.io>> <mailto: >> j...@confluent.io <mailto:j...@confluent.io> <mailto: >>>> j...@confluent.io <mailto:j...@confluent.io>>>> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Yes, I believe that's what I had in mind. Again, not totally >>>> sure it >>>>>>>>>>>>>>>> makes sense, but I believe something similar is the >> rationale >>>> for >>>>>>>>>>>>>>>> having the partitioner option in Produced. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>> -John >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Wed, Jul 17, 2019 at 3:20 PM Levani Kokhreidze >>>>>>>>>>>>>>>> <levani.co...@gmail.com <mailto:levani.co...@gmail.com> >> <mailto:levani.co...@gmail.com <mailto:levani.co...@gmail.com>> >>>> <mailto:levani.co...@gmail.com <mailto:levani.co...@gmail.com> <mailto: >> levani.co...@gmail.com <mailto:levani.co...@gmail.com>>>> wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Hey John, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Oh that’s interesting use-case. >>>>>>>>>>>>>>>>> Do I understand this correctly, in your example I would >>>> first issue >>>>>>>>>>>>>>> repartition(Repartitioned) with proper partitioner that >>>> essentially >>>>>>>>>>>>> would >>>>>>>>>>>>>>> be the same as the topic I want to join with and then do the >>>>>>>>>>>>> KStream#join >>>>>>>>>>>>>>> with DSL? >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Regards, >>>>>>>>>>>>>>>>> Levani >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> On Jul 17, 2019, at 11:11 PM, John Roesler < >>>> j...@confluent.io <mailto:j...@confluent.io> <mailto:j...@confluent.io >> <mailto:j...@confluent.io>> <mailto:j...@confluent.io <mailto: >> j...@confluent.io> >>>> <mailto:j...@confluent.io <mailto:j...@confluent.io>>>> >>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Hey, all, just to chime in, >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> I think it might be useful to have an option to specify >> the >>>>>>>>>>>>>>>>>> partitioner. The case I have in mind is that some data may >>>> get >>>>>>>>>>>>>>>>>> repartitioned and then joined with an input topic. If the >>>> right-side >>>>>>>>>>>>>>>>>> input topic uses a custom partitioning strategy, then the >>>>>>>>>>>>>>>>>> repartitioned stream also needs to be partitioned with the >>>> same >>>>>>>>>>>>>>>>>> strategy. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Does that make sense, or did I maybe miss something >>>> important? >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>> -John >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> On Wed, Jul 17, 2019 at 2:48 PM Levani Kokhreidze >>>>>>>>>>>>>>>>>> <levani.co...@gmail.com <mailto:levani.co...@gmail.com> >> <mailto:levani.co...@gmail.com <mailto:levani.co...@gmail.com>> >>>> <mailto:levani.co...@gmail.com <mailto:levani.co...@gmail.com> <mailto: >> levani.co...@gmail.com <mailto:levani.co...@gmail.com>>>> wrote: >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Yes, I was thinking about it as well. To be honest I’m >> not >>>> sure >>>>>>>>>>>>> about >>>>>>>>>>>>>>> it yet. >>>>>>>>>>>>>>>>>>> As Kafka Streams DSL user, I don’t really think I would >>>> need control >>>>>>>>>>>>>>> over partitioner for internal topics. >>>>>>>>>>>>>>>>>>> As a user, I would assume that Kafka Streams knows best >>>> how to >>>>>>>>>>>>>>> partition data for internal topics. >>>>>>>>>>>>>>>>>>> In this KIP I wrote that Produced should be used only for >>>> topics >>>>>>>>>>>>> that >>>>>>>>>>>>>>> are created by user In advance. >>>>>>>>>>>>>>>>>>> In those cases maybe it make sense to have possibility to >>>> specify >>>>>>>>>>>>> the >>>>>>>>>>>>>>> partitioner. >>>>>>>>>>>>>>>>>>> I don’t have clear answer on that yet, but I guess >>>> specifying the >>>>>>>>>>>>>>> partitioner can be added as well if there’s agreement on >> this. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Regards, >>>>>>>>>>>>>>>>>>> Levani >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> On Jul 17, 2019, at 10:42 PM, Sophie Blee-Goldman < >>>>>>>>>>>>>>> sop...@confluent.io <mailto:sop...@confluent.io> <mailto: >> sop...@confluent.io <mailto:sop...@confluent.io>> <mailto: >>>> sop...@confluent.io <mailto:sop...@confluent.io>>> wrote: >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Thanks for clearing that up. I agree that Repartitioned >>>> would be a >>>>>>>>>>>>>>> useful >>>>>>>>>>>>>>>>>>>> addition. I'm wondering if it might also need to have >>>>>>>>>>>>>>>>>>>> a withStreamPartitioner method/field, similar to >>>> Produced? I'm not >>>>>>>>>>>>>>> sure how >>>>>>>>>>>>>>>>>>>> widely this feature is really used, but seems it should >> be >>>>>>>>>>>>> available >>>>>>>>>>>>>>> for >>>>>>>>>>>>>>>>>>>> repartition topics. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> On Wed, Jul 17, 2019 at 11:26 AM Levani Kokhreidze < >>>>>>>>>>>>>>> levani.co...@gmail.com <mailto:levani.co...@gmail.com> >>>> <mailto:levani.co...@gmail.com <mailto:levani.co...@gmail.com> <mailto: >> levani.co...@gmail.com <mailto:levani.co...@gmail.com>>>> >>>>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Hey Sophie, >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> In both cases KStream#repartition and >>>>>>>>>>>>>>> KStream#repartition(Repartitioned) >>>>>>>>>>>>>>>>>>>>> topic will be created and managed by Kafka Streams. >>>>>>>>>>>>>>>>>>>>> Idea of Repartitioned is to give user more control over >>>> the topic >>>>>>>>>>>>>>> such as >>>>>>>>>>>>>>>>>>>>> num of partitions. >>>>>>>>>>>>>>>>>>>>> I feel like Repartitioned parameter is something that >> is >>>> missing >>>>>>>>>>>>> in >>>>>>>>>>>>>>>>>>>>> current DSL design. >>>>>>>>>>>>>>>>>>>>> Essentially giving user control over parallelism by >>>> configuring >>>>>>>>>>>>> num >>>>>>>>>>>>>>> of >>>>>>>>>>>>>>>>>>>>> partitions for internal topics. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Hope this answers your question. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Regards, >>>>>>>>>>>>>>>>>>>>> Levani >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> On Jul 17, 2019, at 9:02 PM, Sophie Blee-Goldman < >>>>>>>>>>>>>>> sop...@confluent.io <mailto:sop...@confluent.io> <mailto: >> sop...@confluent.io <mailto:sop...@confluent.io>> <mailto: >>>> sop...@confluent.io <mailto:sop...@confluent.io>>> >>>>>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Hey Levani, >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Thanks for the KIP! Can you clarify one thing for me >> -- >>>> for the >>>>>>>>>>>>>>>>>>>>>> KStream#repartition signature taking a Repartitioned, >>>> will the >>>>>>>>>>>>>>> topic be >>>>>>>>>>>>>>>>>>>>>> auto-created by Streams (which seems to be the case >> for >>>> the >>>>>>>>>>>>>>> signature >>>>>>>>>>>>>>>>>>>>>> without a Repartitioned) or does it have to be >>>> pre-created? The >>>>>>>>>>>>>>> wording >>>>>>>>>>>>>>>>>>>>> in >>>>>>>>>>>>>>>>>>>>>> the KIP makes it seem like one version of the method >>>> will >>>>>>>>>>>>>>> auto-create >>>>>>>>>>>>>>>>>>>>>> topics while the other will not. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Cheers, >>>>>>>>>>>>>>>>>>>>>> Sophie >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> On Wed, Jul 17, 2019 at 10:15 AM Levani Kokhreidze < >>>>>>>>>>>>>>>>>>>>> levani.co...@gmail.com <mailto:levani.co...@gmail.com> >>>> <mailto:levani.co...@gmail.com <mailto:levani.co...@gmail.com> <mailto: >> levani.co...@gmail.com <mailto:levani.co...@gmail.com>>>> >>>>>>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> Hello, >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> One more bump about KIP-221 ( >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>> >>>> >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint >> < >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint >>> >>>> < >>>> >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint >> < >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint >>>> >>>> < >>>> >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint >> < >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint >>> >>>> < >>>> >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint >> < >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint >>> >>>>>> >>>>>>>>>>>>>>>>>>>>>>> < >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>> >>>> >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221:+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint >> < >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221:+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint >>> >>>> < >>>> >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221:+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint >>>>> >>>>>>>>>>>>>>>>>>>>>> ) >>>>>>>>>>>>>>>>>>>>>>> so it doesn’t get lost in mailing list :) >>>>>>>>>>>>>>>>>>>>>>> Would love to hear communities opinions/concerns >> about >>>> this KIP. >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> Regards, >>>>>>>>>>>>>>>>>>>>>>> Levani >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> On Jul 12, 2019, at 5:27 PM, Levani Kokhreidze < >>>>>>>>>>>>>>> levani.co...@gmail.com >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> Hello, >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> Kind reminder about this KIP: >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>> >>>> >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint >>>>>>>>>>>>>>>>>>>>>>> < >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>> >>>> >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221:+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> Regards, >>>>>>>>>>>>>>>>>>>>>>>> Levani >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> On Jul 9, 2019, at 11:38 AM, Levani Kokhreidze < >>>>>>>>>>>>>>>>>>>>> levani.co...@gmail.com >>>>>>>>>>>>>>>>>>>>>>> <mailto:levani.co...@gmail.com>> wrote: >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> Hello, >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> In order to move this KIP forward, I’ve updated >>>> confluence >>>>>>>>>>>>> page >>>>>>>>>>>>>>> with >>>>>>>>>>>>>>>>>>>>>>> the new proposal >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>> >>>> >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint >>>>>>>>>>>>>>>>>>>>>>> < >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>> >>>> >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221:+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> I’ve also filled “Rejected Alternatives” section. >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> Looking forward to discuss this KIP :) >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> King regards, >>>>>>>>>>>>>>>>>>>>>>>>> Levani >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> On Jul 3, 2019, at 1:08 PM, Levani Kokhreidze < >>>>>>>>>>>>>>>>>>>>> levani.co...@gmail.com >>>>>>>>>>>>>>>>>>>>>>> <mailto:levani.co...@gmail.com>> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> Hello Matthias, >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> Thanks for the feedback and ideas. >>>>>>>>>>>>>>>>>>>>>>>>>> I like the idea of introducing dedicated `Topic` >>>> class for >>>>>>>>>>>>>>> topic >>>>>>>>>>>>>>>>>>>>>>> configuration for internal operators like >> `groupedBy`. >>>>>>>>>>>>>>>>>>>>>>>>>> Would be great to hear others opinion about this >> as >>>> well. >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> Kind regards, >>>>>>>>>>>>>>>>>>>>>>>>>> Levani >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> On Jul 3, 2019, at 7:00 AM, Matthias J. Sax < >>>>>>>>>>>>>>> matth...@confluent.io >>>>>>>>>>>>>>>>>>>>>>> <mailto:matth...@confluent.io>> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> Levani, >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks for picking up this KIP! And thanks for >>>> summarizing >>>>>>>>>>>>>>>>>>>>> everything. >>>>>>>>>>>>>>>>>>>>>>>>>>> Even if some points may have been discussed >>>> already (can't >>>>>>>>>>>>>>> really >>>>>>>>>>>>>>>>>>>>>>>>>>> remember), it's helpful to get a good summary to >>>> refresh the >>>>>>>>>>>>>>>>>>>>>>> discussion. >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> I think your reasoning makes sense. With regard >> to >>>> the >>>>>>>>>>>>>>> distinction >>>>>>>>>>>>>>>>>>>>>>>>>>> between operators that manage topics and >> operators >>>> that use >>>>>>>>>>>>>>>>>>>>>>> user-created >>>>>>>>>>>>>>>>>>>>>>>>>>> topics: Following this argument, it might >> indicate >>>> that >>>>>>>>>>>>>>> leaving >>>>>>>>>>>>>>>>>>>>>>>>>>> `through()` as-is (as an operator that uses >>>> use-defined >>>>>>>>>>>>>>> topics) and >>>>>>>>>>>>>>>>>>>>>>>>>>> introducing a new `repartition()` operator (an >>>> operator that >>>>>>>>>>>>>>> manages >>>>>>>>>>>>>>>>>>>>>>>>>>> topics itself) might be good. Otherwise, there is >>>> one >>>>>>>>>>>>> operator >>>>>>>>>>>>>>>>>>>>>>>>>>> `through()` that sometimes manages topics but >>>> sometimes >>>>>>>>>>>>> not; a >>>>>>>>>>>>>>>>>>>>>>> different >>>>>>>>>>>>>>>>>>>>>>>>>>> name, ie, new operator would make the distinction >>>> clearer. >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> About adding `numOfPartitions` to `Grouped`. I am >>>> wondering >>>>>>>>>>>>>>> if the >>>>>>>>>>>>>>>>>>>>>>> same >>>>>>>>>>>>>>>>>>>>>>>>>>> argument as for `Produced` does apply and adding >>>> it is >>>>>>>>>>>>>>> semantically >>>>>>>>>>>>>>>>>>>>>>>>>>> questionable? Might be good to get opinions of >>>> others on >>>>>>>>>>>>>>> this, too. >>>>>>>>>>>>>>>>>>>>> I >>>>>>>>>>>>>>>>>>>>>>> am >>>>>>>>>>>>>>>>>>>>>>>>>>> not sure myself what solution I prefer atm. >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> So far, KS uses configuration objects that allow >> to >>>>>>>>>>>>> configure >>>>>>>>>>>>>>> a >>>>>>>>>>>>>>>>>>>>>>> certain >>>>>>>>>>>>>>>>>>>>>>>>>>> "entity" like a consumer, producer, store. If we >>>> assume that >>>>>>>>>>>>>>> a topic >>>>>>>>>>>>>>>>>>>>>>> is >>>>>>>>>>>>>>>>>>>>>>>>>>> a similar entity, I am wonder if we should have a >>>>>>>>>>>>>>>>>>>>>>>>>>> `Topic#withNumberOfPartitions()` class and method >>>> instead of >>>>>>>>>>>>>>> a plain >>>>>>>>>>>>>>>>>>>>>>>>>>> integer? This would allow us to add other >> configs, >>>> like >>>>>>>>>>>>>>> replication >>>>>>>>>>>>>>>>>>>>>>>>>>> factor, retention-time etc, easily, without the >>>> need to >>>>>>>>>>>>>>> change the >>>>>>>>>>>>>>>>>>>>>>> "main >>>>>>>>>>>>>>>>>>>>>>>>>>> API". >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> Just want to give some ideas. Not sure if I like >>>> them >>>>>>>>>>>>> myself. >>>>>>>>>>>>>>> :) >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> -Matthias >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> On 7/1/19 1:04 AM, Levani Kokhreidze wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>> Actually, giving it more though - maybe >> enhancing >>>> Produced >>>>>>>>>>>>>>> with num >>>>>>>>>>>>>>>>>>>>>>> of partitions configuration is not the best approach. >>>> Let me >>>>>>>>>>>>>>> explain >>>>>>>>>>>>>>>>>>>>> why: >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> 1) If we enhance Produced class with this >>>> configuration, >>>>>>>>>>>>>>> this will >>>>>>>>>>>>>>>>>>>>>>> also affect KStream#to operation. Since KStream#to is >>>> the final >>>>>>>>>>>>>>> sink of >>>>>>>>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>>>>>>>>> topology, for me, it seems to be reasonable >> assumption >>>> that user >>>>>>>>>>>>>>> needs >>>>>>>>>>>>>>>>>>>>> to >>>>>>>>>>>>>>>>>>>>>>> manually create sink topic in advance. And in that >>>> case, having >>>>>>>>>>>>>>> num of >>>>>>>>>>>>>>>>>>>>>>> partitions configuration doesn’t make much sense. >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> 2) Looking at Produced class, based on API >>>> contract, seems >>>>>>>>>>>>>>> like >>>>>>>>>>>>>>>>>>>>>>> Produced is designed to be something that is >>>> explicitly for >>>>>>>>>>>>>>> producer >>>>>>>>>>>>>>>>>>>>> (key >>>>>>>>>>>>>>>>>>>>>>> serializer, value serializer, partitioner those all >>>> are producer >>>>>>>>>>>>>>>>>>>>> specific >>>>>>>>>>>>>>>>>>>>>>> configurations) and num of partitions is topic level >>>>>>>>>>>>>>> configuration. And >>>>>>>>>>>>>>>>>>>>> I >>>>>>>>>>>>>>>>>>>>>>> don’t think mixing topic and producer level >>>> configurations >>>>>>>>>>>>>>> together in >>>>>>>>>>>>>>>>>>>>> one >>>>>>>>>>>>>>>>>>>>>>> class is the good approach. >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> 3) Looking at KStream interface, seems like >>>> Produced >>>>>>>>>>>>>>> parameter is >>>>>>>>>>>>>>>>>>>>>>> for operations that work with non-internal (e.g >> topics >>>> created >>>>>>>>>>>>> and >>>>>>>>>>>>>>>>>>>>> managed >>>>>>>>>>>>>>>>>>>>>>> internally by Kafka Streams) topics and I think we >>>> should leave >>>>>>>>>>>>>>> it as >>>>>>>>>>>>>>>>>>>>> it is >>>>>>>>>>>>>>>>>>>>>>> in that case. >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> Taking all this things into account, I think we >>>> should >>>>>>>>>>>>>>> distinguish >>>>>>>>>>>>>>>>>>>>>>> between DSL operations, where Kafka Streams should >>>> create and >>>>>>>>>>>>>>> manage >>>>>>>>>>>>>>>>>>>>>>> internal topics (KStream#groupBy) vs topics that >>>> should be >>>>>>>>>>>>>>> created in >>>>>>>>>>>>>>>>>>>>>>> advance (e.g KStream#to). >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> To sum it up, I think adding numPartitions >>>> configuration in >>>>>>>>>>>>>>>>>>>>> Produced >>>>>>>>>>>>>>>>>>>>>>> will result in mixing topic and producer level >>>> configuration in >>>>>>>>>>>>>>> one >>>>>>>>>>>>>>>>>>>>> class >>>>>>>>>>>>>>>>>>>>>>> and it’s gonna break existing API semantics. >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> Regarding making topic name optional in >>>> KStream#through - I >>>>>>>>>>>>>>> think >>>>>>>>>>>>>>>>>>>>>>> underline idea is very useful and giving users >>>> possibility to >>>>>>>>>>>>>>> specify >>>>>>>>>>>>>>>>>>>>> num >>>>>>>>>>>>>>>>>>>>>>> of partitions there is even more useful :) >> Considering >>>> arguments >>>>>>>>>>>>>>> against >>>>>>>>>>>>>>>>>>>>>>> adding num of partitions in Produced class, I see two >>>> options >>>>>>>>>>>>>>> here: >>>>>>>>>>>>>>>>>>>>>>>>>>>> 1) Add following method overloads >>>>>>>>>>>>>>>>>>>>>>>>>>>> * through() - topic will be auto-generated and >>>> num of >>>>>>>>>>>>>>> partitions >>>>>>>>>>>>>>>>>>>>>>> will be taken from source topic >>>>>>>>>>>>>>>>>>>>>>>>>>>> * through(final int numOfPartitions) - topic >> will >>>> be auto >>>>>>>>>>>>>>>>>>>>>>> generated with specified num of partitions >>>>>>>>>>>>>>>>>>>>>>>>>>>> * through(final int numOfPartitions, final >>>> Produced<K, V> >>>>>>>>>>>>>>>>>>>>>>> produced) - topic will be with generated with >>>> specified num of >>>>>>>>>>>>>>>>>>>>> partitions >>>>>>>>>>>>>>>>>>>>>>> and configuration taken from produced parameter. >>>>>>>>>>>>>>>>>>>>>>>>>>>> 2) Leave KStream#through as it is and introduce >>>> new method >>>>>>>>>>>>> - >>>>>>>>>>>>>>>>>>>>>>> KStream#repartition (I think Matthias suggested this >>>> in one of >>>>>>>>>>>>> the >>>>>>>>>>>>>>>>>>>>> threads) >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> Considering all mentioned above I propose the >>>> following >>>>>>>>>>>>> plan: >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> Option A: >>>>>>>>>>>>>>>>>>>>>>>>>>>> 1) Leave Produced as it is >>>>>>>>>>>>>>>>>>>>>>>>>>>> 2) Add num of partitions configuration to >> Grouped >>>> class (as >>>>>>>>>>>>>>>>>>>>>>> mentioned in the KIP) >>>>>>>>>>>>>>>>>>>>>>>>>>>> 3) Add following method overloads to >>>> KStream#through >>>>>>>>>>>>>>>>>>>>>>>>>>>> * through() - topic will be auto-generated and >>>> num of >>>>>>>>>>>>>>> partitions >>>>>>>>>>>>>>>>>>>>>>> will be taken from source topic >>>>>>>>>>>>>>>>>>>>>>>>>>>> * through(final int numOfPartitions) - topic >> will >>>> be auto >>>>>>>>>>>>>>>>>>>>>>> generated with specified num of partitions >>>>>>>>>>>>>>>>>>>>>>>>>>>> * through(final int numOfPartitions, final >>>> Produced<K, V> >>>>>>>>>>>>>>>>>>>>>>> produced) - topic will be with generated with >>>> specified num of >>>>>>>>>>>>>>>>>>>>> partitions >>>>>>>>>>>>>>>>>>>>>>> and configuration taken from produced parameter. >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> Option B: >>>>>>>>>>>>>>>>>>>>>>>>>>>> 1) Leave Produced as it is >>>>>>>>>>>>>>>>>>>>>>>>>>>> 2) Add num of partitions configuration to >> Grouped >>>> class (as >>>>>>>>>>>>>>>>>>>>>>> mentioned in the KIP) >>>>>>>>>>>>>>>>>>>>>>>>>>>> 3) Add new operator KStream#repartition for >>>> creating and >>>>>>>>>>>>>>> managing >>>>>>>>>>>>>>>>>>>>>>> internal repartition topics >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> P.S. I’m sorry if all of this was already >>>> discussed in the >>>>>>>>>>>>>>> mailing >>>>>>>>>>>>>>>>>>>>>>> list, but I kinda got with all the threads that were >>>> about this >>>>>>>>>>>>>>> KIP :( >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> Kind regards, >>>>>>>>>>>>>>>>>>>>>>>>>>>> Levani >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Jul 1, 2019, at 9:56 AM, Levani Kokhreidze < >>>>>>>>>>>>>>>>>>>>>>> levani.co...@gmail.com <mailto: >> levani.co...@gmail.com>> >>>> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hello, >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> I would like to resurrect discussion around >>>> KIP-221. Going >>>>>>>>>>>>>>> through >>>>>>>>>>>>>>>>>>>>>>> the discussion thread, there’s seems to agreement >>>> around >>>>>>>>>>>>>>> usefulness of >>>>>>>>>>>>>>>>>>>>> this >>>>>>>>>>>>>>>>>>>>>>> feature. >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Regarding the implementation, as far as I >>>> understood, the >>>>>>>>>>>>>>> most >>>>>>>>>>>>>>>>>>>>>>> optimal solution for me seems the following: >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> 1) Add two method overloads to KStream#through >>>> method >>>>>>>>>>>>>>> (essentially >>>>>>>>>>>>>>>>>>>>>>> making topic name optional) >>>>>>>>>>>>>>>>>>>>>>>>>>>>> 2) Enhance Produced class with numOfPartitions >>>>>>>>>>>>> configuration >>>>>>>>>>>>>>>>>>>>> field. >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Those two changes will allow DSL users to >> control >>>>>>>>>>>>>>> parallelism and >>>>>>>>>>>>>>>>>>>>>>> trigger re-partition without doing stateful >> operations. >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> I will update KIP with interface changes around >>>>>>>>>>>>>>> KStream#through if >>>>>>>>>>>>>>>>>>>>>>> this changes sound sensible. >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Kind regards, >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Levani >> >>