Sorry for late reply. I guess, the question boils down to the intended semantics of `repartition()`. My understanding is as follows:
- KS does auto-repartitioning for correctness reasons (using the upstream topic to determine the number of partitions) - KS does auto-repartitioning only for downstream DSL operators like `count()` (eg, a `transform()` does never trigger an auto-repartitioning even if the stream is marked as `repartitioningRequired`). - KS offers `through()` to enforce a repartitioning -- however, the user needs to create the topic manually (with the desired number of partitions). I see two main applications for `repartitioning()`: 1) repartition data before a `transform()` but user does not want to manage the topic 2) scale out a downstream subtopology Hence, I see `repartition()` similar to `through()`: if a users calls it, a repartitining is enforced, with the difference that KS manages the topic and the user does not need to create it. This behavior makes (1) and (2) possible. > I think many users would prefer to just say "if there *is* a repartition > required at this point in the topology, it should > have N partitions" Because of (2), I disagree. Either a user does not care about scaling out, for which case she would not specify the number of partitions. Or a user does care, and hence wants to enforce the scale out. I don't think that any user would say, "maybe scale out". Therefore, the optimizer should never ignore the repartition operation. As a "consequence" (because repartitioning is expensive) a user should make an explicit call to `repartition()` IMHO -- piggybacking an enforced repartitioning into `groupByKey()` seems to be "dangerous" because it might be too subtle and an "optional scaling out" as laid out above does not make sense IMHO. I am also not worried about "over repartitioning" because the result stream would never trigger auto-repartitioning. Only if multiple consecutive calls to `repartition()` are made it could be bad -- but that's the same with `through()`. In the end, there is always some responsibility on the user. Btw, for `.groupBy()` we know that repartitioning will be required, however, for `groupByKey()` it depends if the KStream is marked as `repartitioningRequired`. Hence, for `groupByKey()` it should not be possible for a user to set number of partitions IMHO. For `groupBy()` it's a different story, because calling `repartition().groupBy()` does not achieve what we want. Hence, allowing users to pass in the number of users partitions into `groupBy()` does actually makes sense, because repartitioning will happen anyway and thus we can piggyback a scaling decision. I think that John has a fair concern about the overloads, however, I am not convinced that using `Grouped` to specify the number of partitions is intuitive. I double checked `Grouped` and `Repartitioned` and both allow to specify a `name` and `keySerde/valueSerde`. Thus, I am wondering if we could bridge the gap between both, if we would make `Repartitioned extends Grouped`? For this case, we only need `groupBy(Grouped)` and a user can pass in both types what seems to make the API quite smooth: `stream.groupBy(..., Grouped...)` `stream.groupBy(..., Repartitioned...)` Thoughts? -Matthias On 11/7/19 10:59 AM, Levani Kokhreidze wrote: > Hi Sophie, > > Thank you for your reply, very insightful. Looking forward hearing others > opinion as well on this. > > Kind regards, > Levani > > >> On Nov 6, 2019, at 1:30 AM, Sophie Blee-Goldman <sop...@confluent.io> wrote: >> >>> Personally, I think Matthias’s concern is valid, but on the other hand >> Kafka Streams has already >>> optimizer in place which alters topology independently from user >> >> I agree (with you) and think this is a good way to put it -- we currently >> auto-repartition for the user so >> that they don't have to walk through their entire topology and reason about >> when and where to place a >> `.through` (or the new `.repartition`), so why suddenly force this onto the >> user? How certain are we that >> users will always get this right? It's easy to imagine that during >> development, you write your new app with >> correctly placed repartitions in order to use this new feature. During the >> course of development you end up >> tweaking the topology, but don't remember to review or move the >> repartitioning since you're used to Streams >> doing this for you. If you use only single-partition topics for testing, >> you might not even notice your app is >> spitting out incorrect results! >> >> Anyways, I feel pretty strongly that it would be weird to introduce a new >> feature and say that to use it, you can't take >> advantage of this other feature anymore. Also, is it possible our >> optimization framework could ever include an >> optimized repartitioning strategy that is better than what a user could >> achieve by manually inserting repartitions? >> Do we expect users to have a deep understanding of the best way to >> repartition their particular topology, or is it >> likely they will end up over-repartitioning either due to missed >> optimizations or unnecessary extra repartitions? >> I think many users would prefer to just say "if there *is* a repartition >> required at this point in the topology, it should >> have N partitions" >> >> As to the idea of adding `numberOfPartitions` to Grouped rather than >> adding a new parameter to groupBy, that does seem more in line with the >> current syntax so +1 from me >> >> On Tue, Nov 5, 2019 at 2:07 PM Levani Kokhreidze <levani.co...@gmail.com> >> wrote: >> >>> Hello all, >>> >>> While https://github.com/apache/kafka/pull/7170 < >>> https://github.com/apache/kafka/pull/7170> is under review and it’s >>> almost done, I want to resurrect discussion about this KIP to address >>> couple of concerns raised by Matthias and John. >>> >>> As a reminder, idea of the KIP-221 was to allow DSL users control over >>> repartitioning and parallelism of sub-topologies by: >>> 1) Introducing new KStream#repartition operation which is done in >>> https://github.com/apache/kafka/pull/7170 < >>> https://github.com/apache/kafka/pull/7170> >>> 2) Add new KStream#groupBy(Repartitioned) operation, which is planned to >>> be separate PR. >>> >>> While all agree about general implementation and idea behind >>> https://github.com/apache/kafka/pull/7170 < >>> https://github.com/apache/kafka/pull/7170> PR, introducing new >>> KStream#groupBy(Repartitioned) method overload raised some questions during >>> the review. >>> Matthias raised concern that there can be cases when user uses >>> `KStream#groupBy(Repartitioned)` operation, but actual repartitioning may >>> not required, thus configuration passed via `Repartitioned` would never be >>> applied (Matthias, please correct me if I misinterpreted your comment). >>> So instead, if user wants to control parallelism of sub-topologies, he or >>> she should always use `KStream#repartition` operation before groupBy. Full >>> comment can be seen here: >>> https://github.com/apache/kafka/pull/7170#issuecomment-519303125 < >>> https://github.com/apache/kafka/pull/7170#issuecomment-519303125> >>> >>> On the same topic, John pointed out that, from API design perspective, we >>> shouldn’t intertwine configuration classes of different operators between >>> one another. So instead of introducing new `KStream#groupBy(Repartitioned)` >>> for specifying number of partitions for internal topic, we should update >>> existing `Grouped` class with `numberOfPartitions` field. >>> >>> Personally, I think Matthias’s concern is valid, but on the other hand >>> Kafka Streams has already optimizer in place which alters topology >>> independently from user. So maybe it makes sense if Kafka Streams, >>> internally would optimize topology in the best way possible, even if in >>> some cases this means ignoring some operator configurations passed by the >>> user. Also, I agree with John about API design semantics. If we go through >>> with the changes for `KStream#groupBy` operation, it makes more sense to >>> add `numberOfPartitions` field to `Grouped` class instead of introducing >>> new `KStream#groupBy(Repartitioned)` method overload. >>> >>> I would really appreciate communities feedback on this. >>> >>> Kind regards, >>> Levani >>> >>> >>> >>>> On Oct 17, 2019, at 12:57 AM, Sophie Blee-Goldman <sop...@confluent.io> >>> wrote: >>>> >>>> Hey Levani, >>>> >>>> I think people are busy with the upcoming 2.4 release, and don't have >>> much >>>> spare time at the >>>> moment. It's kind of a difficult time to get attention on things, but >>> feel >>>> free to pick up something else >>>> to work on in the meantime until things have calmed down a bit! >>>> >>>> Cheers, >>>> Sophie >>>> >>>> >>>> On Wed, Oct 16, 2019 at 11:26 AM Levani Kokhreidze < >>> levani.co...@gmail.com <mailto:levani.co...@gmail.com>> >>>> wrote: >>>> >>>>> Hello all, >>>>> >>>>> Sorry for bringing this thread again, but I would like to get some >>>>> attention on this PR: https://github.com/apache/kafka/pull/7170 < >>> https://github.com/apache/kafka/pull/7170> < >>>>> https://github.com/apache/kafka/pull/7170 < >>> https://github.com/apache/kafka/pull/7170>> >>>>> It's been a while now and I would love to move on to other KIPs as well. >>>>> Please let me know if you have any concerns. >>>>> >>>>> Regards, >>>>> Levani >>>>> >>>>> >>>>>> On Jul 26, 2019, at 11:25 AM, Levani Kokhreidze < >>> levani.co...@gmail.com <mailto:levani.co...@gmail.com>> >>>>> wrote: >>>>>> >>>>>> Hi all, >>>>>> >>>>>> Here’s voting thread for this KIP: >>>>> https://www.mail-archive.com/dev@kafka.apache.org/msg99680.html < >>> https://www.mail-archive.com/dev@kafka.apache.org/msg99680.html> < >>>>> https://www.mail-archive.com/dev@kafka.apache.org/msg99680.html < >>> https://www.mail-archive.com/dev@kafka.apache.org/msg99680.html>> >>>>>> >>>>>> Regards, >>>>>> Levani >>>>>> >>>>>>> On Jul 24, 2019, at 11:15 PM, Levani Kokhreidze < >>> levani.co...@gmail.com <mailto:levani.co...@gmail.com> >>>>> <mailto:levani.co...@gmail.com <mailto:levani.co...@gmail.com>>> wrote: >>>>>>> >>>>>>> Hi Matthias, >>>>>>> >>>>>>> Thanks for the suggestion. I Don’t have strong opinion on that one. >>>>>>> Agree that avoiding unnecessary method overloads is a good idea. >>>>>>> >>>>>>> Updated KIP >>>>>>> >>>>>>> Regards, >>>>>>> Levani >>>>>>> >>>>>>> >>>>>>>> On Jul 24, 2019, at 8:50 PM, Matthias J. Sax <matth...@confluent.io >>> <mailto:matth...@confluent.io> >>>>> <mailto:matth...@confluent.io <mailto:matth...@confluent.io>>> wrote: >>>>>>>> >>>>>>>> One question: >>>>>>>> >>>>>>>> Why do we add >>>>>>>> >>>>>>>>> Repartitioned#with(final String name, final int numberOfPartitions) >>>>>>>> >>>>>>>> It seems that `#with(String name)`, `#numberOfPartitions(int)` in >>>>>>>> combination with `withName()` and `withNumberOfPartitions()` should >>> be >>>>>>>> sufficient. Users can chain the method calls. >>>>>>>> >>>>>>>> (I think it's valuable to keep the number of overload small if >>>>> possible.) >>>>>>>> >>>>>>>> Otherwise LGTM. >>>>>>>> >>>>>>>> >>>>>>>> -Matthias >>>>>>>> >>>>>>>> >>>>>>>> On 7/23/19 2:18 PM, Levani Kokhreidze wrote: >>>>>>>>> Hello, >>>>>>>>> >>>>>>>>> Thanks all for your feedback. >>>>>>>>> I started voting procedure for this KIP. If there’re any other >>>>> concerns about this KIP, please let me know. >>>>>>>>> >>>>>>>>> Regards, >>>>>>>>> Levani >>>>>>>>> >>>>>>>>>> On Jul 20, 2019, at 8:39 PM, Levani Kokhreidze < >>>>> levani.co...@gmail.com <mailto:levani.co...@gmail.com> <mailto: >>> levani.co...@gmail.com <mailto:levani.co...@gmail.com>>> wrote: >>>>>>>>>> >>>>>>>>>> Hi Matthias, >>>>>>>>>> >>>>>>>>>> Thanks for the suggestion, makes sense. >>>>>>>>>> I’ve updated KIP ( >>>>> >>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint >>> < >>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint >>>> >>>>> < >>>>> >>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint >>> < >>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint >>>>> >>>>> < >>>>> >>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221:+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint >>> < >>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221:+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint >>>> >>>>> < >>>>> >>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221:+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint >>> < >>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221:+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint >>>> >>>>>>> ). >>>>>>>>>> >>>>>>>>>> Regards, >>>>>>>>>> Levani >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> On Jul 20, 2019, at 3:53 AM, Matthias J. Sax < >>> matth...@confluent.io <mailto:matth...@confluent.io> >>>>> <mailto:matth...@confluent.io <mailto:matth...@confluent.io>> <mailto: >>> matth...@confluent.io <mailto:matth...@confluent.io> <mailto: >>>>> matth...@confluent.io <mailto:matth...@confluent.io>>>> wrote: >>>>>>>>>>> >>>>>>>>>>> Thanks for driving the KIP. >>>>>>>>>>> >>>>>>>>>>> I agree that users need to be able to specify a partitioning >>>>> strategy. >>>>>>>>>>> >>>>>>>>>>> Sophie raises a fair point about topic configs and producer >>>>> configs. My >>>>>>>>>>> take is, that consider `Repartitioned` as an "extension" to >>>>> `Produced`, >>>>>>>>>>> that adds topic configuration, is a good way to think about it and >>>>> helps >>>>>>>>>>> to keep the API "clean". >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> With regard to method names. I would prefer to avoid >>> abbreviations. >>>>> Can >>>>>>>>>>> we rename: >>>>>>>>>>> >>>>>>>>>>> `withNumOfPartitions` -> `withNumberOfPartitions` >>>>>>>>>>> >>>>>>>>>>> Furthermore, it might be good to add some more `static` methods: >>>>>>>>>>> >>>>>>>>>>> - Repartitioned.with(Serde<K>, Serde<V>) >>>>>>>>>>> - Repartitioned.withNumberOfPartitions(int) >>>>>>>>>>> - Repartitioned.streamPartitioner(StreamPartitioner) >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> -Matthias >>>>>>>>>>> >>>>>>>>>>> On 7/19/19 3:33 PM, Levani Kokhreidze wrote: >>>>>>>>>>>> Totally agree. I think in KStream interface it makes sense to >>> have >>>>> some duplicate configurations between operators in order to keep API >>> simple >>>>> and usable. >>>>>>>>>>>> Also, as more surface API has, harder it is to have proper >>>>> backward compatibility. >>>>>>>>>>>> While initial idea of keeping topic level configs separate was >>>>> exciting, having Repartitioned class encapsulate some producer level >>>>> configs makes API more readable. >>>>>>>>>>>> >>>>>>>>>>>> Regards, >>>>>>>>>>>> Levani >>>>>>>>>>>> >>>>>>>>>>>>> On Jul 20, 2019, at 1:15 AM, Sophie Blee-Goldman < >>>>> sop...@confluent.io <mailto:sop...@confluent.io> <mailto: >>> sop...@confluent.io <mailto:sop...@confluent.io>> <mailto: >>>>> sop...@confluent.io <mailto:sop...@confluent.io> <mailto: >>> sop...@confluent.io <mailto:sop...@confluent.io>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> I think that is a good point about trying to keep producer level >>>>>>>>>>>>> configurations and (repartition) topic level considerations >>>>> separate. >>>>>>>>>>>>> Number of partitions is definitely purely a topic level >>>>> configuration. But >>>>>>>>>>>>> on some level, serdes and partitioners are just as much a topic >>>>>>>>>>>>> configuration as a producer one. You could have two producers >>>>> configured >>>>>>>>>>>>> with different serdes and/or partitioners, but if they are >>>>> writing to the >>>>>>>>>>>>> same topic the result would be very difficult to part. So in a >>>>> sense, these >>>>>>>>>>>>> are configurations of topics in Streams, not just producers. >>>>>>>>>>>>> >>>>>>>>>>>>> Another way to think of it: while the Streams API is not always >>>>> true to >>>>>>>>>>>>> this, ideally all the relevant configs for an operator are >>>>> wrapped into a >>>>>>>>>>>>> single object (in this case, Repartitioned). We could instead >>>>> split out the >>>>>>>>>>>>> fields in common with Produced into a separate parameter to keep >>>>> topic and >>>>>>>>>>>>> producer level configurations separate, but this increases the >>>>> API surface >>>>>>>>>>>>> area by a lot. It's much more straightforward to just say "this >>> is >>>>>>>>>>>>> everything that this particular operator needs" without worrying >>>>> about what >>>>>>>>>>>>> exactly you're specifying. >>>>>>>>>>>>> >>>>>>>>>>>>> I suppose you could alternatively make Produced a field of >>>>> Repartitioned, >>>>>>>>>>>>> but I don't think we do this kind of composition elsewhere in >>>>> Streams at >>>>>>>>>>>>> the moment >>>>>>>>>>>>> >>>>>>>>>>>>> On Fri, Jul 19, 2019 at 1:45 PM Levani Kokhreidze < >>>>> levani.co...@gmail.com <mailto:levani.co...@gmail.com> <mailto: >>> levani.co...@gmail.com <mailto:levani.co...@gmail.com>> <mailto: >>>>> levani.co...@gmail.com <mailto:levani.co...@gmail.com> <mailto: >>> levani.co...@gmail.com <mailto:levani.co...@gmail.com>>>> >>>>>>>>>>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> Hi Bill, >>>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks a lot for the feedback. >>>>>>>>>>>>>> Yes, that makes sense. I’ve updated KIP with >>>>> `Repartitioned#partitioner` >>>>>>>>>>>>>> configuration. >>>>>>>>>>>>>> In the beginning, I wanted to introduce a class for topic level >>>>>>>>>>>>>> configuration and keep topic level and producer level >>>>> configurations (such >>>>>>>>>>>>>> as Produced) separately (see my second email in this thread). >>>>>>>>>>>>>> But while looking at the semantics of KStream interface, I >>>>> couldn’t really >>>>>>>>>>>>>> figure out good operation name for Topic level configuration >>>>> class and just >>>>>>>>>>>>>> introducing `Topic` config class was kinda breaking the >>>>> semantics. >>>>>>>>>>>>>> So I think having Repartitioned class which encapsulates topic >>>>> and >>>>>>>>>>>>>> producer level configurations for internal topics is viable >>>>> thing to do. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Regards, >>>>>>>>>>>>>> Levani >>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Jul 19, 2019, at 7:47 PM, Bill Bejeck <bbej...@gmail.com >>> <mailto:bbej...@gmail.com> >>>>> <mailto:bbej...@gmail.com <mailto:bbej...@gmail.com>> <mailto: >>> bbej...@gmail.com <mailto:bbej...@gmail.com> <mailto: >>>>> bbej...@gmail.com <mailto:bbej...@gmail.com>>>> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Hi Lavani, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Thanks for resurrecting this KIP. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I'm also a +1 for adding a partition option. In addition to >>>>> the reason >>>>>>>>>>>>>>> provided by John, my reasoning is: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> 1. Users may want to use something other than hash-based >>>>> partitioning >>>>>>>>>>>>>>> 2. Users may wish to partition on something different than the >>>>> key >>>>>>>>>>>>>>> without having to change the key. For example: >>>>>>>>>>>>>>> 1. A combination of fields in the value in conjunction with >>>>> the key >>>>>>>>>>>>>>> 2. Something other than the key >>>>>>>>>>>>>>> 3. We allow users to specify a partitioner on Produced hence >>> in >>>>>>>>>>>>>>> KStream.to and KStream.through, so it makes sense for API >>>>> consistency. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Just my 2 cents. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>> Bill >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Fri, Jul 19, 2019 at 5:46 AM Levani Kokhreidze < >>>>>>>>>>>>>> levani.co...@gmail.com <mailto:levani.co...@gmail.com> >>> <mailto:levani.co...@gmail.com <mailto:levani.co...@gmail.com>> <mailto: >>>>> levani.co...@gmail.com <mailto:levani.co...@gmail.com> <mailto: >>> levani.co...@gmail.com <mailto:levani.co...@gmail.com>>>> >>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Hi John, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> In my mind it makes sense. >>>>>>>>>>>>>>>> If we add partitioner configuration to Repartitioned class, >>>>> with the >>>>>>>>>>>>>>>> combination of specifying number of partitions for internal >>>>> topics, user >>>>>>>>>>>>>>>> will have opportunity to ensure co-partitioning before join >>>>> operation. >>>>>>>>>>>>>>>> I think this can be quite powerful feature. >>>>>>>>>>>>>>>> Wondering what others think about this? >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Regards, >>>>>>>>>>>>>>>> Levani >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On Jul 18, 2019, at 1:20 AM, John Roesler < >>> j...@confluent.io <mailto:j...@confluent.io> >>>>> <mailto:j...@confluent.io <mailto:j...@confluent.io>> <mailto: >>> j...@confluent.io <mailto:j...@confluent.io> <mailto: >>>>> j...@confluent.io <mailto:j...@confluent.io>>>> wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Yes, I believe that's what I had in mind. Again, not totally >>>>> sure it >>>>>>>>>>>>>>>>> makes sense, but I believe something similar is the >>> rationale >>>>> for >>>>>>>>>>>>>>>>> having the partitioner option in Produced. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>> -John >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On Wed, Jul 17, 2019 at 3:20 PM Levani Kokhreidze >>>>>>>>>>>>>>>>> <levani.co...@gmail.com <mailto:levani.co...@gmail.com> >>> <mailto:levani.co...@gmail.com <mailto:levani.co...@gmail.com>> >>>>> <mailto:levani.co...@gmail.com <mailto:levani.co...@gmail.com> <mailto: >>> levani.co...@gmail.com <mailto:levani.co...@gmail.com>>>> wrote: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Hey John, >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Oh that’s interesting use-case. >>>>>>>>>>>>>>>>>> Do I understand this correctly, in your example I would >>>>> first issue >>>>>>>>>>>>>>>> repartition(Repartitioned) with proper partitioner that >>>>> essentially >>>>>>>>>>>>>> would >>>>>>>>>>>>>>>> be the same as the topic I want to join with and then do the >>>>>>>>>>>>>> KStream#join >>>>>>>>>>>>>>>> with DSL? >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Regards, >>>>>>>>>>>>>>>>>> Levani >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> On Jul 17, 2019, at 11:11 PM, John Roesler < >>>>> j...@confluent.io <mailto:j...@confluent.io> <mailto:j...@confluent.io >>> <mailto:j...@confluent.io>> <mailto:j...@confluent.io <mailto: >>> j...@confluent.io> >>>>> <mailto:j...@confluent.io <mailto:j...@confluent.io>>>> >>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Hey, all, just to chime in, >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> I think it might be useful to have an option to specify >>> the >>>>>>>>>>>>>>>>>>> partitioner. The case I have in mind is that some data may >>>>> get >>>>>>>>>>>>>>>>>>> repartitioned and then joined with an input topic. If the >>>>> right-side >>>>>>>>>>>>>>>>>>> input topic uses a custom partitioning strategy, then the >>>>>>>>>>>>>>>>>>> repartitioned stream also needs to be partitioned with the >>>>> same >>>>>>>>>>>>>>>>>>> strategy. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Does that make sense, or did I maybe miss something >>>>> important? >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>> -John >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> On Wed, Jul 17, 2019 at 2:48 PM Levani Kokhreidze >>>>>>>>>>>>>>>>>>> <levani.co...@gmail.com <mailto:levani.co...@gmail.com> >>> <mailto:levani.co...@gmail.com <mailto:levani.co...@gmail.com>> >>>>> <mailto:levani.co...@gmail.com <mailto:levani.co...@gmail.com> <mailto: >>> levani.co...@gmail.com <mailto:levani.co...@gmail.com>>>> wrote: >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Yes, I was thinking about it as well. To be honest I’m >>> not >>>>> sure >>>>>>>>>>>>>> about >>>>>>>>>>>>>>>> it yet. >>>>>>>>>>>>>>>>>>>> As Kafka Streams DSL user, I don’t really think I would >>>>> need control >>>>>>>>>>>>>>>> over partitioner for internal topics. >>>>>>>>>>>>>>>>>>>> As a user, I would assume that Kafka Streams knows best >>>>> how to >>>>>>>>>>>>>>>> partition data for internal topics. >>>>>>>>>>>>>>>>>>>> In this KIP I wrote that Produced should be used only for >>>>> topics >>>>>>>>>>>>>> that >>>>>>>>>>>>>>>> are created by user In advance. >>>>>>>>>>>>>>>>>>>> In those cases maybe it make sense to have possibility to >>>>> specify >>>>>>>>>>>>>> the >>>>>>>>>>>>>>>> partitioner. >>>>>>>>>>>>>>>>>>>> I don’t have clear answer on that yet, but I guess >>>>> specifying the >>>>>>>>>>>>>>>> partitioner can be added as well if there’s agreement on >>> this. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Regards, >>>>>>>>>>>>>>>>>>>> Levani >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> On Jul 17, 2019, at 10:42 PM, Sophie Blee-Goldman < >>>>>>>>>>>>>>>> sop...@confluent.io <mailto:sop...@confluent.io> <mailto: >>> sop...@confluent.io <mailto:sop...@confluent.io>> <mailto: >>>>> sop...@confluent.io <mailto:sop...@confluent.io>>> wrote: >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Thanks for clearing that up. I agree that Repartitioned >>>>> would be a >>>>>>>>>>>>>>>> useful >>>>>>>>>>>>>>>>>>>>> addition. I'm wondering if it might also need to have >>>>>>>>>>>>>>>>>>>>> a withStreamPartitioner method/field, similar to >>>>> Produced? I'm not >>>>>>>>>>>>>>>> sure how >>>>>>>>>>>>>>>>>>>>> widely this feature is really used, but seems it should >>> be >>>>>>>>>>>>>> available >>>>>>>>>>>>>>>> for >>>>>>>>>>>>>>>>>>>>> repartition topics. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> On Wed, Jul 17, 2019 at 11:26 AM Levani Kokhreidze < >>>>>>>>>>>>>>>> levani.co...@gmail.com <mailto:levani.co...@gmail.com> >>>>> <mailto:levani.co...@gmail.com <mailto:levani.co...@gmail.com> <mailto: >>> levani.co...@gmail.com <mailto:levani.co...@gmail.com>>>> >>>>>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Hey Sophie, >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> In both cases KStream#repartition and >>>>>>>>>>>>>>>> KStream#repartition(Repartitioned) >>>>>>>>>>>>>>>>>>>>>> topic will be created and managed by Kafka Streams. >>>>>>>>>>>>>>>>>>>>>> Idea of Repartitioned is to give user more control over >>>>> the topic >>>>>>>>>>>>>>>> such as >>>>>>>>>>>>>>>>>>>>>> num of partitions. >>>>>>>>>>>>>>>>>>>>>> I feel like Repartitioned parameter is something that >>> is >>>>> missing >>>>>>>>>>>>>> in >>>>>>>>>>>>>>>>>>>>>> current DSL design. >>>>>>>>>>>>>>>>>>>>>> Essentially giving user control over parallelism by >>>>> configuring >>>>>>>>>>>>>> num >>>>>>>>>>>>>>>> of >>>>>>>>>>>>>>>>>>>>>> partitions for internal topics. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Hope this answers your question. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Regards, >>>>>>>>>>>>>>>>>>>>>> Levani >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> On Jul 17, 2019, at 9:02 PM, Sophie Blee-Goldman < >>>>>>>>>>>>>>>> sop...@confluent.io <mailto:sop...@confluent.io> <mailto: >>> sop...@confluent.io <mailto:sop...@confluent.io>> <mailto: >>>>> sop...@confluent.io <mailto:sop...@confluent.io>>> >>>>>>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> Hey Levani, >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> Thanks for the KIP! Can you clarify one thing for me >>> -- >>>>> for the >>>>>>>>>>>>>>>>>>>>>>> KStream#repartition signature taking a Repartitioned, >>>>> will the >>>>>>>>>>>>>>>> topic be >>>>>>>>>>>>>>>>>>>>>>> auto-created by Streams (which seems to be the case >>> for >>>>> the >>>>>>>>>>>>>>>> signature >>>>>>>>>>>>>>>>>>>>>>> without a Repartitioned) or does it have to be >>>>> pre-created? The >>>>>>>>>>>>>>>> wording >>>>>>>>>>>>>>>>>>>>>> in >>>>>>>>>>>>>>>>>>>>>>> the KIP makes it seem like one version of the method >>>>> will >>>>>>>>>>>>>>>> auto-create >>>>>>>>>>>>>>>>>>>>>>> topics while the other will not. >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> Cheers, >>>>>>>>>>>>>>>>>>>>>>> Sophie >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> On Wed, Jul 17, 2019 at 10:15 AM Levani Kokhreidze < >>>>>>>>>>>>>>>>>>>>>> levani.co...@gmail.com <mailto:levani.co...@gmail.com> >>>>> <mailto:levani.co...@gmail.com <mailto:levani.co...@gmail.com> <mailto: >>> levani.co...@gmail.com <mailto:levani.co...@gmail.com>>>> >>>>>>>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> Hello, >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> One more bump about KIP-221 ( >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>> >>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint >>> < >>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint >>>> >>>>> < >>>>> >>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint >>> < >>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint >>>>> >>>>> < >>>>> >>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint >>> < >>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint >>>> >>>>> < >>>>> >>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint >>> < >>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint >>>> >>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> < >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>> >>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221:+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint >>> < >>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221:+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint >>>> >>>>> < >>>>> >>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221:+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint >>>>>> >>>>>>>>>>>>>>>>>>>>>>> ) >>>>>>>>>>>>>>>>>>>>>>>> so it doesn’t get lost in mailing list :) >>>>>>>>>>>>>>>>>>>>>>>> Would love to hear communities opinions/concerns >>> about >>>>> this KIP. >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> Regards, >>>>>>>>>>>>>>>>>>>>>>>> Levani >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> On Jul 12, 2019, at 5:27 PM, Levani Kokhreidze < >>>>>>>>>>>>>>>> levani.co...@gmail.com >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> Hello, >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> Kind reminder about this KIP: >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>> >>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint >>>>>>>>>>>>>>>>>>>>>>>> < >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>> >>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221:+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> Regards, >>>>>>>>>>>>>>>>>>>>>>>>> Levani >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> On Jul 9, 2019, at 11:38 AM, Levani Kokhreidze < >>>>>>>>>>>>>>>>>>>>>> levani.co...@gmail.com >>>>>>>>>>>>>>>>>>>>>>>> <mailto:levani.co...@gmail.com>> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> Hello, >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> In order to move this KIP forward, I’ve updated >>>>> confluence >>>>>>>>>>>>>> page >>>>>>>>>>>>>>>> with >>>>>>>>>>>>>>>>>>>>>>>> the new proposal >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>> >>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint >>>>>>>>>>>>>>>>>>>>>>>> < >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>> >>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221:+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> I’ve also filled “Rejected Alternatives” section. >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> Looking forward to discuss this KIP :) >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> King regards, >>>>>>>>>>>>>>>>>>>>>>>>>> Levani >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> On Jul 3, 2019, at 1:08 PM, Levani Kokhreidze < >>>>>>>>>>>>>>>>>>>>>> levani.co...@gmail.com >>>>>>>>>>>>>>>>>>>>>>>> <mailto:levani.co...@gmail.com>> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> Hello Matthias, >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks for the feedback and ideas. >>>>>>>>>>>>>>>>>>>>>>>>>>> I like the idea of introducing dedicated `Topic` >>>>> class for >>>>>>>>>>>>>>>> topic >>>>>>>>>>>>>>>>>>>>>>>> configuration for internal operators like >>> `groupedBy`. >>>>>>>>>>>>>>>>>>>>>>>>>>> Would be great to hear others opinion about this >>> as >>>>> well. >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> Kind regards, >>>>>>>>>>>>>>>>>>>>>>>>>>> Levani >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> On Jul 3, 2019, at 7:00 AM, Matthias J. Sax < >>>>>>>>>>>>>>>> matth...@confluent.io >>>>>>>>>>>>>>>>>>>>>>>> <mailto:matth...@confluent.io>> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> Levani, >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks for picking up this KIP! And thanks for >>>>> summarizing >>>>>>>>>>>>>>>>>>>>>> everything. >>>>>>>>>>>>>>>>>>>>>>>>>>>> Even if some points may have been discussed >>>>> already (can't >>>>>>>>>>>>>>>> really >>>>>>>>>>>>>>>>>>>>>>>>>>>> remember), it's helpful to get a good summary to >>>>> refresh the >>>>>>>>>>>>>>>>>>>>>>>> discussion. >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> I think your reasoning makes sense. With regard >>> to >>>>> the >>>>>>>>>>>>>>>> distinction >>>>>>>>>>>>>>>>>>>>>>>>>>>> between operators that manage topics and >>> operators >>>>> that use >>>>>>>>>>>>>>>>>>>>>>>> user-created >>>>>>>>>>>>>>>>>>>>>>>>>>>> topics: Following this argument, it might >>> indicate >>>>> that >>>>>>>>>>>>>>>> leaving >>>>>>>>>>>>>>>>>>>>>>>>>>>> `through()` as-is (as an operator that uses >>>>> use-defined >>>>>>>>>>>>>>>> topics) and >>>>>>>>>>>>>>>>>>>>>>>>>>>> introducing a new `repartition()` operator (an >>>>> operator that >>>>>>>>>>>>>>>> manages >>>>>>>>>>>>>>>>>>>>>>>>>>>> topics itself) might be good. Otherwise, there is >>>>> one >>>>>>>>>>>>>> operator >>>>>>>>>>>>>>>>>>>>>>>>>>>> `through()` that sometimes manages topics but >>>>> sometimes >>>>>>>>>>>>>> not; a >>>>>>>>>>>>>>>>>>>>>>>> different >>>>>>>>>>>>>>>>>>>>>>>>>>>> name, ie, new operator would make the distinction >>>>> clearer. >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> About adding `numOfPartitions` to `Grouped`. I am >>>>> wondering >>>>>>>>>>>>>>>> if the >>>>>>>>>>>>>>>>>>>>>>>> same >>>>>>>>>>>>>>>>>>>>>>>>>>>> argument as for `Produced` does apply and adding >>>>> it is >>>>>>>>>>>>>>>> semantically >>>>>>>>>>>>>>>>>>>>>>>>>>>> questionable? Might be good to get opinions of >>>>> others on >>>>>>>>>>>>>>>> this, too. >>>>>>>>>>>>>>>>>>>>>> I >>>>>>>>>>>>>>>>>>>>>>>> am >>>>>>>>>>>>>>>>>>>>>>>>>>>> not sure myself what solution I prefer atm. >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> So far, KS uses configuration objects that allow >>> to >>>>>>>>>>>>>> configure >>>>>>>>>>>>>>>> a >>>>>>>>>>>>>>>>>>>>>>>> certain >>>>>>>>>>>>>>>>>>>>>>>>>>>> "entity" like a consumer, producer, store. If we >>>>> assume that >>>>>>>>>>>>>>>> a topic >>>>>>>>>>>>>>>>>>>>>>>> is >>>>>>>>>>>>>>>>>>>>>>>>>>>> a similar entity, I am wonder if we should have a >>>>>>>>>>>>>>>>>>>>>>>>>>>> `Topic#withNumberOfPartitions()` class and method >>>>> instead of >>>>>>>>>>>>>>>> a plain >>>>>>>>>>>>>>>>>>>>>>>>>>>> integer? This would allow us to add other >>> configs, >>>>> like >>>>>>>>>>>>>>>> replication >>>>>>>>>>>>>>>>>>>>>>>>>>>> factor, retention-time etc, easily, without the >>>>> need to >>>>>>>>>>>>>>>> change the >>>>>>>>>>>>>>>>>>>>>>>> "main >>>>>>>>>>>>>>>>>>>>>>>>>>>> API". >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> Just want to give some ideas. Not sure if I like >>>>> them >>>>>>>>>>>>>> myself. >>>>>>>>>>>>>>>> :) >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> -Matthias >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> On 7/1/19 1:04 AM, Levani Kokhreidze wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Actually, giving it more though - maybe >>> enhancing >>>>> Produced >>>>>>>>>>>>>>>> with num >>>>>>>>>>>>>>>>>>>>>>>> of partitions configuration is not the best approach. >>>>> Let me >>>>>>>>>>>>>>>> explain >>>>>>>>>>>>>>>>>>>>>> why: >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> 1) If we enhance Produced class with this >>>>> configuration, >>>>>>>>>>>>>>>> this will >>>>>>>>>>>>>>>>>>>>>>>> also affect KStream#to operation. Since KStream#to is >>>>> the final >>>>>>>>>>>>>>>> sink of >>>>>>>>>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>>>>>>>>>> topology, for me, it seems to be reasonable >>> assumption >>>>> that user >>>>>>>>>>>>>>>> needs >>>>>>>>>>>>>>>>>>>>>> to >>>>>>>>>>>>>>>>>>>>>>>> manually create sink topic in advance. And in that >>>>> case, having >>>>>>>>>>>>>>>> num of >>>>>>>>>>>>>>>>>>>>>>>> partitions configuration doesn’t make much sense. >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> 2) Looking at Produced class, based on API >>>>> contract, seems >>>>>>>>>>>>>>>> like >>>>>>>>>>>>>>>>>>>>>>>> Produced is designed to be something that is >>>>> explicitly for >>>>>>>>>>>>>>>> producer >>>>>>>>>>>>>>>>>>>>>> (key >>>>>>>>>>>>>>>>>>>>>>>> serializer, value serializer, partitioner those all >>>>> are producer >>>>>>>>>>>>>>>>>>>>>> specific >>>>>>>>>>>>>>>>>>>>>>>> configurations) and num of partitions is topic level >>>>>>>>>>>>>>>> configuration. And >>>>>>>>>>>>>>>>>>>>>> I >>>>>>>>>>>>>>>>>>>>>>>> don’t think mixing topic and producer level >>>>> configurations >>>>>>>>>>>>>>>> together in >>>>>>>>>>>>>>>>>>>>>> one >>>>>>>>>>>>>>>>>>>>>>>> class is the good approach. >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> 3) Looking at KStream interface, seems like >>>>> Produced >>>>>>>>>>>>>>>> parameter is >>>>>>>>>>>>>>>>>>>>>>>> for operations that work with non-internal (e.g >>> topics >>>>> created >>>>>>>>>>>>>> and >>>>>>>>>>>>>>>>>>>>>> managed >>>>>>>>>>>>>>>>>>>>>>>> internally by Kafka Streams) topics and I think we >>>>> should leave >>>>>>>>>>>>>>>> it as >>>>>>>>>>>>>>>>>>>>>> it is >>>>>>>>>>>>>>>>>>>>>>>> in that case. >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Taking all this things into account, I think we >>>>> should >>>>>>>>>>>>>>>> distinguish >>>>>>>>>>>>>>>>>>>>>>>> between DSL operations, where Kafka Streams should >>>>> create and >>>>>>>>>>>>>>>> manage >>>>>>>>>>>>>>>>>>>>>>>> internal topics (KStream#groupBy) vs topics that >>>>> should be >>>>>>>>>>>>>>>> created in >>>>>>>>>>>>>>>>>>>>>>>> advance (e.g KStream#to). >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> To sum it up, I think adding numPartitions >>>>> configuration in >>>>>>>>>>>>>>>>>>>>>> Produced >>>>>>>>>>>>>>>>>>>>>>>> will result in mixing topic and producer level >>>>> configuration in >>>>>>>>>>>>>>>> one >>>>>>>>>>>>>>>>>>>>>> class >>>>>>>>>>>>>>>>>>>>>>>> and it’s gonna break existing API semantics. >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Regarding making topic name optional in >>>>> KStream#through - I >>>>>>>>>>>>>>>> think >>>>>>>>>>>>>>>>>>>>>>>> underline idea is very useful and giving users >>>>> possibility to >>>>>>>>>>>>>>>> specify >>>>>>>>>>>>>>>>>>>>>> num >>>>>>>>>>>>>>>>>>>>>>>> of partitions there is even more useful :) >>> Considering >>>>> arguments >>>>>>>>>>>>>>>> against >>>>>>>>>>>>>>>>>>>>>>>> adding num of partitions in Produced class, I see two >>>>> options >>>>>>>>>>>>>>>> here: >>>>>>>>>>>>>>>>>>>>>>>>>>>>> 1) Add following method overloads >>>>>>>>>>>>>>>>>>>>>>>>>>>>> * through() - topic will be auto-generated and >>>>> num of >>>>>>>>>>>>>>>> partitions >>>>>>>>>>>>>>>>>>>>>>>> will be taken from source topic >>>>>>>>>>>>>>>>>>>>>>>>>>>>> * through(final int numOfPartitions) - topic >>> will >>>>> be auto >>>>>>>>>>>>>>>>>>>>>>>> generated with specified num of partitions >>>>>>>>>>>>>>>>>>>>>>>>>>>>> * through(final int numOfPartitions, final >>>>> Produced<K, V> >>>>>>>>>>>>>>>>>>>>>>>> produced) - topic will be with generated with >>>>> specified num of >>>>>>>>>>>>>>>>>>>>>> partitions >>>>>>>>>>>>>>>>>>>>>>>> and configuration taken from produced parameter. >>>>>>>>>>>>>>>>>>>>>>>>>>>>> 2) Leave KStream#through as it is and introduce >>>>> new method >>>>>>>>>>>>>> - >>>>>>>>>>>>>>>>>>>>>>>> KStream#repartition (I think Matthias suggested this >>>>> in one of >>>>>>>>>>>>>> the >>>>>>>>>>>>>>>>>>>>>> threads) >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Considering all mentioned above I propose the >>>>> following >>>>>>>>>>>>>> plan: >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Option A: >>>>>>>>>>>>>>>>>>>>>>>>>>>>> 1) Leave Produced as it is >>>>>>>>>>>>>>>>>>>>>>>>>>>>> 2) Add num of partitions configuration to >>> Grouped >>>>> class (as >>>>>>>>>>>>>>>>>>>>>>>> mentioned in the KIP) >>>>>>>>>>>>>>>>>>>>>>>>>>>>> 3) Add following method overloads to >>>>> KStream#through >>>>>>>>>>>>>>>>>>>>>>>>>>>>> * through() - topic will be auto-generated and >>>>> num of >>>>>>>>>>>>>>>> partitions >>>>>>>>>>>>>>>>>>>>>>>> will be taken from source topic >>>>>>>>>>>>>>>>>>>>>>>>>>>>> * through(final int numOfPartitions) - topic >>> will >>>>> be auto >>>>>>>>>>>>>>>>>>>>>>>> generated with specified num of partitions >>>>>>>>>>>>>>>>>>>>>>>>>>>>> * through(final int numOfPartitions, final >>>>> Produced<K, V> >>>>>>>>>>>>>>>>>>>>>>>> produced) - topic will be with generated with >>>>> specified num of >>>>>>>>>>>>>>>>>>>>>> partitions >>>>>>>>>>>>>>>>>>>>>>>> and configuration taken from produced parameter. >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Option B: >>>>>>>>>>>>>>>>>>>>>>>>>>>>> 1) Leave Produced as it is >>>>>>>>>>>>>>>>>>>>>>>>>>>>> 2) Add num of partitions configuration to >>> Grouped >>>>> class (as >>>>>>>>>>>>>>>>>>>>>>>> mentioned in the KIP) >>>>>>>>>>>>>>>>>>>>>>>>>>>>> 3) Add new operator KStream#repartition for >>>>> creating and >>>>>>>>>>>>>>>> managing >>>>>>>>>>>>>>>>>>>>>>>> internal repartition topics >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> P.S. I’m sorry if all of this was already >>>>> discussed in the >>>>>>>>>>>>>>>> mailing >>>>>>>>>>>>>>>>>>>>>>>> list, but I kinda got with all the threads that were >>>>> about this >>>>>>>>>>>>>>>> KIP :( >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Kind regards, >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Levani >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Jul 1, 2019, at 9:56 AM, Levani Kokhreidze < >>>>>>>>>>>>>>>>>>>>>>>> levani.co...@gmail.com <mailto: >>> levani.co...@gmail.com>> >>>>> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hello, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I would like to resurrect discussion around >>>>> KIP-221. Going >>>>>>>>>>>>>>>> through >>>>>>>>>>>>>>>>>>>>>>>> the discussion thread, there’s seems to agreement >>>>> around >>>>>>>>>>>>>>>> usefulness of >>>>>>>>>>>>>>>>>>>>>> this >>>>>>>>>>>>>>>>>>>>>>>> feature. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Regarding the implementation, as far as I >>>>> understood, the >>>>>>>>>>>>>>>> most >>>>>>>>>>>>>>>>>>>>>>>> optimal solution for me seems the following: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 1) Add two method overloads to KStream#through >>>>> method >>>>>>>>>>>>>>>> (essentially >>>>>>>>>>>>>>>>>>>>>>>> making topic name optional) >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 2) Enhance Produced class with numOfPartitions >>>>>>>>>>>>>> configuration >>>>>>>>>>>>>>>>>>>>>> field. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Those two changes will allow DSL users to >>> control >>>>>>>>>>>>>>>> parallelism and >>>>>>>>>>>>>>>>>>>>>>>> trigger re-partition without doing stateful >>> operations. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I will update KIP with interface changes around >>>>>>>>>>>>>>>> KStream#through if >>>>>>>>>>>>>>>>>>>>>>>> this changes sound sensible. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Kind regards, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Levani >>> >>> > >
signature.asc
Description: OpenPGP digital signature