Hello all, Sorry for bringing this thread again, but I would like to get some attention on this PR: https://github.com/apache/kafka/pull/7170 <https://github.com/apache/kafka/pull/7170> It's been a while now and I would love to move on to other KIPs as well. Please let me know if you have any concerns.
Regards, Levani > On Jul 26, 2019, at 11:25 AM, Levani Kokhreidze <levani.co...@gmail.com> > wrote: > > Hi all, > > Here’s voting thread for this KIP: > https://www.mail-archive.com/dev@kafka.apache.org/msg99680.html > <https://www.mail-archive.com/dev@kafka.apache.org/msg99680.html> > > Regards, > Levani > >> On Jul 24, 2019, at 11:15 PM, Levani Kokhreidze <levani.co...@gmail.com >> <mailto:levani.co...@gmail.com>> wrote: >> >> Hi Matthias, >> >> Thanks for the suggestion. I Don’t have strong opinion on that one. >> Agree that avoiding unnecessary method overloads is a good idea. >> >> Updated KIP >> >> Regards, >> Levani >> >> >>> On Jul 24, 2019, at 8:50 PM, Matthias J. Sax <matth...@confluent.io >>> <mailto:matth...@confluent.io>> wrote: >>> >>> One question: >>> >>> Why do we add >>> >>>> Repartitioned#with(final String name, final int numberOfPartitions) >>> >>> It seems that `#with(String name)`, `#numberOfPartitions(int)` in >>> combination with `withName()` and `withNumberOfPartitions()` should be >>> sufficient. Users can chain the method calls. >>> >>> (I think it's valuable to keep the number of overload small if possible.) >>> >>> Otherwise LGTM. >>> >>> >>> -Matthias >>> >>> >>> On 7/23/19 2:18 PM, Levani Kokhreidze wrote: >>>> Hello, >>>> >>>> Thanks all for your feedback. >>>> I started voting procedure for this KIP. If there’re any other concerns >>>> about this KIP, please let me know. >>>> >>>> Regards, >>>> Levani >>>> >>>>> On Jul 20, 2019, at 8:39 PM, Levani Kokhreidze <levani.co...@gmail.com >>>>> <mailto:levani.co...@gmail.com>> wrote: >>>>> >>>>> Hi Matthias, >>>>> >>>>> Thanks for the suggestion, makes sense. >>>>> I’ve updated KIP >>>>> (https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint >>>>> >>>>> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint> >>>>> >>>>> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-221:+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint >>>>> >>>>> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-221:+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint>>). >>>>> >>>>> Regards, >>>>> Levani >>>>> >>>>> >>>>>> On Jul 20, 2019, at 3:53 AM, Matthias J. Sax <matth...@confluent.io >>>>>> <mailto:matth...@confluent.io> <mailto:matth...@confluent.io >>>>>> <mailto:matth...@confluent.io>>> wrote: >>>>>> >>>>>> Thanks for driving the KIP. >>>>>> >>>>>> I agree that users need to be able to specify a partitioning strategy. >>>>>> >>>>>> Sophie raises a fair point about topic configs and producer configs. My >>>>>> take is, that consider `Repartitioned` as an "extension" to `Produced`, >>>>>> that adds topic configuration, is a good way to think about it and helps >>>>>> to keep the API "clean". >>>>>> >>>>>> >>>>>> With regard to method names. I would prefer to avoid abbreviations. Can >>>>>> we rename: >>>>>> >>>>>> `withNumOfPartitions` -> `withNumberOfPartitions` >>>>>> >>>>>> Furthermore, it might be good to add some more `static` methods: >>>>>> >>>>>> - Repartitioned.with(Serde<K>, Serde<V>) >>>>>> - Repartitioned.withNumberOfPartitions(int) >>>>>> - Repartitioned.streamPartitioner(StreamPartitioner) >>>>>> >>>>>> >>>>>> -Matthias >>>>>> >>>>>> On 7/19/19 3:33 PM, Levani Kokhreidze wrote: >>>>>>> Totally agree. I think in KStream interface it makes sense to have some >>>>>>> duplicate configurations between operators in order to keep API simple >>>>>>> and usable. >>>>>>> Also, as more surface API has, harder it is to have proper backward >>>>>>> compatibility. >>>>>>> While initial idea of keeping topic level configs separate was >>>>>>> exciting, having Repartitioned class encapsulate some producer level >>>>>>> configs makes API more readable. >>>>>>> >>>>>>> Regards, >>>>>>> Levani >>>>>>> >>>>>>>> On Jul 20, 2019, at 1:15 AM, Sophie Blee-Goldman <sop...@confluent.io >>>>>>>> <mailto:sop...@confluent.io> <mailto:sop...@confluent.io >>>>>>>> <mailto:sop...@confluent.io>>> wrote: >>>>>>>> >>>>>>>> I think that is a good point about trying to keep producer level >>>>>>>> configurations and (repartition) topic level considerations separate. >>>>>>>> Number of partitions is definitely purely a topic level configuration. >>>>>>>> But >>>>>>>> on some level, serdes and partitioners are just as much a topic >>>>>>>> configuration as a producer one. You could have two producers >>>>>>>> configured >>>>>>>> with different serdes and/or partitioners, but if they are writing to >>>>>>>> the >>>>>>>> same topic the result would be very difficult to part. So in a sense, >>>>>>>> these >>>>>>>> are configurations of topics in Streams, not just producers. >>>>>>>> >>>>>>>> Another way to think of it: while the Streams API is not always true to >>>>>>>> this, ideally all the relevant configs for an operator are wrapped >>>>>>>> into a >>>>>>>> single object (in this case, Repartitioned). We could instead split >>>>>>>> out the >>>>>>>> fields in common with Produced into a separate parameter to keep topic >>>>>>>> and >>>>>>>> producer level configurations separate, but this increases the API >>>>>>>> surface >>>>>>>> area by a lot. It's much more straightforward to just say "this is >>>>>>>> everything that this particular operator needs" without worrying about >>>>>>>> what >>>>>>>> exactly you're specifying. >>>>>>>> >>>>>>>> I suppose you could alternatively make Produced a field of >>>>>>>> Repartitioned, >>>>>>>> but I don't think we do this kind of composition elsewhere in Streams >>>>>>>> at >>>>>>>> the moment >>>>>>>> >>>>>>>> On Fri, Jul 19, 2019 at 1:45 PM Levani Kokhreidze >>>>>>>> <levani.co...@gmail.com <mailto:levani.co...@gmail.com> >>>>>>>> <mailto:levani.co...@gmail.com <mailto:levani.co...@gmail.com>>> >>>>>>>> wrote: >>>>>>>> >>>>>>>>> Hi Bill, >>>>>>>>> >>>>>>>>> Thanks a lot for the feedback. >>>>>>>>> Yes, that makes sense. I’ve updated KIP with >>>>>>>>> `Repartitioned#partitioner` >>>>>>>>> configuration. >>>>>>>>> In the beginning, I wanted to introduce a class for topic level >>>>>>>>> configuration and keep topic level and producer level configurations >>>>>>>>> (such >>>>>>>>> as Produced) separately (see my second email in this thread). >>>>>>>>> But while looking at the semantics of KStream interface, I couldn’t >>>>>>>>> really >>>>>>>>> figure out good operation name for Topic level configuration class >>>>>>>>> and just >>>>>>>>> introducing `Topic` config class was kinda breaking the semantics. >>>>>>>>> So I think having Repartitioned class which encapsulates topic and >>>>>>>>> producer level configurations for internal topics is viable thing to >>>>>>>>> do. >>>>>>>>> >>>>>>>>> Regards, >>>>>>>>> Levani >>>>>>>>> >>>>>>>>>> On Jul 19, 2019, at 7:47 PM, Bill Bejeck <bbej...@gmail.com >>>>>>>>>> <mailto:bbej...@gmail.com> <mailto:bbej...@gmail.com >>>>>>>>>> <mailto:bbej...@gmail.com>>> wrote: >>>>>>>>>> >>>>>>>>>> Hi Lavani, >>>>>>>>>> >>>>>>>>>> Thanks for resurrecting this KIP. >>>>>>>>>> >>>>>>>>>> I'm also a +1 for adding a partition option. In addition to the >>>>>>>>>> reason >>>>>>>>>> provided by John, my reasoning is: >>>>>>>>>> >>>>>>>>>> 1. Users may want to use something other than hash-based partitioning >>>>>>>>>> 2. Users may wish to partition on something different than the key >>>>>>>>>> without having to change the key. For example: >>>>>>>>>> 1. A combination of fields in the value in conjunction with the key >>>>>>>>>> 2. Something other than the key >>>>>>>>>> 3. We allow users to specify a partitioner on Produced hence in >>>>>>>>>> KStream.to and KStream.through, so it makes sense for API >>>>>>>>>> consistency. >>>>>>>>>> >>>>>>>>>> Just my 2 cents. >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> Bill >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Fri, Jul 19, 2019 at 5:46 AM Levani Kokhreidze < >>>>>>>>> levani.co...@gmail.com <mailto:levani.co...@gmail.com> >>>>>>>>> <mailto:levani.co...@gmail.com <mailto:levani.co...@gmail.com>>> >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>>> Hi John, >>>>>>>>>>> >>>>>>>>>>> In my mind it makes sense. >>>>>>>>>>> If we add partitioner configuration to Repartitioned class, with the >>>>>>>>>>> combination of specifying number of partitions for internal topics, >>>>>>>>>>> user >>>>>>>>>>> will have opportunity to ensure co-partitioning before join >>>>>>>>>>> operation. >>>>>>>>>>> I think this can be quite powerful feature. >>>>>>>>>>> Wondering what others think about this? >>>>>>>>>>> >>>>>>>>>>> Regards, >>>>>>>>>>> Levani >>>>>>>>>>> >>>>>>>>>>>> On Jul 18, 2019, at 1:20 AM, John Roesler <j...@confluent.io >>>>>>>>>>>> <mailto:j...@confluent.io> <mailto:j...@confluent.io >>>>>>>>>>>> <mailto:j...@confluent.io>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>> Yes, I believe that's what I had in mind. Again, not totally sure >>>>>>>>>>>> it >>>>>>>>>>>> makes sense, but I believe something similar is the rationale for >>>>>>>>>>>> having the partitioner option in Produced. >>>>>>>>>>>> >>>>>>>>>>>> Thanks, >>>>>>>>>>>> -John >>>>>>>>>>>> >>>>>>>>>>>> On Wed, Jul 17, 2019 at 3:20 PM Levani Kokhreidze >>>>>>>>>>>> <levani.co...@gmail.com <mailto:levani.co...@gmail.com> >>>>>>>>>>>> <mailto:levani.co...@gmail.com <mailto:levani.co...@gmail.com>>> >>>>>>>>>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> Hey John, >>>>>>>>>>>>> >>>>>>>>>>>>> Oh that’s interesting use-case. >>>>>>>>>>>>> Do I understand this correctly, in your example I would first >>>>>>>>>>>>> issue >>>>>>>>>>> repartition(Repartitioned) with proper partitioner that essentially >>>>>>>>> would >>>>>>>>>>> be the same as the topic I want to join with and then do the >>>>>>>>> KStream#join >>>>>>>>>>> with DSL? >>>>>>>>>>>>> >>>>>>>>>>>>> Regards, >>>>>>>>>>>>> Levani >>>>>>>>>>>>> >>>>>>>>>>>>>> On Jul 17, 2019, at 11:11 PM, John Roesler <j...@confluent.io >>>>>>>>>>>>>> <mailto:j...@confluent.io> <mailto:j...@confluent.io >>>>>>>>>>>>>> <mailto:j...@confluent.io>>> >>>>>>>>> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>> Hey, all, just to chime in, >>>>>>>>>>>>>> >>>>>>>>>>>>>> I think it might be useful to have an option to specify the >>>>>>>>>>>>>> partitioner. The case I have in mind is that some data may get >>>>>>>>>>>>>> repartitioned and then joined with an input topic. If the >>>>>>>>>>>>>> right-side >>>>>>>>>>>>>> input topic uses a custom partitioning strategy, then the >>>>>>>>>>>>>> repartitioned stream also needs to be partitioned with the same >>>>>>>>>>>>>> strategy. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Does that make sense, or did I maybe miss something important? >>>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>> -John >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Wed, Jul 17, 2019 at 2:48 PM Levani Kokhreidze >>>>>>>>>>>>>> <levani.co...@gmail.com <mailto:levani.co...@gmail.com> >>>>>>>>>>>>>> <mailto:levani.co...@gmail.com <mailto:levani.co...@gmail.com>>> >>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Yes, I was thinking about it as well. To be honest I’m not sure >>>>>>>>> about >>>>>>>>>>> it yet. >>>>>>>>>>>>>>> As Kafka Streams DSL user, I don’t really think I would need >>>>>>>>>>>>>>> control >>>>>>>>>>> over partitioner for internal topics. >>>>>>>>>>>>>>> As a user, I would assume that Kafka Streams knows best how to >>>>>>>>>>> partition data for internal topics. >>>>>>>>>>>>>>> In this KIP I wrote that Produced should be used only for topics >>>>>>>>> that >>>>>>>>>>> are created by user In advance. >>>>>>>>>>>>>>> In those cases maybe it make sense to have possibility to >>>>>>>>>>>>>>> specify >>>>>>>>> the >>>>>>>>>>> partitioner. >>>>>>>>>>>>>>> I don’t have clear answer on that yet, but I guess specifying >>>>>>>>>>>>>>> the >>>>>>>>>>> partitioner can be added as well if there’s agreement on this. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Regards, >>>>>>>>>>>>>>> Levani >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Jul 17, 2019, at 10:42 PM, Sophie Blee-Goldman < >>>>>>>>>>> sop...@confluent.io <mailto:sop...@confluent.io> >>>>>>>>>>> <mailto:sop...@confluent.io <mailto:sop...@confluent.io>>> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Thanks for clearing that up. I agree that Repartitioned would >>>>>>>>>>>>>>>> be a >>>>>>>>>>> useful >>>>>>>>>>>>>>>> addition. I'm wondering if it might also need to have >>>>>>>>>>>>>>>> a withStreamPartitioner method/field, similar to Produced? I'm >>>>>>>>>>>>>>>> not >>>>>>>>>>> sure how >>>>>>>>>>>>>>>> widely this feature is really used, but seems it should be >>>>>>>>> available >>>>>>>>>>> for >>>>>>>>>>>>>>>> repartition topics. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Wed, Jul 17, 2019 at 11:26 AM Levani Kokhreidze < >>>>>>>>>>> levani.co...@gmail.com <mailto:levani.co...@gmail.com> >>>>>>>>>>> <mailto:levani.co...@gmail.com <mailto:levani.co...@gmail.com>>> >>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Hey Sophie, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> In both cases KStream#repartition and >>>>>>>>>>> KStream#repartition(Repartitioned) >>>>>>>>>>>>>>>>> topic will be created and managed by Kafka Streams. >>>>>>>>>>>>>>>>> Idea of Repartitioned is to give user more control over the >>>>>>>>>>>>>>>>> topic >>>>>>>>>>> such as >>>>>>>>>>>>>>>>> num of partitions. >>>>>>>>>>>>>>>>> I feel like Repartitioned parameter is something that is >>>>>>>>>>>>>>>>> missing >>>>>>>>> in >>>>>>>>>>>>>>>>> current DSL design. >>>>>>>>>>>>>>>>> Essentially giving user control over parallelism by >>>>>>>>>>>>>>>>> configuring >>>>>>>>> num >>>>>>>>>>> of >>>>>>>>>>>>>>>>> partitions for internal topics. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Hope this answers your question. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Regards, >>>>>>>>>>>>>>>>> Levani >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> On Jul 17, 2019, at 9:02 PM, Sophie Blee-Goldman < >>>>>>>>>>> sop...@confluent.io <mailto:sop...@confluent.io> >>>>>>>>>>> <mailto:sop...@confluent.io <mailto:sop...@confluent.io>>> >>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Hey Levani, >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Thanks for the KIP! Can you clarify one thing for me -- for >>>>>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>>>> KStream#repartition signature taking a Repartitioned, will >>>>>>>>>>>>>>>>>> the >>>>>>>>>>> topic be >>>>>>>>>>>>>>>>>> auto-created by Streams (which seems to be the case for the >>>>>>>>>>> signature >>>>>>>>>>>>>>>>>> without a Repartitioned) or does it have to be pre-created? >>>>>>>>>>>>>>>>>> The >>>>>>>>>>> wording >>>>>>>>>>>>>>>>> in >>>>>>>>>>>>>>>>>> the KIP makes it seem like one version of the method will >>>>>>>>>>> auto-create >>>>>>>>>>>>>>>>>> topics while the other will not. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Cheers, >>>>>>>>>>>>>>>>>> Sophie >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> On Wed, Jul 17, 2019 at 10:15 AM Levani Kokhreidze < >>>>>>>>>>>>>>>>> levani.co...@gmail.com <mailto:levani.co...@gmail.com> >>>>>>>>>>>>>>>>> <mailto:levani.co...@gmail.com >>>>>>>>>>>>>>>>> <mailto:levani.co...@gmail.com>>> >>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Hello, >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> One more bump about KIP-221 ( >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint >>>>>>>>> >>>>>>>>> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint> >>>>>>>>> >>>>>>>>> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint >>>>>>>>> >>>>>>>>> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint>> >>>>>>>>>>>>>>>>>>> < >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221:+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint >>>>>>>>> >>>>>>>>> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-221:+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint> >>>>>>>>>>>>>>>>>> ) >>>>>>>>>>>>>>>>>>> so it doesn’t get lost in mailing list :) >>>>>>>>>>>>>>>>>>> Would love to hear communities opinions/concerns about this >>>>>>>>>>>>>>>>>>> KIP. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Regards, >>>>>>>>>>>>>>>>>>> Levani >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> On Jul 12, 2019, at 5:27 PM, Levani Kokhreidze < >>>>>>>>>>> levani.co...@gmail.com >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Hello, >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Kind reminder about this KIP: >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint >>>>>>>>>>>>>>>>>>> < >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221:+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Regards, >>>>>>>>>>>>>>>>>>>> Levani >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> On Jul 9, 2019, at 11:38 AM, Levani Kokhreidze < >>>>>>>>>>>>>>>>> levani.co...@gmail.com >>>>>>>>>>>>>>>>>>> <mailto:levani.co...@gmail.com>> wrote: >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Hello, >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> In order to move this KIP forward, I’ve updated confluence >>>>>>>>> page >>>>>>>>>>> with >>>>>>>>>>>>>>>>>>> the new proposal >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint >>>>>>>>>>>>>>>>>>> < >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221:+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> I’ve also filled “Rejected Alternatives” section. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Looking forward to discuss this KIP :) >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> King regards, >>>>>>>>>>>>>>>>>>>>> Levani >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> On Jul 3, 2019, at 1:08 PM, Levani Kokhreidze < >>>>>>>>>>>>>>>>> levani.co...@gmail.com >>>>>>>>>>>>>>>>>>> <mailto:levani.co...@gmail.com>> wrote: >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Hello Matthias, >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Thanks for the feedback and ideas. >>>>>>>>>>>>>>>>>>>>>> I like the idea of introducing dedicated `Topic` class >>>>>>>>>>>>>>>>>>>>>> for >>>>>>>>>>> topic >>>>>>>>>>>>>>>>>>> configuration for internal operators like `groupedBy`. >>>>>>>>>>>>>>>>>>>>>> Would be great to hear others opinion about this as well. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Kind regards, >>>>>>>>>>>>>>>>>>>>>> Levani >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> On Jul 3, 2019, at 7:00 AM, Matthias J. Sax < >>>>>>>>>>> matth...@confluent.io >>>>>>>>>>>>>>>>>>> <mailto:matth...@confluent.io>> wrote: >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> Levani, >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> Thanks for picking up this KIP! And thanks for >>>>>>>>>>>>>>>>>>>>>>> summarizing >>>>>>>>>>>>>>>>> everything. >>>>>>>>>>>>>>>>>>>>>>> Even if some points may have been discussed already >>>>>>>>>>>>>>>>>>>>>>> (can't >>>>>>>>>>> really >>>>>>>>>>>>>>>>>>>>>>> remember), it's helpful to get a good summary to >>>>>>>>>>>>>>>>>>>>>>> refresh the >>>>>>>>>>>>>>>>>>> discussion. >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> I think your reasoning makes sense. With regard to the >>>>>>>>>>> distinction >>>>>>>>>>>>>>>>>>>>>>> between operators that manage topics and operators that >>>>>>>>>>>>>>>>>>>>>>> use >>>>>>>>>>>>>>>>>>> user-created >>>>>>>>>>>>>>>>>>>>>>> topics: Following this argument, it might indicate that >>>>>>>>>>> leaving >>>>>>>>>>>>>>>>>>>>>>> `through()` as-is (as an operator that uses use-defined >>>>>>>>>>> topics) and >>>>>>>>>>>>>>>>>>>>>>> introducing a new `repartition()` operator (an operator >>>>>>>>>>>>>>>>>>>>>>> that >>>>>>>>>>> manages >>>>>>>>>>>>>>>>>>>>>>> topics itself) might be good. Otherwise, there is one >>>>>>>>> operator >>>>>>>>>>>>>>>>>>>>>>> `through()` that sometimes manages topics but sometimes >>>>>>>>> not; a >>>>>>>>>>>>>>>>>>> different >>>>>>>>>>>>>>>>>>>>>>> name, ie, new operator would make the distinction >>>>>>>>>>>>>>>>>>>>>>> clearer. >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> About adding `numOfPartitions` to `Grouped`. I am >>>>>>>>>>>>>>>>>>>>>>> wondering >>>>>>>>>>> if the >>>>>>>>>>>>>>>>>>> same >>>>>>>>>>>>>>>>>>>>>>> argument as for `Produced` does apply and adding it is >>>>>>>>>>> semantically >>>>>>>>>>>>>>>>>>>>>>> questionable? Might be good to get opinions of others on >>>>>>>>>>> this, too. >>>>>>>>>>>>>>>>> I >>>>>>>>>>>>>>>>>>> am >>>>>>>>>>>>>>>>>>>>>>> not sure myself what solution I prefer atm. >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> So far, KS uses configuration objects that allow to >>>>>>>>> configure >>>>>>>>>>> a >>>>>>>>>>>>>>>>>>> certain >>>>>>>>>>>>>>>>>>>>>>> "entity" like a consumer, producer, store. If we assume >>>>>>>>>>>>>>>>>>>>>>> that >>>>>>>>>>> a topic >>>>>>>>>>>>>>>>>>> is >>>>>>>>>>>>>>>>>>>>>>> a similar entity, I am wonder if we should have a >>>>>>>>>>>>>>>>>>>>>>> `Topic#withNumberOfPartitions()` class and method >>>>>>>>>>>>>>>>>>>>>>> instead of >>>>>>>>>>> a plain >>>>>>>>>>>>>>>>>>>>>>> integer? This would allow us to add other configs, like >>>>>>>>>>> replication >>>>>>>>>>>>>>>>>>>>>>> factor, retention-time etc, easily, without the need to >>>>>>>>>>> change the >>>>>>>>>>>>>>>>>>> "main >>>>>>>>>>>>>>>>>>>>>>> API". >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> Just want to give some ideas. Not sure if I like them >>>>>>>>> myself. >>>>>>>>>>> :) >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> -Matthias >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> On 7/1/19 1:04 AM, Levani Kokhreidze wrote: >>>>>>>>>>>>>>>>>>>>>>>> Actually, giving it more though - maybe enhancing >>>>>>>>>>>>>>>>>>>>>>>> Produced >>>>>>>>>>> with num >>>>>>>>>>>>>>>>>>> of partitions configuration is not the best approach. Let me >>>>>>>>>>> explain >>>>>>>>>>>>>>>>> why: >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> 1) If we enhance Produced class with this >>>>>>>>>>>>>>>>>>>>>>>> configuration, >>>>>>>>>>> this will >>>>>>>>>>>>>>>>>>> also affect KStream#to operation. Since KStream#to is the >>>>>>>>>>>>>>>>>>> final >>>>>>>>>>> sink of >>>>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>>>>> topology, for me, it seems to be reasonable assumption that >>>>>>>>>>>>>>>>>>> user >>>>>>>>>>> needs >>>>>>>>>>>>>>>>> to >>>>>>>>>>>>>>>>>>> manually create sink topic in advance. And in that case, >>>>>>>>>>>>>>>>>>> having >>>>>>>>>>> num of >>>>>>>>>>>>>>>>>>> partitions configuration doesn’t make much sense. >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> 2) Looking at Produced class, based on API contract, >>>>>>>>>>>>>>>>>>>>>>>> seems >>>>>>>>>>> like >>>>>>>>>>>>>>>>>>> Produced is designed to be something that is explicitly for >>>>>>>>>>> producer >>>>>>>>>>>>>>>>> (key >>>>>>>>>>>>>>>>>>> serializer, value serializer, partitioner those all are >>>>>>>>>>>>>>>>>>> producer >>>>>>>>>>>>>>>>> specific >>>>>>>>>>>>>>>>>>> configurations) and num of partitions is topic level >>>>>>>>>>> configuration. And >>>>>>>>>>>>>>>>> I >>>>>>>>>>>>>>>>>>> don’t think mixing topic and producer level configurations >>>>>>>>>>> together in >>>>>>>>>>>>>>>>> one >>>>>>>>>>>>>>>>>>> class is the good approach. >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> 3) Looking at KStream interface, seems like Produced >>>>>>>>>>> parameter is >>>>>>>>>>>>>>>>>>> for operations that work with non-internal (e.g topics >>>>>>>>>>>>>>>>>>> created >>>>>>>>> and >>>>>>>>>>>>>>>>> managed >>>>>>>>>>>>>>>>>>> internally by Kafka Streams) topics and I think we should >>>>>>>>>>>>>>>>>>> leave >>>>>>>>>>> it as >>>>>>>>>>>>>>>>> it is >>>>>>>>>>>>>>>>>>> in that case. >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> Taking all this things into account, I think we should >>>>>>>>>>> distinguish >>>>>>>>>>>>>>>>>>> between DSL operations, where Kafka Streams should create >>>>>>>>>>>>>>>>>>> and >>>>>>>>>>> manage >>>>>>>>>>>>>>>>>>> internal topics (KStream#groupBy) vs topics that should be >>>>>>>>>>> created in >>>>>>>>>>>>>>>>>>> advance (e.g KStream#to). >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> To sum it up, I think adding numPartitions >>>>>>>>>>>>>>>>>>>>>>>> configuration in >>>>>>>>>>>>>>>>> Produced >>>>>>>>>>>>>>>>>>> will result in mixing topic and producer level >>>>>>>>>>>>>>>>>>> configuration in >>>>>>>>>>> one >>>>>>>>>>>>>>>>> class >>>>>>>>>>>>>>>>>>> and it’s gonna break existing API semantics. >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> Regarding making topic name optional in >>>>>>>>>>>>>>>>>>>>>>>> KStream#through - I >>>>>>>>>>> think >>>>>>>>>>>>>>>>>>> underline idea is very useful and giving users possibility >>>>>>>>>>>>>>>>>>> to >>>>>>>>>>> specify >>>>>>>>>>>>>>>>> num >>>>>>>>>>>>>>>>>>> of partitions there is even more useful :) Considering >>>>>>>>>>>>>>>>>>> arguments >>>>>>>>>>> against >>>>>>>>>>>>>>>>>>> adding num of partitions in Produced class, I see two >>>>>>>>>>>>>>>>>>> options >>>>>>>>>>> here: >>>>>>>>>>>>>>>>>>>>>>>> 1) Add following method overloads >>>>>>>>>>>>>>>>>>>>>>>> * through() - topic will be auto-generated and num of >>>>>>>>>>> partitions >>>>>>>>>>>>>>>>>>> will be taken from source topic >>>>>>>>>>>>>>>>>>>>>>>> * through(final int numOfPartitions) - topic will be >>>>>>>>>>>>>>>>>>>>>>>> auto >>>>>>>>>>>>>>>>>>> generated with specified num of partitions >>>>>>>>>>>>>>>>>>>>>>>> * through(final int numOfPartitions, final Produced<K, >>>>>>>>>>>>>>>>>>>>>>>> V> >>>>>>>>>>>>>>>>>>> produced) - topic will be with generated with specified num >>>>>>>>>>>>>>>>>>> of >>>>>>>>>>>>>>>>> partitions >>>>>>>>>>>>>>>>>>> and configuration taken from produced parameter. >>>>>>>>>>>>>>>>>>>>>>>> 2) Leave KStream#through as it is and introduce new >>>>>>>>>>>>>>>>>>>>>>>> method >>>>>>>>> - >>>>>>>>>>>>>>>>>>> KStream#repartition (I think Matthias suggested this in one >>>>>>>>>>>>>>>>>>> of >>>>>>>>> the >>>>>>>>>>>>>>>>> threads) >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> Considering all mentioned above I propose the following >>>>>>>>> plan: >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> Option A: >>>>>>>>>>>>>>>>>>>>>>>> 1) Leave Produced as it is >>>>>>>>>>>>>>>>>>>>>>>> 2) Add num of partitions configuration to Grouped >>>>>>>>>>>>>>>>>>>>>>>> class (as >>>>>>>>>>>>>>>>>>> mentioned in the KIP) >>>>>>>>>>>>>>>>>>>>>>>> 3) Add following method overloads to KStream#through >>>>>>>>>>>>>>>>>>>>>>>> * through() - topic will be auto-generated and num of >>>>>>>>>>> partitions >>>>>>>>>>>>>>>>>>> will be taken from source topic >>>>>>>>>>>>>>>>>>>>>>>> * through(final int numOfPartitions) - topic will be >>>>>>>>>>>>>>>>>>>>>>>> auto >>>>>>>>>>>>>>>>>>> generated with specified num of partitions >>>>>>>>>>>>>>>>>>>>>>>> * through(final int numOfPartitions, final Produced<K, >>>>>>>>>>>>>>>>>>>>>>>> V> >>>>>>>>>>>>>>>>>>> produced) - topic will be with generated with specified num >>>>>>>>>>>>>>>>>>> of >>>>>>>>>>>>>>>>> partitions >>>>>>>>>>>>>>>>>>> and configuration taken from produced parameter. >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> Option B: >>>>>>>>>>>>>>>>>>>>>>>> 1) Leave Produced as it is >>>>>>>>>>>>>>>>>>>>>>>> 2) Add num of partitions configuration to Grouped >>>>>>>>>>>>>>>>>>>>>>>> class (as >>>>>>>>>>>>>>>>>>> mentioned in the KIP) >>>>>>>>>>>>>>>>>>>>>>>> 3) Add new operator KStream#repartition for creating >>>>>>>>>>>>>>>>>>>>>>>> and >>>>>>>>>>> managing >>>>>>>>>>>>>>>>>>> internal repartition topics >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> P.S. I’m sorry if all of this was already discussed in >>>>>>>>>>>>>>>>>>>>>>>> the >>>>>>>>>>> mailing >>>>>>>>>>>>>>>>>>> list, but I kinda got with all the threads that were about >>>>>>>>>>>>>>>>>>> this >>>>>>>>>>> KIP :( >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> Kind regards, >>>>>>>>>>>>>>>>>>>>>>>> Levani >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> On Jul 1, 2019, at 9:56 AM, Levani Kokhreidze < >>>>>>>>>>>>>>>>>>> levani.co...@gmail.com <mailto:levani.co...@gmail.com>> >>>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> Hello, >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> I would like to resurrect discussion around KIP-221. >>>>>>>>>>>>>>>>>>>>>>>>> Going >>>>>>>>>>> through >>>>>>>>>>>>>>>>>>> the discussion thread, there’s seems to agreement around >>>>>>>>>>> usefulness of >>>>>>>>>>>>>>>>> this >>>>>>>>>>>>>>>>>>> feature. >>>>>>>>>>>>>>>>>>>>>>>>> Regarding the implementation, as far as I understood, >>>>>>>>>>>>>>>>>>>>>>>>> the >>>>>>>>>>> most >>>>>>>>>>>>>>>>>>> optimal solution for me seems the following: >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> 1) Add two method overloads to KStream#through method >>>>>>>>>>> (essentially >>>>>>>>>>>>>>>>>>> making topic name optional) >>>>>>>>>>>>>>>>>>>>>>>>> 2) Enhance Produced class with numOfPartitions >>>>>>>>> configuration >>>>>>>>>>>>>>>>> field. >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> Those two changes will allow DSL users to control >>>>>>>>>>> parallelism and >>>>>>>>>>>>>>>>>>> trigger re-partition without doing stateful operations. >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> I will update KIP with interface changes around >>>>>>>>>>> KStream#through if >>>>>>>>>>>>>>>>>>> this changes sound sensible. >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> Kind regards, >>>>>>>>>>>>>>>>>>>>>>>>> Levani >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>> >>>>>>> >>>>>> >>>>> >>>> >>>> >>> >> >