Re: [DISCUSS] KIP-221: Repartition Topic Hints in Streams

Levani Kokhreidze Sat, 20 Jul 2019 10:39:47 -0700

Hi Matthias,

Thanks for the suggestion, makes sense.
I’ve updated KIP 
(https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint
 
<https://cwiki.apache.org/confluence/display/KAFKA/KIP-221:+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint>).


Regards,
Levani


> On Jul 20, 2019, at 3:53 AM, Matthias J. Sax <[email protected]> wrote:
> 
> Thanks for driving the KIP.
> 
> I agree that users need to be able to specify a partitioning strategy.
> 
> Sophie raises a fair point about topic configs and producer configs. My
> take is, that consider `Repartitioned` as an "extension" to `Produced`,
> that adds topic configuration, is a good way to think about it and helps
> to keep the API "clean".
> 
> 
> With regard to method names. I would prefer to avoid abbreviations. Can
> we rename:
> 
> `withNumOfPartitions` -> `withNumberOfPartitions`
> 
> Furthermore, it might be good to add some more `static` methods:
> 
> - Repartitioned.with(Serde<K>, Serde<V>)
> - Repartitioned.withNumberOfPartitions(int)
> - Repartitioned.streamPartitioner(StreamPartitioner)
> 
> 
> -Matthias
> 
> On 7/19/19 3:33 PM, Levani Kokhreidze wrote:
>> Totally agree. I think in KStream interface it makes sense to have some 
>> duplicate configurations between operators in order to keep API simple and 
>> usable.
>> Also, as more surface API has, harder it is to have proper backward 
>> compatibility.
>> While initial idea of keeping topic level configs separate was exciting, 
>> having Repartitioned class encapsulate some producer level configs makes API 
>> more readable.
>> 
>> Regards,
>> Levani
>> 
>>> On Jul 20, 2019, at 1:15 AM, Sophie Blee-Goldman <[email protected]> 
>>> wrote:
>>> 
>>> I think that is a good point about trying to keep producer level
>>> configurations and (repartition) topic level considerations separate.
>>> Number of partitions is definitely purely a topic level configuration. But
>>> on some level, serdes and partitioners are just as much a topic
>>> configuration as a producer one. You could have two producers configured
>>> with different serdes and/or partitioners, but if they are writing to the
>>> same topic the result would be very difficult to part. So in a sense, these
>>> are configurations of topics in Streams, not just producers.
>>> 
>>> Another way to think of it: while the Streams API is not always true to
>>> this, ideally all the relevant configs for an operator are wrapped into a
>>> single object (in this case, Repartitioned). We could instead split out the
>>> fields in common with Produced into a separate parameter to keep topic and
>>> producer level configurations separate, but this increases the API surface
>>> area by a lot. It's much more straightforward to just say "this is
>>> everything that this particular operator needs" without worrying about what
>>> exactly you're specifying.
>>> 
>>> I suppose you could alternatively make Produced a field of Repartitioned,
>>> but I don't think we do this kind of composition elsewhere in Streams at
>>> the moment
>>> 
>>> On Fri, Jul 19, 2019 at 1:45 PM Levani Kokhreidze <[email protected]>
>>> wrote:
>>> 
>>>> Hi Bill,
>>>> 
>>>> Thanks a lot for the feedback.
>>>> Yes, that makes sense. I’ve updated KIP with `Repartitioned#partitioner`
>>>> configuration.
>>>> In the beginning, I wanted to introduce a class for topic level
>>>> configuration and keep topic level and producer level configurations (such
>>>> as Produced) separately (see my second email in this thread).
>>>> But while looking at the semantics of KStream interface, I couldn’t really
>>>> figure out good operation name for Topic level configuration class and just
>>>> introducing `Topic` config class was kinda breaking the semantics.
>>>> So I think having Repartitioned class which encapsulates topic and
>>>> producer level configurations for internal topics is viable thing to do.
>>>> 
>>>> Regards,
>>>> Levani
>>>> 
>>>>> On Jul 19, 2019, at 7:47 PM, Bill Bejeck <[email protected]> wrote:
>>>>> 
>>>>> Hi Lavani,
>>>>> 
>>>>> Thanks for resurrecting this KIP.
>>>>> 
>>>>> I'm also a +1 for adding a partition option.  In addition to the reason
>>>>> provided by John, my reasoning is:
>>>>> 
>>>>> 1. Users may want to use something other than hash-based partitioning
>>>>> 2. Users may wish to partition on something different than the key
>>>>> without having to change the key.  For example:
>>>>>    1. A combination of fields in the value in conjunction with the key
>>>>>    2. Something other than the key
>>>>> 3. We allow users to specify a partitioner on Produced hence in
>>>>> KStream.to and KStream.through, so it makes sense for API consistency.
>>>>> 
>>>>> Just my  2 cents.
>>>>> 
>>>>> Thanks,
>>>>> Bill
>>>>> 
>>>>> 
>>>>> 
>>>>> On Fri, Jul 19, 2019 at 5:46 AM Levani Kokhreidze <
>>>> [email protected]>
>>>>> wrote:
>>>>> 
>>>>>> Hi John,
>>>>>> 
>>>>>> In my mind it makes sense.
>>>>>> If we add partitioner configuration to Repartitioned class, with the
>>>>>> combination of specifying number of partitions for internal topics, user
>>>>>> will have opportunity to ensure co-partitioning before join operation.
>>>>>> I think this can be quite powerful feature.
>>>>>> Wondering what others think about this?
>>>>>> 
>>>>>> Regards,
>>>>>> Levani
>>>>>> 
>>>>>>> On Jul 18, 2019, at 1:20 AM, John Roesler <[email protected]> wrote:
>>>>>>> 
>>>>>>> Yes, I believe that's what I had in mind. Again, not totally sure it
>>>>>>> makes sense, but I believe something similar is the rationale for
>>>>>>> having the partitioner option in Produced.
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> -John
>>>>>>> 
>>>>>>> On Wed, Jul 17, 2019 at 3:20 PM Levani Kokhreidze
>>>>>>> <[email protected]> wrote:
>>>>>>>> 
>>>>>>>> Hey John,
>>>>>>>> 
>>>>>>>> Oh that’s interesting use-case.
>>>>>>>> Do I understand this correctly, in your example I would first issue
>>>>>> repartition(Repartitioned) with proper partitioner that essentially
>>>> would
>>>>>> be the same as the topic I want to join with and then do the
>>>> KStream#join
>>>>>> with DSL?
>>>>>>>> 
>>>>>>>> Regards,
>>>>>>>> Levani
>>>>>>>> 
>>>>>>>>> On Jul 17, 2019, at 11:11 PM, John Roesler <[email protected]>
>>>> wrote:
>>>>>>>>> 
>>>>>>>>> Hey, all, just to chime in,
>>>>>>>>> 
>>>>>>>>> I think it might be useful to have an option to specify the
>>>>>>>>> partitioner. The case I have in mind is that some data may get
>>>>>>>>> repartitioned and then joined with an input topic. If the right-side
>>>>>>>>> input topic uses a custom partitioning strategy, then the
>>>>>>>>> repartitioned stream also needs to be partitioned with the same
>>>>>>>>> strategy.
>>>>>>>>> 
>>>>>>>>> Does that make sense, or did I maybe miss something important?
>>>>>>>>> 
>>>>>>>>> Thanks,
>>>>>>>>> -John
>>>>>>>>> 
>>>>>>>>> On Wed, Jul 17, 2019 at 2:48 PM Levani Kokhreidze
>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>> 
>>>>>>>>>> Yes, I was thinking about it as well. To be honest I’m not sure
>>>> about
>>>>>> it yet.
>>>>>>>>>> As Kafka Streams DSL user, I don’t really think I would need control
>>>>>> over partitioner for internal topics.
>>>>>>>>>> As a user, I would assume that Kafka Streams knows best how to
>>>>>> partition data for internal topics.
>>>>>>>>>> In this KIP I wrote that Produced should be used only for topics
>>>> that
>>>>>> are created by user In advance.
>>>>>>>>>> In those cases maybe it make sense to have possibility to specify
>>>> the
>>>>>> partitioner.
>>>>>>>>>> I don’t have clear answer on that yet, but I guess specifying the
>>>>>> partitioner can be added as well if there’s agreement on this.
>>>>>>>>>> 
>>>>>>>>>> Regards,
>>>>>>>>>> Levani
>>>>>>>>>> 
>>>>>>>>>>> On Jul 17, 2019, at 10:42 PM, Sophie Blee-Goldman <
>>>>>> [email protected]> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> Thanks for clearing that up. I agree that Repartitioned would be a
>>>>>> useful
>>>>>>>>>>> addition. I'm wondering if it might also need to have
>>>>>>>>>>> a withStreamPartitioner method/field, similar to Produced? I'm not
>>>>>> sure how
>>>>>>>>>>> widely this feature is really used, but seems it should be
>>>> available
>>>>>> for
>>>>>>>>>>> repartition topics.
>>>>>>>>>>> 
>>>>>>>>>>> On Wed, Jul 17, 2019 at 11:26 AM Levani Kokhreidze <
>>>>>> [email protected]>
>>>>>>>>>>> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> Hey Sophie,
>>>>>>>>>>>> 
>>>>>>>>>>>> In both cases KStream#repartition and
>>>>>> KStream#repartition(Repartitioned)
>>>>>>>>>>>> topic will be created and managed by Kafka Streams.
>>>>>>>>>>>> Idea of Repartitioned is to give user more control over the topic
>>>>>> such as
>>>>>>>>>>>> num of partitions.
>>>>>>>>>>>> I feel like Repartitioned parameter is something that is missing
>>>> in
>>>>>>>>>>>> current DSL design.
>>>>>>>>>>>> Essentially giving user control over parallelism by configuring
>>>> num
>>>>>> of
>>>>>>>>>>>> partitions for internal topics.
>>>>>>>>>>>> 
>>>>>>>>>>>> Hope this answers your question.
>>>>>>>>>>>> 
>>>>>>>>>>>> Regards,
>>>>>>>>>>>> Levani
>>>>>>>>>>>> 
>>>>>>>>>>>>> On Jul 17, 2019, at 9:02 PM, Sophie Blee-Goldman <
>>>>>> [email protected]>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Hey Levani,
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Thanks for the KIP! Can you clarify one thing for me -- for the
>>>>>>>>>>>>> KStream#repartition signature taking a Repartitioned, will the
>>>>>> topic be
>>>>>>>>>>>>> auto-created by Streams (which seems to be the case for the
>>>>>> signature
>>>>>>>>>>>>> without a Repartitioned) or does it have to be pre-created? The
>>>>>> wording
>>>>>>>>>>>> in
>>>>>>>>>>>>> the KIP makes it seem like one version of the method will
>>>>>> auto-create
>>>>>>>>>>>>> topics while the other will not.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>> Sophie
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On Wed, Jul 17, 2019 at 10:15 AM Levani Kokhreidze <
>>>>>>>>>>>> [email protected]>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> One more bump about KIP-221 (
>>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>> 
>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint
>>>>>>>>>>>>>> <
>>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>> 
>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221:+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint
>>>>>>>>>>>>> )
>>>>>>>>>>>>>> so it doesn’t get lost in mailing list :)
>>>>>>>>>>>>>> Would love to hear communities opinions/concerns about this KIP.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>> Levani
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On Jul 12, 2019, at 5:27 PM, Levani Kokhreidze <
>>>>>> [email protected]
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Kind reminder about this KIP:
>>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>> 
>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint
>>>>>>>>>>>>>> <
>>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>> 
>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221:+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>> Levani
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> On Jul 9, 2019, at 11:38 AM, Levani Kokhreidze <
>>>>>>>>>>>> [email protected]
>>>>>>>>>>>>>> <mailto:[email protected]>> wrote:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> In order to move this KIP forward, I’ve updated confluence
>>>> page
>>>>>> with
>>>>>>>>>>>>>> the new proposal
>>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>> 
>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint
>>>>>>>>>>>>>> <
>>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>> 
>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221:+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> I’ve also filled “Rejected Alternatives” section.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Looking forward to discuss this KIP :)
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> King regards,
>>>>>>>>>>>>>>>> Levani
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> On Jul 3, 2019, at 1:08 PM, Levani Kokhreidze <
>>>>>>>>>>>> [email protected]
>>>>>>>>>>>>>> <mailto:[email protected]>> wrote:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Hello Matthias,
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Thanks for the feedback and ideas.
>>>>>>>>>>>>>>>>> I like the idea of introducing dedicated `Topic` class for
>>>>>> topic
>>>>>>>>>>>>>> configuration for internal operators like `groupedBy`.
>>>>>>>>>>>>>>>>> Would be great to hear others opinion about this as well.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Kind regards,
>>>>>>>>>>>>>>>>> Levani
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> On Jul 3, 2019, at 7:00 AM, Matthias J. Sax <
>>>>>> [email protected]
>>>>>>>>>>>>>> <mailto:[email protected]>> wrote:
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Levani,
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Thanks for picking up this KIP! And thanks for summarizing
>>>>>>>>>>>> everything.
>>>>>>>>>>>>>>>>>> Even if some points may have been discussed already (can't
>>>>>> really
>>>>>>>>>>>>>>>>>> remember), it's helpful to get a good summary to refresh the
>>>>>>>>>>>>>> discussion.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> I think your reasoning makes sense. With regard to the
>>>>>> distinction
>>>>>>>>>>>>>>>>>> between operators that manage topics and operators that use
>>>>>>>>>>>>>> user-created
>>>>>>>>>>>>>>>>>> topics: Following this argument, it might indicate that
>>>>>> leaving
>>>>>>>>>>>>>>>>>> `through()` as-is (as an operator that uses use-defined
>>>>>> topics) and
>>>>>>>>>>>>>>>>>> introducing a new `repartition()` operator (an operator that
>>>>>> manages
>>>>>>>>>>>>>>>>>> topics itself) might be good. Otherwise, there is one
>>>> operator
>>>>>>>>>>>>>>>>>> `through()` that sometimes manages topics but sometimes
>>>> not; a
>>>>>>>>>>>>>> different
>>>>>>>>>>>>>>>>>> name, ie, new operator would make the distinction clearer.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> About adding `numOfPartitions` to `Grouped`. I am wondering
>>>>>> if the
>>>>>>>>>>>>>> same
>>>>>>>>>>>>>>>>>> argument as for `Produced` does apply and adding it is
>>>>>> semantically
>>>>>>>>>>>>>>>>>> questionable? Might be good to get opinions of others on
>>>>>> this, too.
>>>>>>>>>>>> I
>>>>>>>>>>>>>> am
>>>>>>>>>>>>>>>>>> not sure myself what solution I prefer atm.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> So far, KS uses configuration objects that allow to
>>>> configure
>>>>>> a
>>>>>>>>>>>>>> certain
>>>>>>>>>>>>>>>>>> "entity" like a consumer, producer, store. If we assume that
>>>>>> a topic
>>>>>>>>>>>>>> is
>>>>>>>>>>>>>>>>>> a similar entity, I am wonder if we should have a
>>>>>>>>>>>>>>>>>> `Topic#withNumberOfPartitions()` class and method instead of
>>>>>> a plain
>>>>>>>>>>>>>>>>>> integer? This would allow us to add other configs, like
>>>>>> replication
>>>>>>>>>>>>>>>>>> factor, retention-time etc, easily, without the need to
>>>>>> change the
>>>>>>>>>>>>>> "main
>>>>>>>>>>>>>>>>>> API".
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Just want to give some ideas. Not sure if I like them
>>>> myself.
>>>>>> :)
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> -Matthias
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> On 7/1/19 1:04 AM, Levani Kokhreidze wrote:
>>>>>>>>>>>>>>>>>>> Actually, giving it more though - maybe enhancing Produced
>>>>>> with num
>>>>>>>>>>>>>> of partitions configuration is not the best approach. Let me
>>>>>> explain
>>>>>>>>>>>> why:
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> 1) If we enhance Produced class with this configuration,
>>>>>> this will
>>>>>>>>>>>>>> also affect KStream#to operation. Since KStream#to is the final
>>>>>> sink of
>>>>>>>>>>>> the
>>>>>>>>>>>>>> topology, for me, it seems to be reasonable assumption that user
>>>>>> needs
>>>>>>>>>>>> to
>>>>>>>>>>>>>> manually create sink topic in advance. And in that case, having
>>>>>> num of
>>>>>>>>>>>>>> partitions configuration doesn’t make much sense.
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> 2) Looking at Produced class, based on API contract, seems
>>>>>> like
>>>>>>>>>>>>>> Produced is designed to be something that is explicitly for
>>>>>> producer
>>>>>>>>>>>> (key
>>>>>>>>>>>>>> serializer, value serializer, partitioner those all are producer
>>>>>>>>>>>> specific
>>>>>>>>>>>>>> configurations) and num of partitions is topic level
>>>>>> configuration. And
>>>>>>>>>>>> I
>>>>>>>>>>>>>> don’t think mixing topic and producer level configurations
>>>>>> together in
>>>>>>>>>>>> one
>>>>>>>>>>>>>> class is the good approach.
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> 3) Looking at KStream interface, seems like Produced
>>>>>> parameter is
>>>>>>>>>>>>>> for operations that work with non-internal (e.g topics created
>>>> and
>>>>>>>>>>>> managed
>>>>>>>>>>>>>> internally by Kafka Streams) topics and I think we should leave
>>>>>> it as
>>>>>>>>>>>> it is
>>>>>>>>>>>>>> in that case.
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Taking all this things into account, I think we should
>>>>>> distinguish
>>>>>>>>>>>>>> between DSL operations, where Kafka Streams should create and
>>>>>> manage
>>>>>>>>>>>>>> internal topics (KStream#groupBy) vs topics that should be
>>>>>> created in
>>>>>>>>>>>>>> advance (e.g KStream#to).
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> To sum it up, I think adding numPartitions configuration in
>>>>>>>>>>>> Produced
>>>>>>>>>>>>>> will result in mixing topic and producer level configuration in
>>>>>> one
>>>>>>>>>>>> class
>>>>>>>>>>>>>> and it’s gonna break existing API semantics.
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Regarding making topic name optional in KStream#through - I
>>>>>> think
>>>>>>>>>>>>>> underline idea is very useful and giving users possibility to
>>>>>> specify
>>>>>>>>>>>> num
>>>>>>>>>>>>>> of partitions there is even more useful :) Considering arguments
>>>>>> against
>>>>>>>>>>>>>> adding num of partitions in Produced class, I see two options
>>>>>> here:
>>>>>>>>>>>>>>>>>>> 1) Add following method overloads
>>>>>>>>>>>>>>>>>>> * through() - topic will be auto-generated and num of
>>>>>> partitions
>>>>>>>>>>>>>> will be taken from source topic
>>>>>>>>>>>>>>>>>>> * through(final int numOfPartitions) - topic will be auto
>>>>>>>>>>>>>> generated with specified num of partitions
>>>>>>>>>>>>>>>>>>> * through(final int numOfPartitions, final Produced<K, V>
>>>>>>>>>>>>>> produced) - topic will be with generated with specified num of
>>>>>>>>>>>> partitions
>>>>>>>>>>>>>> and configuration taken from produced parameter.
>>>>>>>>>>>>>>>>>>> 2) Leave KStream#through as it is and introduce new method
>>>> -
>>>>>>>>>>>>>> KStream#repartition (I think Matthias suggested this in one of
>>>> the
>>>>>>>>>>>> threads)
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Considering all mentioned above I propose the following
>>>> plan:
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Option A:
>>>>>>>>>>>>>>>>>>> 1) Leave Produced as it is
>>>>>>>>>>>>>>>>>>> 2) Add num of partitions configuration to Grouped class (as
>>>>>>>>>>>>>> mentioned in the KIP)
>>>>>>>>>>>>>>>>>>> 3) Add following method overloads to KStream#through
>>>>>>>>>>>>>>>>>>> * through() - topic will be auto-generated and num of
>>>>>> partitions
>>>>>>>>>>>>>> will be taken from source topic
>>>>>>>>>>>>>>>>>>> * through(final int numOfPartitions) - topic will be auto
>>>>>>>>>>>>>> generated with specified num of partitions
>>>>>>>>>>>>>>>>>>> * through(final int numOfPartitions, final Produced<K, V>
>>>>>>>>>>>>>> produced) - topic will be with generated with specified num of
>>>>>>>>>>>> partitions
>>>>>>>>>>>>>> and configuration taken from produced parameter.
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Option B:
>>>>>>>>>>>>>>>>>>> 1) Leave Produced as it is
>>>>>>>>>>>>>>>>>>> 2) Add num of partitions configuration to Grouped class (as
>>>>>>>>>>>>>> mentioned in the KIP)
>>>>>>>>>>>>>>>>>>> 3) Add new operator KStream#repartition for creating and
>>>>>> managing
>>>>>>>>>>>>>> internal repartition topics
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> P.S. I’m sorry if all of this was already discussed in the
>>>>>> mailing
>>>>>>>>>>>>>> list, but I kinda got with all the threads that were about this
>>>>>> KIP :(
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Kind regards,
>>>>>>>>>>>>>>>>>>> Levani
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> On Jul 1, 2019, at 9:56 AM, Levani Kokhreidze <
>>>>>>>>>>>>>> [email protected] <mailto:[email protected]>> wrote:
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> I would like to resurrect discussion around KIP-221. Going
>>>>>> through
>>>>>>>>>>>>>> the discussion thread, there’s seems to agreement around
>>>>>> usefulness of
>>>>>>>>>>>> this
>>>>>>>>>>>>>> feature.
>>>>>>>>>>>>>>>>>>>> Regarding the implementation, as far as I understood, the
>>>>>> most
>>>>>>>>>>>>>> optimal solution for me seems the following:
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> 1) Add two method overloads to KStream#through method
>>>>>> (essentially
>>>>>>>>>>>>>> making topic name optional)
>>>>>>>>>>>>>>>>>>>> 2) Enhance Produced class with numOfPartitions
>>>> configuration
>>>>>>>>>>>> field.
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> Those two changes will allow DSL users to control
>>>>>> parallelism and
>>>>>>>>>>>>>> trigger re-partition without doing stateful operations.
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> I will update KIP with interface changes around
>>>>>> KStream#through if
>>>>>>>>>>>>>> this changes sound sensible.
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> Kind regards,
>>>>>>>>>>>>>>>>>>>> Levani
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>>>> 
>>>> 
>>>> 
>> 
>> 
>

Re: [DISCUSS] KIP-221: Repartition Topic Hints in Streams

Reply via email to