Re: [DISCUSS] KIP-221: Repartition Topic Hints in Streams

Levani Kokhreidze Thu, 07 Nov 2019 11:00:09 -0800

Hi Sophie,

Thank you for your reply, very insightful. Looking forward to hearing
others' opinions on this as well.


Kind regards,
Levani


> On Nov 6, 2019, at 1:30 AM, Sophie Blee-Goldman <sop...@confluent.io> wrote:
> 
>> Personally, I think Matthias’s concern is valid, but on the other hand
>> Kafka Streams already has an optimizer in place which alters the topology
>> independently from the user
> 
> I agree (with you) and think this is a good way to put it -- we currently
> auto-repartition for the user so that they don't have to walk through their
> entire topology and reason about when and where to place a `.through` (or
> the new `.repartition`), so why suddenly force this onto the user? How
> certain are we that users will always get this right? It's easy to imagine
> that during development, you write your new app with correctly placed
> repartitions in order to use this new feature. During the course of
> development you end up tweaking the topology, but don't remember to review
> or move the repartitioning since you're used to Streams doing this for you.
> If you use only single-partition topics for testing, you might not even
> notice your app is spitting out incorrect results!
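
A toy illustration of the single-partition testing trap described above. This
is a plain-Java model, not Kafka Streams code; the class name, the record
layout and the simplified hash partitioning are assumptions made only for the
sketch. Records stay in the partition chosen by their original key, and each
partition counts by the new key on its own, as would happen if a needed
repartition were missing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model, not Kafka Streams code: records stay in the partition chosen by
// their ORIGINAL key, and each partition counts by the NEW key on its own.
final class MissingRepartitionSketch {

    // Each record is {originalKey, newKey}.
    static List<Map<String, Integer>> countPerPartition(String[][] records, int numPartitions) {
        List<Map<String, Integer>> counts = new ArrayList<>();
        for (int p = 0; p < numPartitions; p++) {
            counts.add(new TreeMap<>());
        }
        for (String[] record : records) {
            // Simplified hash partitioning on the original key (assumption).
            int partition = Math.floorMod(record[0].hashCode(), numPartitions);
            // Count by the new key, but only within this partition.
            counts.get(partition).merge(record[1], 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String[][] records = { {"a", "x"}, {"b", "x"} };
        System.out.println(countPerPartition(records, 1));
        System.out.println(countPerPartition(records, 2));
    }
}
```

With one partition both records are counted together, so the result for "x"
looks right. With two partitions the records land in different partitions and
each partition reports its own partial count for "x".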
> 
> Anyway, I feel pretty strongly that it would be weird to introduce a new
> feature and say that to use it, you can't take advantage of this other
> feature anymore. Also, is it possible our optimization framework could ever
> include an optimized repartitioning strategy that is better than what a user
> could achieve by manually inserting repartitions? Do we expect users to have
> a deep understanding of the best way to repartition their particular
> topology, or is it likely they will end up over-repartitioning either due to
> missed optimizations or unnecessary extra repartitions? I think many users
> would prefer to just say "if there *is* a repartition required at this point
> in the topology, it should have N partitions".
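
The "hint" semantics in the last sentence can be sketched as a tiny decision
function. This is a hypothetical model, not the Kafka Streams optimizer; the
class and method names, and the fallback to the upstream partition count when
no hint is given, are assumptions for illustration only.

```java
import java.util.OptionalInt;

// Hypothetical planner model, not the Kafka Streams optimizer.
final class RepartitionHintSketch {

    // Returns the partition count of the internal repartition topic, or empty
    // when no repartition topic is created at this point in the topology.
    static OptionalInt plan(boolean repartitionRequired, OptionalInt hint, int upstreamPartitions) {
        if (!repartitionRequired) {
            return OptionalInt.empty(); // the hint is simply unused
        }
        // Assumption: without a hint, fall back to the upstream partition count.
        return OptionalInt.of(hint.orElse(upstreamPartitions));
    }

    public static void main(String[] args) {
        System.out.println(plan(true, OptionalInt.of(8), 4));
        System.out.println(plan(true, OptionalInt.empty(), 4));
        System.out.println(plan(false, OptionalInt.of(8), 4));
    }
}
```

The third call shows the case Matthias is worried about: the user supplied a
value, but no repartition happens, so the value never takes effect.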
> 
> As to the idea of adding `numberOfPartitions` to Grouped rather than
> adding a new parameter to groupBy, that does seem more in line with the
> current syntax so +1 from me
> 
> On Tue, Nov 5, 2019 at 2:07 PM Levani Kokhreidze <levani.co...@gmail.com>
> wrote:
> 
>> Hello all,
>> 
>> While https://github.com/apache/kafka/pull/7170 is under review and
>> almost done, I want to resurrect the discussion about this KIP to address
>> a couple of concerns raised by Matthias and John.
>> 
>> As a reminder, the idea of KIP-221 was to give DSL users control over
>> repartitioning and parallelism of sub-topologies by:
>> 1) Introducing a new KStream#repartition operation, which is done in
>> https://github.com/apache/kafka/pull/7170
>> 2) Adding a new KStream#groupBy(Repartitioned) operation, which is planned
>> as a separate PR.
>> 
>> While all agree about the general implementation and idea behind the
>> https://github.com/apache/kafka/pull/7170 PR, introducing a new
>> KStream#groupBy(Repartitioned) method overload raised some questions during
>> the review.
>> Matthias raised the concern that there can be cases where a user calls
>> `KStream#groupBy(Repartitioned)` but no actual repartitioning is
>> required, so the configuration passed via `Repartitioned` would never be
>> applied (Matthias, please correct me if I misinterpreted your comment).
>> So instead, a user who wants to control the parallelism of sub-topologies
>> should always use the `KStream#repartition` operation before groupBy. The
>> full comment can be seen here:
>> https://github.com/apache/kafka/pull/7170#issuecomment-519303125
>> 
>> On the same topic, John pointed out that, from an API design perspective, we
>> shouldn’t intertwine configuration classes of different operators with
>> one another. So instead of introducing a new `KStream#groupBy(Repartitioned)`
>> for specifying the number of partitions for the internal topic, we should
>> update the existing `Grouped` class with a `numberOfPartitions` field.
>> 
>> Personally, I think Matthias’s concern is valid, but on the other hand
>> Kafka Streams already has an optimizer in place which alters the topology
>> independently from the user. So maybe it makes sense for Kafka Streams to
>> internally optimize the topology in the best way possible, even if in
>> some cases this means ignoring some operator configurations passed by the
>> user. Also, I agree with John about API design semantics. If we go through
>> with the changes for the `KStream#groupBy` operation, it makes more sense to
>> add a `numberOfPartitions` field to the `Grouped` class instead of
>> introducing a new `KStream#groupBy(Repartitioned)` method overload.
>> 
>> I would really appreciate the community's feedback on this.
>> 
>> Kind regards,
>> Levani
>> 
>> 
>> 
>>> On Oct 17, 2019, at 12:57 AM, Sophie Blee-Goldman <sop...@confluent.io>
>>> wrote:
>>> 
>>> Hey Levani,
>>> 
>>> I think people are busy with the upcoming 2.4 release, and don't have much
>>> spare time at the moment. It's kind of a difficult time to get attention on
>>> things, but feel free to pick up something else to work on in the meantime
>>> until things have calmed down a bit!
>>> 
>>> Cheers,
>>> Sophie
>>> 
>>> 
>>> On Wed, Oct 16, 2019 at 11:26 AM Levani Kokhreidze <levani.co...@gmail.com>
>>> wrote:
>>> 
>>>> Hello all,
>>>> 
>>>> Sorry for bringing this thread up again, but I would like to get some
>>>> attention on this PR: https://github.com/apache/kafka/pull/7170
>>>> It's been a while now and I would love to move on to other KIPs as well.
>>>> Please let me know if you have any concerns.
>>>> 
>>>> Regards,
>>>> Levani
>>>> 
>>>> 
>>>>> On Jul 26, 2019, at 11:25 AM, Levani Kokhreidze <levani.co...@gmail.com>
>>>>> wrote:
>>>>> 
>>>>> Hi all,
>>>>> 
>>>>> Here’s the voting thread for this KIP:
>>>>> https://www.mail-archive.com/dev@kafka.apache.org/msg99680.html
>>>>> 
>>>>> Regards,
>>>>> Levani
>>>>> 
>>>>>> On Jul 24, 2019, at 11:15 PM, Levani Kokhreidze <levani.co...@gmail.com>
>>>>>> wrote:
>>>>>> 
>>>>>> Hi Matthias,
>>>>>> 
>>>>>> Thanks for the suggestion. I don’t have a strong opinion on that one.
>>>>>> Agree that avoiding unnecessary method overloads is a good idea.
>>>>>> 
>>>>>> Updated KIP
>>>>>> 
>>>>>> Regards,
>>>>>> Levani
>>>>>> 
>>>>>> 
>>>>>>> On Jul 24, 2019, at 8:50 PM, Matthias J. Sax <matth...@confluent.io>
>>>>>>> wrote:
>>>>>>> 
>>>>>>> One question:
>>>>>>> 
>>>>>>> Why do we add
>>>>>>> 
>>>>>>>> Repartitioned#with(final String name, final int numberOfPartitions)
>>>>>>> 
>>>>>>> It seems that `#with(String name)`, `#numberOfPartitions(int)` in
>>>>>>> combination with `withName()` and `withNumberOfPartitions()` should be
>>>>>>> sufficient. Users can chain the method calls.
>>>>>>> 
>>>>>>> (I think it's valuable to keep the number of overloads small if possible.)
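
A minimal sketch of the chaining argument above. `RepartitionedSketch` is a
hypothetical stand-in written only for this illustration, not the Kafka
Streams class; it carries just the two settings discussed here and mirrors
the method names mentioned in the message.

```java
// Hypothetical stand-in for the proposed config class, not the Kafka Streams type.
final class RepartitionedSketch {
    final String name;                // null means "not set"
    final Integer numberOfPartitions; // null means "not set"

    private RepartitionedSketch(String name, Integer numberOfPartitions) {
        this.name = name;
        this.numberOfPartitions = numberOfPartitions;
    }

    // Static entry points, one per setting.
    static RepartitionedSketch with(String name) {
        return new RepartitionedSketch(name, null);
    }

    static RepartitionedSketch numberOfPartitions(int numberOfPartitions) {
        return new RepartitionedSketch(null, numberOfPartitions);
    }

    // Instance methods return a modified copy, so calls can be chained.
    RepartitionedSketch withName(String name) {
        return new RepartitionedSketch(name, this.numberOfPartitions);
    }

    RepartitionedSketch withNumberOfPartitions(int numberOfPartitions) {
        return new RepartitionedSketch(this.name, numberOfPartitions);
    }

    public static void main(String[] args) {
        // Chaining covers what a with(name, numberOfPartitions) overload would do.
        RepartitionedSketch config = RepartitionedSketch.with("orders-by-customer")
                .withNumberOfPartitions(4);
        System.out.println(config.name + " / " + config.numberOfPartitions);
    }
}
```

Starting from either static method and chaining the other setting reaches the
same configuration, which is why the two-argument overload adds nothing new.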
>>>>>>> 
>>>>>>> Otherwise LGTM.
>>>>>>> 
>>>>>>> 
>>>>>>> -Matthias
>>>>>>> 
>>>>>>> 
>>>>>>> On 7/23/19 2:18 PM, Levani Kokhreidze wrote:
>>>>>>>> Hello,
>>>>>>>> 
>>>>>>>> Thanks all for your feedback.
>>>>>>>> I started the voting procedure for this KIP. If there are any other
>>>>>>>> concerns about this KIP, please let me know.
>>>>>>>> 
>>>>>>>> Regards,
>>>>>>>> Levani
>>>>>>>> 
>>>>>>>>> On Jul 20, 2019, at 8:39 PM, Levani Kokhreidze <levani.co...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>> Hi Matthias,
>>>>>>>>> 
>>>>>>>>> Thanks for the suggestion, makes sense.
>>>>>>>>> I’ve updated the KIP (
>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint
>>>>>>>>> ).
>>>>>>>>> 
>>>>>>>>> Regards,
>>>>>>>>> Levani
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> On Jul 20, 2019, at 3:53 AM, Matthias J. Sax <matth...@confluent.io>
>>>>>>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>> Thanks for driving the KIP.
>>>>>>>>>> 
>>>>>>>>>> I agree that users need to be able to specify a partitioning strategy.
>>>>>>>>>> 
>>>>>>>>>> Sophie raises a fair point about topic configs and producer configs. My
>>>>>>>>>> take is that considering `Repartitioned` as an "extension" to `Produced`
>>>>>>>>>> that adds topic configuration is a good way to think about it and helps
>>>>>>>>>> to keep the API "clean".
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> With regard to method names, I would prefer to avoid abbreviations. Can
>>>>>>>>>> we rename:
>>>>>>>>>> 
>>>>>>>>>> `withNumOfPartitions` -> `withNumberOfPartitions`
>>>>>>>>>> 
>>>>>>>>>> Furthermore, it might be good to add some more `static` methods:
>>>>>>>>>> 
>>>>>>>>>> - Repartitioned.with(Serde<K>, Serde<V>)
>>>>>>>>>> - Repartitioned.withNumberOfPartitions(int)
>>>>>>>>>> - Repartitioned.streamPartitioner(StreamPartitioner)
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> -Matthias
>>>>>>>>>> 
>>>>>>>>>> On 7/19/19 3:33 PM, Levani Kokhreidze wrote:
>>>>>>>>>>> Totally agree. I think in the KStream interface it makes sense to have
>>>>>>>>>>> some duplicate configurations between operators in order to keep the API
>>>>>>>>>>> simple and usable.
>>>>>>>>>>> Also, the more surface area the API has, the harder it is to maintain
>>>>>>>>>>> proper backward compatibility.
>>>>>>>>>>> While the initial idea of keeping topic level configs separate was
>>>>>>>>>>> exciting, having the Repartitioned class encapsulate some producer level
>>>>>>>>>>> configs makes the API more readable.
>>>>>>>>>>> 
>>>>>>>>>>> Regards,
>>>>>>>>>>> Levani
>>>>>>>>>>> 
>>>>>>>>>>>> On Jul 20, 2019, at 1:15 AM, Sophie Blee-Goldman <sop...@confluent.io>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>> I think that is a good point about trying to keep producer level
>>>>>>>>>>>> configurations and (repartition) topic level considerations separate.
>>>>>>>>>>>> Number of partitions is definitely purely a topic level configuration. But
>>>>>>>>>>>> on some level, serdes and partitioners are just as much a topic
>>>>>>>>>>>> configuration as a producer one. You could have two producers configured
>>>>>>>>>>>> with different serdes and/or partitioners, but if they are writing to the
>>>>>>>>>>>> same topic the result would be very difficult to parse. So in a sense, these
>>>>>>>>>>>> are configurations of topics in Streams, not just producers.
>>>>>>>>>>>> 
>>>>>>>>>>>> Another way to think of it: while the Streams API is not always true to
>>>>>>>>>>>> this, ideally all the relevant configs for an operator are wrapped into a
>>>>>>>>>>>> single object (in this case, Repartitioned). We could instead split out the
>>>>>>>>>>>> fields in common with Produced into a separate parameter to keep topic and
>>>>>>>>>>>> producer level configurations separate, but this increases the API surface
>>>>>>>>>>>> area by a lot. It's much more straightforward to just say "this is
>>>>>>>>>>>> everything that this particular operator needs" without worrying about what
>>>>>>>>>>>> exactly you're specifying.
>>>>>>>>>>>> 
>>>>>>>>>>>> I suppose you could alternatively make Produced a field of Repartitioned,
>>>>>>>>>>>> but I don't think we do this kind of composition elsewhere in Streams at
>>>>>>>>>>>> the moment.
>>>>>>>>>>>> 
>>>>>>>>>>>> On Fri, Jul 19, 2019 at 1:45 PM Levani Kokhreidze <levani.co...@gmail.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>>> Hi Bill,
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Thanks a lot for the feedback.
>>>>>>>>>>>>> Yes, that makes sense. I’ve updated the KIP with the
>>>>>>>>>>>>> `Repartitioned#partitioner` configuration.
>>>>>>>>>>>>> In the beginning, I wanted to introduce a class for topic level
>>>>>>>>>>>>> configuration and keep topic level and producer level configurations (such
>>>>>>>>>>>>> as Produced) separate (see my second email in this thread).
>>>>>>>>>>>>> But while looking at the semantics of the KStream interface, I couldn’t
>>>>>>>>>>>>> really figure out a good operation name for a topic level configuration
>>>>>>>>>>>>> class, and just introducing a `Topic` config class was kind of breaking the
>>>>>>>>>>>>> semantics.
>>>>>>>>>>>>> So I think having a Repartitioned class which encapsulates topic and
>>>>>>>>>>>>> producer level configurations for internal topics is a viable thing to do.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>> Levani
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Jul 19, 2019, at 7:47 PM, Bill Bejeck <bbej...@gmail.com> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Hi Levani,
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Thanks for resurrecting this KIP.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> I'm also a +1 for adding a partition option.  In addition to the reason
>>>>>>>>>>>>>> provided by John, my reasoning is:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 1. Users may want to use something other than hash-based partitioning
>>>>>>>>>>>>>> 2. Users may wish to partition on something different than the key
>>>>>>>>>>>>>>    without having to change the key.  For example:
>>>>>>>>>>>>>>    1. A combination of fields in the value in conjunction with the key
>>>>>>>>>>>>>>    2. Something other than the key
>>>>>>>>>>>>>> 3. We allow users to specify a partitioner on Produced, hence in
>>>>>>>>>>>>>>    KStream.to and KStream.through, so it makes sense for API consistency.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Just my  2 cents.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> Bill
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Fri, Jul 19, 2019 at 5:46 AM Levani Kokhreidze <levani.co...@gmail.com>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Hi John,
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> In my mind it makes sense.
>>>>>>>>>>>>>>> If we add partitioner configuration to the Repartitioned class, then in
>>>>>>>>>>>>>>> combination with specifying the number of partitions for internal topics,
>>>>>>>>>>>>>>> the user will have the opportunity to ensure co-partitioning before a join
>>>>>>>>>>>>>>> operation.
>>>>>>>>>>>>>>> I think this can be quite a powerful feature.
>>>>>>>>>>>>>>> Wondering what others think about this?
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>> Levani
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> On Jul 18, 2019, at 1:20 AM, John Roesler <j...@confluent.io> wrote:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Yes, I believe that's what I had in mind. Again, not totally sure it
>>>>>>>>>>>>>>>> makes sense, but I believe something similar is the rationale for
>>>>>>>>>>>>>>>> having the partitioner option in Produced.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>> -John
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> On Wed, Jul 17, 2019 at 3:20 PM Levani Kokhreidze
>>>>>>>>>>>>>>>> <levani.co...@gmail.com> wrote:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Hey John,
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Oh that’s an interesting use-case.
>>>>>>>>>>>>>>>>> Do I understand this correctly: in your example I would first issue
>>>>>>>>>>>>>>>>> repartition(Repartitioned) with a proper partitioner that essentially
>>>>>>>>>>>>>>>>> would be the same as that of the topic I want to join with, and then do
>>>>>>>>>>>>>>>>> the KStream#join with the DSL?
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>>>> Levani
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> On Jul 17, 2019, at 11:11 PM, John Roesler <j...@confluent.io> wrote:
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Hey, all, just to chime in,
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> I think it might be useful to have an option to specify the
>>>>>>>>>>>>>>>>>> partitioner. The case I have in mind is that some data may get
>>>>>>>>>>>>>>>>>> repartitioned and then joined with an input topic. If the right-side
>>>>>>>>>>>>>>>>>> input topic uses a custom partitioning strategy, then the
>>>>>>>>>>>>>>>>>> repartitioned stream also needs to be partitioned with the same
>>>>>>>>>>>>>>>>>> strategy.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Does that make sense, or did I maybe miss something important?
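
The co-partitioning requirement described above can be sketched with a toy
model. This is plain Java, not Kafka Streams code; the class name, the
"region prefix" custom strategy and the simplified hash partitioning are all
assumptions made for the sketch. A join only lines up if a given key lands in
the same partition number on both sides.

```java
import java.util.function.ToIntFunction;

// Toy model, not Kafka Streams code.
final class CoPartitioningSketch {

    static final int NUM_PARTITIONS = 4;

    // Simplified default-style strategy: hash of the whole key (assumption).
    static int hashPartition(String key) {
        return Math.floorMod(key.hashCode(), NUM_PARTITIONS);
    }

    // Hypothetical custom strategy of the right-side topic: route by the
    // region prefix before the first '-'. Keys are assumed to contain a '-'.
    static int regionPartition(String key) {
        String region = key.substring(0, key.indexOf('-'));
        return Math.floorMod(region.hashCode(), NUM_PARTITIONS);
    }

    // Counts keys that would land in different partitions on the two sides.
    static int mismatches(String[] keys, ToIntFunction<String> left, ToIntFunction<String> right) {
        int count = 0;
        for (String key : keys) {
            if (left.applyAsInt(key) != right.applyAsInt(key)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String[] keys = {"eu-1", "eu-2", "us-1", "us-2"};
        // Repartitioned stream uses the same strategy as the input topic.
        System.out.println(mismatches(keys,
                CoPartitioningSketch::regionPartition, CoPartitioningSketch::regionPartition));
        // Repartitioned stream falls back to plain hashing of the key.
        System.out.println(mismatches(keys,
                CoPartitioningSketch::hashPartition, CoPartitioningSketch::regionPartition));
    }
}
```

When both sides share the strategy there are no mismatches. When the
repartitioned side hashes the whole key instead, at least some keys end up in
a different partition than their counterparts on the right side, so the join
would miss them.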
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>> -John
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> On Wed, Jul 17, 2019 at 2:48 PM Levani Kokhreidze
>>>>>>>>>>>>>>>>>> <levani.co...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Yes, I was thinking about it as well. To be honest I’m not sure about
>>>>>>>>>>>>>>>>>>> it yet.
>>>>>>>>>>>>>>>>>>> As a Kafka Streams DSL user, I don’t really think I would need control
>>>>>>>>>>>>>>>>>>> over the partitioner for internal topics.
>>>>>>>>>>>>>>>>>>> As a user, I would assume that Kafka Streams knows best how to
>>>>>>>>>>>>>>>>>>> partition data for internal topics.
>>>>>>>>>>>>>>>>>>> In this KIP I wrote that Produced should be used only for topics that
>>>>>>>>>>>>>>>>>>> are created by the user in advance.
>>>>>>>>>>>>>>>>>>> In those cases maybe it makes sense to have the possibility to specify
>>>>>>>>>>>>>>>>>>> the partitioner.
>>>>>>>>>>>>>>>>>>> I don’t have a clear answer on that yet, but I guess specifying the
>>>>>>>>>>>>>>>>>>> partitioner can be added as well if there’s agreement on this.
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>>>>>> Levani
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> On Jul 17, 2019, at 10:42 PM, Sophie Blee-Goldman
>>>>>>>>>>>>>>>>>>>> <sop...@confluent.io> wrote:
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> Thanks for clearing that up. I agree that Repartitioned would be a useful
>>>>>>>>>>>>>>>>>>>> addition. I'm wondering if it might also need to have
>>>>>>>>>>>>>>>>>>>> a withStreamPartitioner method/field, similar to Produced? I'm not sure how
>>>>>>>>>>>>>>>>>>>> widely this feature is really used, but seems it should be available for
>>>>>>>>>>>>>>>>>>>> repartition topics.
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> On Wed, Jul 17, 2019 at 11:26 AM Levani Kokhreidze
>>>>>>>>>>>>>>>>>>>> <levani.co...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> Hey Sophie,
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> In both cases, KStream#repartition and KStream#repartition(Repartitioned),
>>>>>>>>>>>>>>>>>>>>> the topic will be created and managed by Kafka Streams.
>>>>>>>>>>>>>>>>>>>>> The idea of Repartitioned is to give the user more control over the topic,
>>>>>>>>>>>>>>>>>>>>> such as the number of partitions.
>>>>>>>>>>>>>>>>>>>>> I feel like the Repartitioned parameter is something that is missing in
>>>>>>>>>>>>>>>>>>>>> the current DSL design.
>>>>>>>>>>>>>>>>>>>>> It essentially gives the user control over parallelism by configuring the
>>>>>>>>>>>>>>>>>>>>> number of partitions for internal topics.
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> Hope this answers your question.
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>>>>>>>> Levani
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> On Jul 17, 2019, at 9:02 PM, Sophie Blee-Goldman
>>>>>>>>>>>>>>>>>>>>>> <sop...@confluent.io> wrote:
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> Hey Levani,
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> Thanks for the KIP! Can you clarify one thing for me -- for the
>>>>>>>>>>>>>>>>>>>>>> KStream#repartition signature taking a Repartitioned, will the topic be
>>>>>>>>>>>>>>>>>>>>>> auto-created by Streams (which seems to be the case for the signature
>>>>>>>>>>>>>>>>>>>>>> without a Repartitioned) or does it have to be pre-created? The wording in
>>>>>>>>>>>>>>>>>>>>>> the KIP makes it seem like one version of the method will auto-create
>>>>>>>>>>>>>>>>>>>>>> topics while the other will not.
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>>>>>>> Sophie
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> On Wed, Jul 17, 2019 at 10:15 AM Levani Kokhreidze
>>>>>>>>>>>>>>>>>>>>>> <levani.co...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> One more bump about KIP-221 (
>>>>>>>>>>>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint
>>>>>>>>>>>>>>>>>>>>>>> ) so it doesn’t get lost in the mailing list :)
>>>>>>>>>>>>>>>>>>>>>>> Would love to hear the community's opinions/concerns about this KIP.
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>>>>>>>>>> Levani
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> On Jul 12, 2019, at 5:27 PM, Levani Kokhreidze
>>>>>>>>>>>>>>>>>>>>>>>> <levani.co...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> Kind reminder about this KIP:
>>>>>>>>>>>>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>>>>>>>>>>> Levani
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> On Jul 9, 2019, at 11:38 AM, Levani Kokhreidze
>>>>>>>>>>>>>>>>>>>>>>>>> <levani.co...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> In order to move this KIP forward, I’ve updated the Confluence page
>>>>>>>>>>>>>>>>>>>>>>>>> with the new proposal:
>>>>>>>>>>>>>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> I’ve also filled in the “Rejected Alternatives” section.
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> Looking forward to discussing this KIP :)
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> Kind regards,
>>>>>>>>>>>>>>>>>>>>>>>>> Levani
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> On Jul 3, 2019, at 1:08 PM, Levani Kokhreidze
>>>>>>>>>>>>>>>>>>>>>>>>>> <levani.co...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> Hello Matthias,
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks for the feedback and ideas.
>>>>>>>>>>>>>>>>>>>>>>>>>> I like the idea of introducing a dedicated `Topic` class for topic
>>>>>>>>>>>>>>>>>>>>>>>>>> configuration for internal operators like `groupBy`.
>>>>>>>>>>>>>>>>>>>>>>>>>> Would be great to hear others' opinions about this as well.
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> Kind regards,
>>>>>>>>>>>>>>>>>>>>>>>>>> Levani
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> On Jul 3, 2019, at 7:00 AM, Matthias J. Sax
>>>>>>>>>>>>>>>>>>>>>>>>>>> <matth...@confluent.io> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> Levani,
>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks for picking up this KIP! And thanks for summarizing everything.
>>>>>>>>>>>>>>>>>>>>>>>>>>> Even if some points may have been discussed already (can't really
>>>>>>>>>>>>>>>>>>>>>>>>>>> remember), it's helpful to get a good summary to refresh the discussion.
>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> I think your reasoning makes sense. With regard to the distinction
>>>>>>>>>>>>>>>>>>>>>>>>>>> between operators that manage topics and operators that use user-created
>>>>>>>>>>>>>>>>>>>>>>>>>>> topics: Following this argument, it might indicate that leaving
>>>>>>>>>>>>>>>>>>>>>>>>>>> `through()` as-is (as an operator that uses user-defined topics) and
>>>>>>>>>>>>>>>>>>>>>>>>>>> introducing a new `repartition()` operator (an operator that manages
>>>>>>>>>>>>>>>>>>>>>>>>>>> topics itself) might be good. Otherwise, there is one operator
>>>>>>>>>>>>>>>>>>>>>>>>>>> `through()` that sometimes manages topics but sometimes not; a different
>>>>>>>>>>>>>>>>>>>>>>>>>>> name, i.e., a new operator, would make the distinction clearer.
>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> About adding `numOfPartitions` to `Grouped`: I am wondering if the same
>>>>>>>>>>>>>>>>>>>>>>>>>>> argument as for `Produced` applies and adding it is semantically
>>>>>>>>>>>>>>>>>>>>>>>>>>> questionable? Might be good to get opinions of others on this, too. I am
>>>>>>>>>>>>>>>>>>>>>>>>>>> not sure myself what solution I prefer atm.
>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> So far, KS uses configuration objects that allow
>> to
>>>>>>>>>>>>> configure
>>>>>>>>>>>>>>> a
>>>>>>>>>>>>>>>>>>>>>>> certain
>>>>>>>>>>>>>>>>>>>>>>>>>>> "entity" like a consumer, producer, store. If we
>>>> assume that
>>>>>>>>>>>>>>> a topic
>>>>>>>>>>>>>>>>>>>>>>> is
>>>>>>>>>>>>>>>>>>>>>>>>>>> a similar entity, I am wondering if we should have a
>>>>>>>>>>>>>>>>>>>>>>>>>>> `Topic#withNumberOfPartitions()` class and method
>>>> instead of
>>>>>>>>>>>>>>> a plain
>>>>>>>>>>>>>>>>>>>>>>>>>>> integer? This would allow us to add other
>> configs,
>>>> like
>>>>>>>>>>>>>>> replication
>>>>>>>>>>>>>>>>>>>>>>>>>>> factor, retention-time etc, easily, without the
>>>> need to
>>>>>>>>>>>>>>> change the
>>>>>>>>>>>>>>>>>>>>>>> "main
>>>>>>>>>>>>>>>>>>>>>>>>>>> API".
>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> Just want to give some ideas. Not sure if I like
>>>> them
>>>>>>>>>>>>> myself.
>>>>>>>>>>>>>>> :)
>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> -Matthias
>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> On 7/1/19 1:04 AM, Levani Kokhreidze wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Actually, giving it more thought - maybe
>> enhancing
>>>> Produced
>>>>>>>>>>>>>>> with num
>>>>>>>>>>>>>>>>>>>>>>> of partitions configuration is not the best approach.
>>>> Let me
>>>>>>>>>>>>>>> explain
>>>>>>>>>>>>>>>>>>>>> why:
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 1) If we enhance Produced class with this
>>>> configuration,
>>>>>>>>>>>>>>> this will
>>>>>>>>>>>>>>>>>>>>>>> also affect KStream#to operation. Since KStream#to is
>>>> the final
>>>>>>>>>>>>>>> sink of
>>>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>>> topology, for me, it seems to be reasonable
>> assumption
>>>> that user
>>>>>>>>>>>>>>> needs
>>>>>>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>>>>>> manually create sink topic in advance. And in that
>>>> case, having
>>>>>>>>>>>>>>> num of
>>>>>>>>>>>>>>>>>>>>>>> partitions configuration doesn’t make much sense.
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 2) Looking at Produced class, based on API
>>>> contract, seems
>>>>>>>>>>>>>>> like
>>>>>>>>>>>>>>>>>>>>>>> Produced is designed to be something that is
>>>> explicitly for
>>>>>>>>>>>>>>> producer
>>>>>>>>>>>>>>>>>>>>> (key
>>>>>>>>>>>>>>>>>>>>>>> serializer, value serializer, partitioner those all
>>>> are producer
>>>>>>>>>>>>>>>>>>>>> specific
>>>>>>>>>>>>>>>>>>>>>>> configurations) and num of partitions is topic level
>>>>>>>>>>>>>>> configuration. And
>>>>>>>>>>>>>>>>>>>>> I
>>>>>>>>>>>>>>>>>>>>>>> don’t think mixing topic and producer level
>>>> configurations
>>>>>>>>>>>>>>> together in
>>>>>>>>>>>>>>>>>>>>> one
>>>>>>>>>>>>>>>>>>>>>>> class is a good approach.
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 3) Looking at KStream interface, seems like
>>>> Produced
>>>>>>>>>>>>>>> parameter is
>>>>>>>>>>>>>>>>>>>>>>> for operations that work with non-internal (e.g
>> topics
>>>> created
>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>>> managed
>>>>>>>>>>>>>>>>>>>>>>> internally by Kafka Streams) topics and I think we
>>>> should leave
>>>>>>>>>>>>>>> it as
>>>>>>>>>>>>>>>>>>>>> it is
>>>>>>>>>>>>>>>>>>>>>>> in that case.
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Taking all these things into account, I think we
>>>> should
>>>>>>>>>>>>>>> distinguish
>>>>>>>>>>>>>>>>>>>>>>> between DSL operations, where Kafka Streams should
>>>> create and
>>>>>>>>>>>>>>> manage
>>>>>>>>>>>>>>>>>>>>>>> internal topics (KStream#groupBy) vs topics that
>>>> should be
>>>>>>>>>>>>>>> created in
>>>>>>>>>>>>>>>>>>>>>>> advance (e.g KStream#to).
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> To sum it up, I think adding numPartitions
>>>> configuration in
>>>>>>>>>>>>>>>>>>>>> Produced
>>>>>>>>>>>>>>>>>>>>>>> will result in mixing topic and producer level
>>>> configuration in
>>>>>>>>>>>>>>> one
>>>>>>>>>>>>>>>>>>>>> class
>>>>>>>>>>>>>>>>>>>>>>> and it’s gonna break existing API semantics.
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Regarding making topic name optional in
>>>> KStream#through - I
>>>>>>>>>>>>>>> think
>>>>>>>>>>>>>>>>>>>>>>> the underlying idea is very useful and giving users
>>>> possibility to
>>>>>>>>>>>>>>> specify
>>>>>>>>>>>>>>>>>>>>> num
>>>>>>>>>>>>>>>>>>>>>>> of partitions there is even more useful :)
>> Considering
>>>> arguments
>>>>>>>>>>>>>>> against
>>>>>>>>>>>>>>>>>>>>>>> adding num of partitions in Produced class, I see two
>>>> options
>>>>>>>>>>>>>>> here:
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 1) Add following method overloads
>>>>>>>>>>>>>>>>>>>>>>>>>>>> * through() - topic will be auto-generated and
>>>> num of
>>>>>>>>>>>>>>> partitions
>>>>>>>>>>>>>>>>>>>>>>> will be taken from source topic
>>>>>>>>>>>>>>>>>>>>>>>>>>>> * through(final int numOfPartitions) - topic
>> will
>>>> be auto
>>>>>>>>>>>>>>>>>>>>>>> generated with specified num of partitions
>>>>>>>>>>>>>>>>>>>>>>>>>>>> * through(final int numOfPartitions, final
>>>> Produced<K, V>
>>>>>>>>>>>>>>>>>>>>>>> produced) - topic will be generated with
>>>> specified num of
>>>>>>>>>>>>>>>>>>>>> partitions
>>>>>>>>>>>>>>>>>>>>>>> and configuration taken from produced parameter.
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 2) Leave KStream#through as it is and introduce
>>>> new method
>>>>>>>>>>>>> -
>>>>>>>>>>>>>>>>>>>>>>> KStream#repartition (I think Matthias suggested this
>>>> in one of
>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>> threads)
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Considering all mentioned above I propose the
>>>> following
>>>>>>>>>>>>> plan:
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Option A:
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 1) Leave Produced as it is
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 2) Add num of partitions configuration to
>> Grouped
>>>> class (as
>>>>>>>>>>>>>>>>>>>>>>> mentioned in the KIP)
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 3) Add following method overloads to
>>>> KStream#through
>>>>>>>>>>>>>>>>>>>>>>>>>>>> * through() - topic will be auto-generated and
>>>> num of
>>>>>>>>>>>>>>> partitions
>>>>>>>>>>>>>>>>>>>>>>> will be taken from source topic
>>>>>>>>>>>>>>>>>>>>>>>>>>>> * through(final int numOfPartitions) - topic
>> will
>>>> be auto
>>>>>>>>>>>>>>>>>>>>>>> generated with specified num of partitions
>>>>>>>>>>>>>>>>>>>>>>>>>>>> * through(final int numOfPartitions, final
>>>> Produced<K, V>
>>>>>>>>>>>>>>>>>>>>>>> produced) - topic will be generated with
>>>> specified num of
>>>>>>>>>>>>>>>>>>>>> partitions
>>>>>>>>>>>>>>>>>>>>>>> and configuration taken from produced parameter.
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Option B:
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 1) Leave Produced as it is
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 2) Add num of partitions configuration to
>> Grouped
>>>> class (as
>>>>>>>>>>>>>>>>>>>>>>> mentioned in the KIP)
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 3) Add new operator KStream#repartition for
>>>> creating and
>>>>>>>>>>>>>>> managing
>>>>>>>>>>>>>>>>>>>>>>> internal repartition topics
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> P.S. I’m sorry if all of this was already
>>>> discussed in the
>>>>>>>>>>>>>>> mailing
>>>>>>>>>>>>>>>>>>>>>>> list, but I kinda got lost with all the threads that were
>>>> about this
>>>>>>>>>>>>>>> KIP :(
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Kind regards,
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Levani
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Jul 1, 2019, at 9:56 AM, Levani Kokhreidze <
>>>>>>>>>>>>>>>>>>>>>>> levani.co...@gmail.com <mailto:
>> levani.co...@gmail.com>>
>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I would like to resurrect discussion around
>>>> KIP-221. Going
>>>>>>>>>>>>>>> through
>>>>>>>>>>>>>>>>>>>>>>> the discussion thread, there seems to be agreement
>>>> around
>>>>>>>>>>>>>>> usefulness of
>>>>>>>>>>>>>>>>>>>>> this
>>>>>>>>>>>>>>>>>>>>>>> feature.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Regarding the implementation, as far as I
>>>> understood, the
>>>>>>>>>>>>>>> most
>>>>>>>>>>>>>>>>>>>>>>> optimal solution for me seems the following:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 1) Add two method overloads to KStream#through
>>>> method
>>>>>>>>>>>>>>> (essentially
>>>>>>>>>>>>>>>>>>>>>>> making topic name optional)
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 2) Enhance Produced class with numOfPartitions
>>>>>>>>>>>>> configuration
>>>>>>>>>>>>>>>>>>>>> field.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Those two changes will allow DSL users to
>> control
>>>>>>>>>>>>>>> parallelism and
>>>>>>>>>>>>>>>>>>>>>>> trigger re-partition without doing stateful
>> operations.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I will update KIP with interface changes around
>>>>>>>>>>>>>>> KStream#through if
>>>>>>>>>>>>>>>>>>>>>>> these changes sound sensible.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Kind regards,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Levani
>> 
>>
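
A minimal sketch of the `Topic#withNumberOfPartitions()` idea floated by
Matthias in the quoted thread, assuming a standalone `Topic` configuration
object in the style of `Produced` and `Grouped`. Nothing below is part of the
Kafka Streams API: the `Topic` class and the `withDefaults`,
`withReplicationFactor`, and `resolvePartitions` names are hypothetical and
chosen only for illustration. The fallback to the source topic's partition
count mirrors the behaviour described for the proposed no-argument
`through()` overload.

```java
import java.util.OptionalInt;

/**
 * Hypothetical topic-level configuration object (NOT a Kafka Streams class).
 * It keeps topic settings separate from producer settings such as serializers.
 */
final class Topic {
    private final OptionalInt numberOfPartitions;
    private final OptionalInt replicationFactor;

    private Topic(OptionalInt numberOfPartitions, OptionalInt replicationFactor) {
        this.numberOfPartitions = numberOfPartitions;
        this.replicationFactor = replicationFactor;
    }

    /** No explicit settings: everything is derived by the caller. */
    static Topic withDefaults() {
        return new Topic(OptionalInt.empty(), OptionalInt.empty());
    }

    /** Static factory, following the naming used in the thread. */
    static Topic withNumberOfPartitions(int numberOfPartitions) {
        if (numberOfPartitions <= 0) {
            throw new IllegalArgumentException("numberOfPartitions must be positive");
        }
        return new Topic(OptionalInt.of(numberOfPartitions), OptionalInt.empty());
    }

    /** Further topic-level configs can be added without touching the main API. */
    Topic withReplicationFactor(int replicationFactor) {
        if (replicationFactor <= 0) {
            throw new IllegalArgumentException("replicationFactor must be positive");
        }
        return new Topic(numberOfPartitions, OptionalInt.of(replicationFactor));
    }

    OptionalInt numberOfPartitions() {
        return numberOfPartitions;
    }

    OptionalInt replicationFactor() {
        return replicationFactor;
    }

    /** Uses the requested partition count, else falls back to the source topic's. */
    int resolvePartitions(int sourceTopicPartitions) {
        return numberOfPartitions.orElse(sourceTopicPartitions);
    }
}
```

With this shape, `Topic.withNumberOfPartitions(6).resolvePartitions(4)` yields
the explicit value, while `Topic.withDefaults().resolvePartitions(4)` inherits
the source topic's count. Each `with...` call returns a new immutable instance,
so a shared `Topic` value cannot be changed by later calls.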
