Jeyhun, thanks for updating the KIP.
I am wondering if you intend to add a new class `Produced`? There is already `org.apache.kafka.streams.kstream.Produced`. So if we want to add a new class, it must have a different name -- or we might be able to merge both into one? Also, for the KStream overlaods of `through()` and `to()`, can you add the different behavior using different overloads? It's not clear from the KIP what the semantics are. -Matthias On 11/17/17 3:27 PM, Jeyhun Karimov wrote: > Hi, > > Thanks for your comments. I agree with Matthias partially. > I think we should relax some requirements related with to() and through() > methods. > IMHO, Produced class can cover (existing/to be created) topic information, > and which will ease our effort: > > KStream.to(Produced topicInfo) > KStream.through(Produced topicInfo) > > This will decrease the number of overloads but we will need to deprecate > the existing to() and through() methods, perhaps. > I updated the KIP accordingly. > > > Cheers, > Jeyhun > > On Thu, Nov 16, 2017 at 10:21 PM Matthias J. Sax <matth...@confluent.io> > wrote: > >> @Jan: >> >> The `Produced` class was introduced in 1.0 to specify key and valud >> Serdes (and partitioner) if data is written into a topic. >> >> Old API: >> >> KStream#to("topic", keySerde, valueSerde); >> >> New API: >> >> KStream#to("topic", Produced.with(keySerde, valueSerde)); >> >> >> This allows to reduce the number of overloads for `to()` (and >> `through()` that follows the same pattern) -- the second parameter is >> used to cover all different variations of option parameters users can >> specify, while we only have 2 overload for `to()` itself. >> >> What is still unclear to me it, what you mean by this topic prefix >> thing? Either a user cares about the topic name and thus, must create >> and manage it manually. Or the user does not care, and Streams create >> it. How would this prefix idea fit in here? >> >> >> >> @Guozhang: >> >> My idea was to extend `Produced` with the hint we want to give for >> creating internal topic and pass a optional `Produced` parameter. There >> are multiple things we can do here: >> >> 1) stream.through(null, Produced...).groupBy().aggregate() >> -> just allow for `null` topic name indicating that Streams should >> create an internal topic >> >> 2) stream.through(Produced...).groupBy().aggregate() >> -> add one overload taking an mandatory `Produced` >> >> We use `Serialized` to picky back the information >> >> 3) stream.groupBy(Serialized...).aggregate() >> and stream.groupByKey(Serialized...).aggregate() >> -> we don't need new top level overloads >> >> >> There are different trade-offs for those alternatives and maybe there >> are other ways to change the API. It's just to push the discussion further. >> >> >> -Matthias >> >> On 11/12/17 1:22 PM, Jan Filipiak wrote: >>> Hi Gouzhang, >>> >>> this felt like these questions are supposed to be answered by me. >>> I do not understand the first one. I don't understand why the user >>> shouldn't be able to specify a suffix for the topic name. >>> >>> For the third question I am not 100% familiar if the Produced class >>> came to existence >>> at all. I remember proposing it somewhere in our redo DSL discussion that >>> I dropped out of later. Finally any call that does: >>> >>> 1. create the internal topic >>> 2. register sink >>> 3. register source >>> >>> will always get the work done. If we have a Produced like class. putting >>> all the parameters >>> in there make sense. (Partitioner, serde, PartitionHint, internal, name >>> ... ) >>> >>> Hope this helps? >>> >>> >>> On 10.11.2017 07:54, Guozhang Wang wrote: >>>> A few clarification questions on the proposal details. >>>> >>>> 1. API: although the repartition only happens at the final stateful >>>> operations like agg / join, the repartition flag info was actually >> passed >>>> from an earlier operator like map / groupBy. So what should be the new >>>> API >>>> look like? For example, if we do >>>> >>>> stream.groupBy().through("topic-name", Produced..).aggregate >>>> >>>> This would be add a bunch of APIs to GroupedKStream/KTable >>>> >>>> 2. Semantics: as Matthias mentioned, today any topics defined in >>>> "through()" call is considered a user topic, and hence users are >>>> responsible for managing them, including the topic name. For this KIP's >>>> purpose, though, users would not care about the topic name. I.e. as a >>>> user >>>> I still want to make it be an internal topic so that I do not need to >>>> worry >>>> about it at all, but only specify num.partitions. >>>> >>>> 3. Details: in Produced we do not have specs for specifying the >>>> num.partitions or should we repartition or not. So it is still not >>>> clear to >>>> me how we would make use of that to achieve what's in the old >>>> proposal's RepartitionHint class. >>>> >>>> >>>> >>>> Guozhang >>>> >>>> >>>> On Mon, Nov 6, 2017 at 1:21 PM, Ted Yu <yuzhih...@gmail.com> wrote: >>>> >>>>> bq. enlarge the score of through() >>>>> >>>>> I guess you meant scope. >>>>> >>>>> On Mon, Nov 6, 2017 at 1:15 PM, Jeyhun Karimov <je.kari...@gmail.com> >>>>> wrote: >>>>> >>>>>> Hi, >>>>>> >>>>>> Sorry for the late reply. I am convinced that we should enlarge the >>>>>> score >>>>>> of through() (add more overloads) instead of introducing a separate >> set >>>>> of >>>>>> overloads to other methods. >>>>>> I will update the KIP soon based on the discussion and inform. >>>>>> >>>>>> >>>>>> Cheers, >>>>>> Jeyhun >>>>>> >>>>>> On Mon, Nov 6, 2017 at 9:18 PM Jan Filipiak <jan.filip...@trivago.com >>> >>>>>> wrote: >>>>>> >>>>>>> Sorry for not beeing 100% up to date. >>>>>>> Back then we had the discussion that when an operation puts a >Sink< >>>>>>> into the topology, a >Produced< >>>>>>> parameter is added. This produced parameter could have internal or >>>>>>> external. If internal I think the name would still make >>>>>>> a great suffix for the topic name >>>>>>> >>>>>>> Is this plan still around? Otherwise having the name as suffix is >>>>>>> probably always good it can help the user quicker to identify hot >>>>> topics >>>>>>> that need more >>>>>>> partitions if he has many of these internal repartitions >>>>>>> >>>>>>> Best Jan >>>>>>> >>>>>>> >>>>>>> On 06.11.2017 20:13, Matthias J. Sax wrote: >>>>>>>> I absolute agree with what you say. It's not a requirement to >>>>> specify a >>>>>>>> topic name -- and this was the idea -- if user does specify a name, >>>>> we >>>>>>>> treat as is -- if users does not specify a name, Streams create an >>>>>>>> internal topic. >>>>>>>> >>>>>>>> The goal of the Jira is to allow a simplified way to control >>>>>>>> repartitioning (atm, user needs to manually create a topic and use >>>>> via >>>>>>>> through()). >>>>>>>> >>>>>>>> Thus, the idea is to make the topic name parameter of through >>>>> optional. >>>>>>>> It's of course just an idea. Happy do have a other API design. The >>>>> goal >>>>>>>> was, to avoid to many new overloads. >>>>>>>> >>>>>>>>>> Could you clarify exactly what you mean by keeping the current >>>>>>> distinction? >>>>>>>> Current distinction is: user topics are created manually and user >>>>>>>> specifies the name -- internal topics are created by Kafka Streams >>>>> and >>>>>>>> an name is generated automatically. >>>>>>>> >>>>>>>> -> through("user-topic") >>>>>>>> -> through(TopicConfig.withNumberOfPartitions(5)) // Streams creates >>>>>> an >>>>>>>> internal topic >>>>>>>> >>>>>>>> >>>>>>>> -Matthias >>>>>>>> >>>>>>>> >>>>>>>> On 11/6/17 6:56 PM, Thomas Becker wrote: >>>>>>>>> Could you clarify exactly what you mean by keeping the current >>>>>>> distinction? >>>>>>>>> Actually, re-reading the KIP and JIRA, it's not clear that being >>>>> able >>>>>>> to specify a custom name is actually a requirement. If the goal is to >>>>>>> control repartitioning and tune parallelism, maybe we can just >>>>>>> sidestep >>>>>>> this issue altogether by removing the ability to set a different >> name. >>>>>>>>> On Mon, 2017-11-06 at 16:51 +0100, Matthias J. Sax wrote: >>>>>>>>> >>>>>>>>> That's a good point. In current design, we strictly distinguish >>>>> both. >>>>>>>>> For example, the reset tools deletes internal topics (starting with >>>>>>>>> prefix `<application.id>-` and ending with either `-repartition` >> or >>>>>>>>> `-changelog`. >>>>>>>>> >>>>>>>>> Thus, from my point of view, it would make sense to keep the >> current >>>>>>>>> distinction. >>>>>>>>> >>>>>>>>> -Matthias >>>>>>>>> >>>>>>>>> On 11/6/17 4:45 PM, Thomas Becker wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>> I think this sounds good as well. It's worth clarifying whether >>>>> topics >>>>>>> that are named by the user but created by streams are considered >>>>>> "internal" >>>>>>> topics also. >>>>>>>>> On Sun, 2017-11-05 at 23:02 +0100, Matthias J. Sax wrote: >>>>>>>>> >>>>>>>>> My idea was, to relax the requirement for through() that a topic >>>>> must >>>>>> be >>>>>>>>> created manually before startup. >>>>>>>>> >>>>>>>>> Thus, if no through() call is made, a (internal) topic is created >>>>> the >>>>>>>>> same way we do it currently. >>>>>>>>> >>>>>>>>> If one uses `through(String topicName)` we keep the current >> behavior >>>>>> and >>>>>>>>> require users to create the topic manually. >>>>>>>>> >>>>>>>>> The reasoning is as follows: if a user creates a topic manually, a >>>>>> user >>>>>>>>> can just use it for repartitioning. As the topic is already there, >>>>>> there >>>>>>>>> is no need to specify any topic configs. >>>>>>>>> >>>>>>>>> We add a new `through()` overload (details TBD) that allows to >>>>> specify >>>>>>>>> topic configs and Streams create the topic with those configs. >>>>>>>>> >>>>>>>>> Reasoning: user don't want to manage topic manually, thus, it's >>>>> still >>>>>> an >>>>>>>>> internal topic and Streams create the topic name automatically as >>>>> for >>>>>>>>> all other internal topics. However, users gets some more control >>>>> about >>>>>>>>> topic parameters like number of partitions (we should discuss what >>>>>> other >>>>>>>>> configs would be useful). >>>>>>>>> >>>>>>>>> >>>>>>>>> Does this make sense? >>>>>>>>> >>>>>>>>> >>>>>>>>> -Matthias >>>>>>>>> >>>>>>>>> >>>>>>>>> On 11/5/17 1:21 AM, Jan Filipiak wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>> Hi. >>>>>>>>> >>>>>>>>> >>>>>>>>> Im not 100 % up to date what version 1.0 DSL looks like ATM. >>>>>>>>> I just would argue that repartitioning should be an own API call >>>>> like >>>>>>>>> through or something. >>>>>>>>> One can use through or to already to get this. I would argue one >>>>>> should >>>>>>>>> look there instead of overloads >>>>>>>>> >>>>>>>>> Best Jan >>>>>>>>> >>>>>>>>> On 04.11.2017 16:01, Jeyhun Karimov wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>> Dear community, >>>>>>>>> >>>>>>>>> I would like to initiate discussion on KIP-221 [1] based on issue >>>>> [2]. >>>>>>>>> Please feel free to comment. >>>>>>>>> >>>>>>>>> [1] >>>>>>>>> >>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP- >>>>>> 221%3A+Repartition+Topic+Hints+in+Streams >>>>>>>>> [2] https://issues.apache.org/jira/browse/KAFKA-6037 >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> Cheers, >>>>>>>>> Jeyhun >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> ________________________________ >>>>>>>>> >>>>>>>>> This email and any attachments may contain confidential and >>>>> privileged >>>>>>> material for the sole use of the intended recipient. Any review, >>>>> copying, >>>>>>> or distribution of this email (or any attachments) by others is >>>>>> prohibited. >>>>>>> If you are not the intended recipient, please contact the sender >>>>>>> immediately and permanently delete this email and any attachments. No >>>>>>> employee or agent of TiVo Inc. is authorized to conclude any binding >>>>>>> agreement on behalf of TiVo Inc. by email. Binding agreements with >>>>>>> TiVo >>>>>>> Inc. may only be made by a signed written agreement. >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> ________________________________ >>>>>>>>> >>>>>>>>> This email and any attachments may contain confidential and >>>>> privileged >>>>>>> material for the sole use of the intended recipient. Any review, >>>>> copying, >>>>>>> or distribution of this email (or any attachments) by others is >>>>>> prohibited. >>>>>>> If you are not the intended recipient, please contact the sender >>>>>>> immediately and permanently delete this email and any attachments. No >>>>>>> employee or agent of TiVo Inc. is authorized to conclude any binding >>>>>>> agreement on behalf of TiVo Inc. by email. Binding agreements with >>>>>>> TiVo >>>>>>> Inc. may only be made by a signed written agreement. >>>>>>> >>>> >>>> >>> >> >> >
signature.asc
Description: OpenPGP digital signature