It seems Jeyhun (cc'ed) is not working on the KIP any longer. If there is no response within a week from Jeyhun, feel free to take over the KIP.
One more side comment: we recently accepted KIP-372, that overlaps with this KIP. Thus, if you resume KIP-221, please consider the changes of KIP-372. Thanks a lot! -Matthias On 9/21/18 11:27 AM, Lei Chen wrote: > Hi, > > Just want to know is anyone actively working on this and also KAFKA-4835 > <https://issues.apache.org/jira/browse/KAFKA-4835>? Seems like the JIRA has > been inactive for couple months. We want this feature and would like to > move it forward if no one else is working on it. > > Lei > > On Wed, Jun 20, 2018 at 7:27 PM Matthias J. Sax <matth...@confluent.io> > wrote: > >> No worries. It's just good to know. It seems that some other people are >> interested to drive this further. So we will just "reassign" it to them. >> >> Thanks for letting us know. >> >> >> -Matthias >> >> On 6/20/18 2:51 PM, Jeyhun Karimov wrote: >>> Hi Matthias, all, >>> >>> Currently, I am not able to complete this KIP. Please accept my >>> apologies for that. >>> >>> >>> Cheers, >>> Jeyhun >>> >>> On Mon, Jun 11, 2018 at 2:25 AM Matthias J. Sax <matth...@confluent.io >>> <mailto:matth...@confluent.io>> wrote: >>> >>> What is the status of this KIP? >>> >>> -Matthias >>> >>> >>> On 2/13/18 1:43 PM, Matthias J. Sax wrote: >>> > Is there any update for this KIP? >>> > >>> > >>> > -Matthias >>> > >>> > On 12/4/17 2:08 PM, Matthias J. Sax wrote: >>> >> Jeyhun, >>> >> >>> >> thanks for updating the KIP. >>> >> >>> >> I am wondering if you intend to add a new class `Produced`? There >> is >>> >> already `org.apache.kafka.streams.kstream.Produced`. So if we >> want to >>> >> add a new class, it must have a different name -- or we might be >>> able to >>> >> merge both into one? >>> >> >>> >> Also, for the KStream overlaods of `through()` and `to()`, can >>> you add >>> >> the different behavior using different overloads? It's not clear >> from >>> >> the KIP what the semantics are. >>> >> >>> >> >>> >> -Matthias >>> >> >>> >> On 11/17/17 3:27 PM, Jeyhun Karimov wrote: >>> >>> Hi, >>> >>> >>> >>> Thanks for your comments. I agree with Matthias partially. >>> >>> I think we should relax some requirements related with to() and >>> through() >>> >>> methods. >>> >>> IMHO, Produced class can cover (existing/to be created) topic >>> information, >>> >>> and which will ease our effort: >>> >>> >>> >>> KStream.to(Produced topicInfo) >>> >>> KStream.through(Produced topicInfo) >>> >>> >>> >>> This will decrease the number of overloads but we will need to >>> deprecate >>> >>> the existing to() and through() methods, perhaps. >>> >>> I updated the KIP accordingly. >>> >>> >>> >>> >>> >>> Cheers, >>> >>> Jeyhun >>> >>> >>> >>> On Thu, Nov 16, 2017 at 10:21 PM Matthias J. Sax >>> <matth...@confluent.io <mailto:matth...@confluent.io>> >>> >>> wrote: >>> >>> >>> >>>> @Jan: >>> >>>> >>> >>>> The `Produced` class was introduced in 1.0 to specify key and >> valud >>> >>>> Serdes (and partitioner) if data is written into a topic. >>> >>>> >>> >>>> Old API: >>> >>>> >>> >>>> KStream#to("topic", keySerde, valueSerde); >>> >>>> >>> >>>> New API: >>> >>>> >>> >>>> KStream#to("topic", Produced.with(keySerde, valueSerde)); >>> >>>> >>> >>>> >>> >>>> This allows to reduce the number of overloads for `to()` (and >>> >>>> `through()` that follows the same pattern) -- the second >>> parameter is >>> >>>> used to cover all different variations of option parameters >>> users can >>> >>>> specify, while we only have 2 overload for `to()` itself. >>> >>>> >>> >>>> What is still unclear to me it, what you mean by this topic >> prefix >>> >>>> thing? Either a user cares about the topic name and thus, must >>> create >>> >>>> and manage it manually. Or the user does not care, and Streams >>> create >>> >>>> it. How would this prefix idea fit in here? >>> >>>> >>> >>>> >>> >>>> >>> >>>> @Guozhang: >>> >>>> >>> >>>> My idea was to extend `Produced` with the hint we want to give >> for >>> >>>> creating internal topic and pass a optional `Produced` >>> parameter. There >>> >>>> are multiple things we can do here: >>> >>>> >>> >>>> 1) stream.through(null, Produced...).groupBy().aggregate() >>> >>>> -> just allow for `null` topic name indicating that Streams >> should >>> >>>> create an internal topic >>> >>>> >>> >>>> 2) stream.through(Produced...).groupBy().aggregate() >>> >>>> -> add one overload taking an mandatory `Produced` >>> >>>> >>> >>>> We use `Serialized` to picky back the information >>> >>>> >>> >>>> 3) stream.groupBy(Serialized...).aggregate() >>> >>>> and stream.groupByKey(Serialized...).aggregate() >>> >>>> -> we don't need new top level overloads >>> >>>> >>> >>>> >>> >>>> There are different trade-offs for those alternatives and maybe >>> there >>> >>>> are other ways to change the API. It's just to push the >>> discussion further. >>> >>>> >>> >>>> >>> >>>> -Matthias >>> >>>> >>> >>>> On 11/12/17 1:22 PM, Jan Filipiak wrote: >>> >>>>> Hi Gouzhang, >>> >>>>> >>> >>>>> this felt like these questions are supposed to be answered by >> me. >>> >>>>> I do not understand the first one. I don't understand why the >> user >>> >>>>> shouldn't be able to specify a suffix for the topic name. >>> >>>>> >>> >>>>> For the third question I am not 100% familiar if the Produced >>> class >>> >>>>> came to existence >>> >>>>> at all. I remember proposing it somewhere in our redo DSL >>> discussion that >>> >>>>> I dropped out of later. Finally any call that does: >>> >>>>> >>> >>>>> 1. create the internal topic >>> >>>>> 2. register sink >>> >>>>> 3. register source >>> >>>>> >>> >>>>> will always get the work done. If we have a Produced like >>> class. putting >>> >>>>> all the parameters >>> >>>>> in there make sense. (Partitioner, serde, PartitionHint, >>> internal, name >>> >>>>> ... ) >>> >>>>> >>> >>>>> Hope this helps? >>> >>>>> >>> >>>>> >>> >>>>> On 10.11.2017 07:54, Guozhang Wang wrote: >>> >>>>>> A few clarification questions on the proposal details. >>> >>>>>> >>> >>>>>> 1. API: although the repartition only happens at the final >>> stateful >>> >>>>>> operations like agg / join, the repartition flag info was >>> actually >>> >>>> passed >>> >>>>>> from an earlier operator like map / groupBy. So what should >>> be the new >>> >>>>>> API >>> >>>>>> look like? For example, if we do >>> >>>>>> >>> >>>>>> stream.groupBy().through("topic-name", Produced..).aggregate >>> >>>>>> >>> >>>>>> This would be add a bunch of APIs to GroupedKStream/KTable >>> >>>>>> >>> >>>>>> 2. Semantics: as Matthias mentioned, today any topics defined >> in >>> >>>>>> "through()" call is considered a user topic, and hence users >> are >>> >>>>>> responsible for managing them, including the topic name. For >>> this KIP's >>> >>>>>> purpose, though, users would not care about the topic name. >>> I.e. as a >>> >>>>>> user >>> >>>>>> I still want to make it be an internal topic so that I do not >>> need to >>> >>>>>> worry >>> >>>>>> about it at all, but only specify num.partitions. >>> >>>>>> >>> >>>>>> 3. Details: in Produced we do not have specs for specifying >> the >>> >>>>>> num.partitions or should we repartition or not. So it is >>> still not >>> >>>>>> clear to >>> >>>>>> me how we would make use of that to achieve what's in the old >>> >>>>>> proposal's RepartitionHint class. >>> >>>>>> >>> >>>>>> >>> >>>>>> >>> >>>>>> Guozhang >>> >>>>>> >>> >>>>>> >>> >>>>>> On Mon, Nov 6, 2017 at 1:21 PM, Ted Yu <yuzhih...@gmail.com >>> <mailto:yuzhih...@gmail.com>> wrote: >>> >>>>>> >>> >>>>>>> bq. enlarge the score of through() >>> >>>>>>> >>> >>>>>>> I guess you meant scope. >>> >>>>>>> >>> >>>>>>> On Mon, Nov 6, 2017 at 1:15 PM, Jeyhun Karimov >>> <je.kari...@gmail.com <mailto:je.kari...@gmail.com>> >>> >>>>>>> wrote: >>> >>>>>>> >>> >>>>>>>> Hi, >>> >>>>>>>> >>> >>>>>>>> Sorry for the late reply. I am convinced that we should >>> enlarge the >>> >>>>>>>> score >>> >>>>>>>> of through() (add more overloads) instead of introducing a >>> separate >>> >>>> set >>> >>>>>>> of >>> >>>>>>>> overloads to other methods. >>> >>>>>>>> I will update the KIP soon based on the discussion and >> inform. >>> >>>>>>>> >>> >>>>>>>> >>> >>>>>>>> Cheers, >>> >>>>>>>> Jeyhun >>> >>>>>>>> >>> >>>>>>>> On Mon, Nov 6, 2017 at 9:18 PM Jan Filipiak >>> <jan.filip...@trivago.com <mailto:jan.filip...@trivago.com> >>> >>>>> >>> >>>>>>>> wrote: >>> >>>>>>>> >>> >>>>>>>>> Sorry for not beeing 100% up to date. >>> >>>>>>>>> Back then we had the discussion that when an operation >>> puts a >Sink< >>> >>>>>>>>> into the topology, a >Produced< >>> >>>>>>>>> parameter is added. This produced parameter could have >>> internal or >>> >>>>>>>>> external. If internal I think the name would still make >>> >>>>>>>>> a great suffix for the topic name >>> >>>>>>>>> >>> >>>>>>>>> Is this plan still around? Otherwise having the name as >>> suffix is >>> >>>>>>>>> probably always good it can help the user quicker to >>> identify hot >>> >>>>>>> topics >>> >>>>>>>>> that need more >>> >>>>>>>>> partitions if he has many of these internal repartitions >>> >>>>>>>>> >>> >>>>>>>>> Best Jan >>> >>>>>>>>> >>> >>>>>>>>> >>> >>>>>>>>> On 06.11.2017 20:13, Matthias J. Sax wrote: >>> >>>>>>>>>> I absolute agree with what you say. It's not a >> requirement to >>> >>>>>>> specify a >>> >>>>>>>>>> topic name -- and this was the idea -- if user does >>> specify a name, >>> >>>>>>> we >>> >>>>>>>>>> treat as is -- if users does not specify a name, Streams >>> create an >>> >>>>>>>>>> internal topic. >>> >>>>>>>>>> >>> >>>>>>>>>> The goal of the Jira is to allow a simplified way to >> control >>> >>>>>>>>>> repartitioning (atm, user needs to manually create a >>> topic and use >>> >>>>>>> via >>> >>>>>>>>>> through()). >>> >>>>>>>>>> >>> >>>>>>>>>> Thus, the idea is to make the topic name parameter of >> through >>> >>>>>>> optional. >>> >>>>>>>>>> It's of course just an idea. Happy do have a other API >>> design. The >>> >>>>>>> goal >>> >>>>>>>>>> was, to avoid to many new overloads. >>> >>>>>>>>>> >>> >>>>>>>>>>>> Could you clarify exactly what you mean by keeping the >>> current >>> >>>>>>>>> distinction? >>> >>>>>>>>>> Current distinction is: user topics are created manually >>> and user >>> >>>>>>>>>> specifies the name -- internal topics are created by >>> Kafka Streams >>> >>>>>>> and >>> >>>>>>>>>> an name is generated automatically. >>> >>>>>>>>>> >>> >>>>>>>>>> -> through("user-topic") >>> >>>>>>>>>> -> through(TopicConfig.withNumberOfPartitions(5)) // >>> Streams creates >>> >>>>>>>> an >>> >>>>>>>>>> internal topic >>> >>>>>>>>>> >>> >>>>>>>>>> >>> >>>>>>>>>> -Matthias >>> >>>>>>>>>> >>> >>>>>>>>>> >>> >>>>>>>>>> On 11/6/17 6:56 PM, Thomas Becker wrote: >>> >>>>>>>>>>> Could you clarify exactly what you mean by keeping the >>> current >>> >>>>>>>>> distinction? >>> >>>>>>>>>>> Actually, re-reading the KIP and JIRA, it's not clear >>> that being >>> >>>>>>> able >>> >>>>>>>>> to specify a custom name is actually a requirement. If the >>> goal is to >>> >>>>>>>>> control repartitioning and tune parallelism, maybe we can >> just >>> >>>>>>>>> sidestep >>> >>>>>>>>> this issue altogether by removing the ability to set a >>> different >>> >>>> name. >>> >>>>>>>>>>> On Mon, 2017-11-06 at 16:51 +0100, Matthias J. Sax wrote: >>> >>>>>>>>>>> >>> >>>>>>>>>>> That's a good point. In current design, we strictly >>> distinguish >>> >>>>>>> both. >>> >>>>>>>>>>> For example, the reset tools deletes internal topics >>> (starting with >>> >>>>>>>>>>> prefix `<application.id <http://application.id>>-` and >>> ending with either `-repartition` >>> >>>> or >>> >>>>>>>>>>> `-changelog`. >>> >>>>>>>>>>> >>> >>>>>>>>>>> Thus, from my point of view, it would make sense to keep >> the >>> >>>> current >>> >>>>>>>>>>> distinction. >>> >>>>>>>>>>> >>> >>>>>>>>>>> -Matthias >>> >>>>>>>>>>> >>> >>>>>>>>>>> On 11/6/17 4:45 PM, Thomas Becker wrote: >>> >>>>>>>>>>> >>> >>>>>>>>>>> >>> >>>>>>>>>>> I think this sounds good as well. It's worth clarifying >>> whether >>> >>>>>>> topics >>> >>>>>>>>> that are named by the user but created by streams are >>> considered >>> >>>>>>>> "internal" >>> >>>>>>>>> topics also. >>> >>>>>>>>>>> On Sun, 2017-11-05 at 23:02 +0100, Matthias J. Sax wrote: >>> >>>>>>>>>>> >>> >>>>>>>>>>> My idea was, to relax the requirement for through() that >>> a topic >>> >>>>>>> must >>> >>>>>>>> be >>> >>>>>>>>>>> created manually before startup. >>> >>>>>>>>>>> >>> >>>>>>>>>>> Thus, if no through() call is made, a (internal) topic >>> is created >>> >>>>>>> the >>> >>>>>>>>>>> same way we do it currently. >>> >>>>>>>>>>> >>> >>>>>>>>>>> If one uses `through(String topicName)` we keep the >> current >>> >>>> behavior >>> >>>>>>>> and >>> >>>>>>>>>>> require users to create the topic manually. >>> >>>>>>>>>>> >>> >>>>>>>>>>> The reasoning is as follows: if a user creates a topic >>> manually, a >>> >>>>>>>> user >>> >>>>>>>>>>> can just use it for repartitioning. As the topic is >>> already there, >>> >>>>>>>> there >>> >>>>>>>>>>> is no need to specify any topic configs. >>> >>>>>>>>>>> >>> >>>>>>>>>>> We add a new `through()` overload (details TBD) that >>> allows to >>> >>>>>>> specify >>> >>>>>>>>>>> topic configs and Streams create the topic with those >>> configs. >>> >>>>>>>>>>> >>> >>>>>>>>>>> Reasoning: user don't want to manage topic manually, >>> thus, it's >>> >>>>>>> still >>> >>>>>>>> an >>> >>>>>>>>>>> internal topic and Streams create the topic name >>> automatically as >>> >>>>>>> for >>> >>>>>>>>>>> all other internal topics. However, users gets some more >>> control >>> >>>>>>> about >>> >>>>>>>>>>> topic parameters like number of partitions (we should >>> discuss what >>> >>>>>>>> other >>> >>>>>>>>>>> configs would be useful). >>> >>>>>>>>>>> >>> >>>>>>>>>>> >>> >>>>>>>>>>> Does this make sense? >>> >>>>>>>>>>> >>> >>>>>>>>>>> >>> >>>>>>>>>>> -Matthias >>> >>>>>>>>>>> >>> >>>>>>>>>>> >>> >>>>>>>>>>> On 11/5/17 1:21 AM, Jan Filipiak wrote: >>> >>>>>>>>>>> >>> >>>>>>>>>>> >>> >>>>>>>>>>> Hi. >>> >>>>>>>>>>> >>> >>>>>>>>>>> >>> >>>>>>>>>>> Im not 100 % up to date what version 1.0 DSL looks like >> ATM. >>> >>>>>>>>>>> I just would argue that repartitioning should be an own >>> API call >>> >>>>>>> like >>> >>>>>>>>>>> through or something. >>> >>>>>>>>>>> One can use through or to already to get this. I would >>> argue one >>> >>>>>>>> should >>> >>>>>>>>>>> look there instead of overloads >>> >>>>>>>>>>> >>> >>>>>>>>>>> Best Jan >>> >>>>>>>>>>> >>> >>>>>>>>>>> On 04.11.2017 16:01, Jeyhun Karimov wrote: >>> >>>>>>>>>>> >>> >>>>>>>>>>> >>> >>>>>>>>>>> Dear community, >>> >>>>>>>>>>> >>> >>>>>>>>>>> I would like to initiate discussion on KIP-221 [1] based >>> on issue >>> >>>>>>> [2]. >>> >>>>>>>>>>> Please feel free to comment. >>> >>>>>>>>>>> >>> >>>>>>>>>>> [1] >>> >>>>>>>>>>> >>> >>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP- >>> >>>>>>>> 221%3A+Repartition+Topic+Hints+in+Streams >>> >>>>>>>>>>> [2] https://issues.apache.org/jira/browse/KAFKA-6037 >>> >>>>>>>>>>> >>> >>>>>>>>>>> >>> >>>>>>>>>>> >>> >>>>>>>>>>> Cheers, >>> >>>>>>>>>>> Jeyhun >>> >>>>>>>>>>> >>> >>>>>>>>>>> >>> >>>>>>>>>>> >>> >>>>>>>>>>> >>> >>>>>>>>>>> >>> >>>>>>>>>>> >>> >>>>>>>>>>> >>> >>>>>>>>>>> >>> >>>>>>>>>>> >>> >>>>>>>>>>> ________________________________ >>> >>>>>>>>>>> >>> >>>>>>>>>>> This email and any attachments may contain confidential >> and >>> >>>>>>> privileged >>> >>>>>>>>> material for the sole use of the intended recipient. Any >>> review, >>> >>>>>>> copying, >>> >>>>>>>>> or distribution of this email (or any attachments) by >>> others is >>> >>>>>>>> prohibited. >>> >>>>>>>>> If you are not the intended recipient, please contact the >>> sender >>> >>>>>>>>> immediately and permanently delete this email and any >>> attachments. No >>> >>>>>>>>> employee or agent of TiVo Inc. is authorized to conclude >>> any binding >>> >>>>>>>>> agreement on behalf of TiVo Inc. by email. Binding >>> agreements with >>> >>>>>>>>> TiVo >>> >>>>>>>>> Inc. may only be made by a signed written agreement. >>> >>>>>>>>>>> >>> >>>>>>>>>>> >>> >>>>>>>>>>> >>> >>>>>>>>>>> >>> >>>>>>>>>>> >>> >>>>>>>>>>> ________________________________ >>> >>>>>>>>>>> >>> >>>>>>>>>>> This email and any attachments may contain confidential >> and >>> >>>>>>> privileged >>> >>>>>>>>> material for the sole use of the intended recipient. Any >>> review, >>> >>>>>>> copying, >>> >>>>>>>>> or distribution of this email (or any attachments) by >>> others is >>> >>>>>>>> prohibited. >>> >>>>>>>>> If you are not the intended recipient, please contact the >>> sender >>> >>>>>>>>> immediately and permanently delete this email and any >>> attachments. No >>> >>>>>>>>> employee or agent of TiVo Inc. is authorized to conclude >>> any binding >>> >>>>>>>>> agreement on behalf of TiVo Inc. by email. Binding >>> agreements with >>> >>>>>>>>> TiVo >>> >>>>>>>>> Inc. may only be made by a signed written agreement. >>> >>>>>>>>> >>> >>>>>> >>> >>>>>> >>> >>>>> >>> >>>> >>> >>>> >>> >>> >>> >> >>> > >>> >> >> >
signature.asc
Description: OpenPGP digital signature