What is the status of this KIP? -Matthias
On 2/13/18 1:43 PM, Matthias J. Sax wrote: > Is there any update for this KIP? > > > -Matthias > > On 12/4/17 2:08 PM, Matthias J. Sax wrote: >> Jeyhun, >> >> thanks for updating the KIP. >> >> I am wondering if you intend to add a new class `Produced`? There is >> already `org.apache.kafka.streams.kstream.Produced`. So if we want to >> add a new class, it must have a different name -- or we might be able to >> merge both into one? >> >> Also, for the KStream overlaods of `through()` and `to()`, can you add >> the different behavior using different overloads? It's not clear from >> the KIP what the semantics are. >> >> >> -Matthias >> >> On 11/17/17 3:27 PM, Jeyhun Karimov wrote: >>> Hi, >>> >>> Thanks for your comments. I agree with Matthias partially. >>> I think we should relax some requirements related with to() and through() >>> methods. >>> IMHO, Produced class can cover (existing/to be created) topic information, >>> and which will ease our effort: >>> >>> KStream.to(Produced topicInfo) >>> KStream.through(Produced topicInfo) >>> >>> This will decrease the number of overloads but we will need to deprecate >>> the existing to() and through() methods, perhaps. >>> I updated the KIP accordingly. >>> >>> >>> Cheers, >>> Jeyhun >>> >>> On Thu, Nov 16, 2017 at 10:21 PM Matthias J. Sax <matth...@confluent.io> >>> wrote: >>> >>>> @Jan: >>>> >>>> The `Produced` class was introduced in 1.0 to specify key and valud >>>> Serdes (and partitioner) if data is written into a topic. >>>> >>>> Old API: >>>> >>>> KStream#to("topic", keySerde, valueSerde); >>>> >>>> New API: >>>> >>>> KStream#to("topic", Produced.with(keySerde, valueSerde)); >>>> >>>> >>>> This allows to reduce the number of overloads for `to()` (and >>>> `through()` that follows the same pattern) -- the second parameter is >>>> used to cover all different variations of option parameters users can >>>> specify, while we only have 2 overload for `to()` itself. >>>> >>>> What is still unclear to me it, what you mean by this topic prefix >>>> thing? Either a user cares about the topic name and thus, must create >>>> and manage it manually. Or the user does not care, and Streams create >>>> it. How would this prefix idea fit in here? >>>> >>>> >>>> >>>> @Guozhang: >>>> >>>> My idea was to extend `Produced` with the hint we want to give for >>>> creating internal topic and pass a optional `Produced` parameter. There >>>> are multiple things we can do here: >>>> >>>> 1) stream.through(null, Produced...).groupBy().aggregate() >>>> -> just allow for `null` topic name indicating that Streams should >>>> create an internal topic >>>> >>>> 2) stream.through(Produced...).groupBy().aggregate() >>>> -> add one overload taking an mandatory `Produced` >>>> >>>> We use `Serialized` to picky back the information >>>> >>>> 3) stream.groupBy(Serialized...).aggregate() >>>> and stream.groupByKey(Serialized...).aggregate() >>>> -> we don't need new top level overloads >>>> >>>> >>>> There are different trade-offs for those alternatives and maybe there >>>> are other ways to change the API. It's just to push the discussion further. >>>> >>>> >>>> -Matthias >>>> >>>> On 11/12/17 1:22 PM, Jan Filipiak wrote: >>>>> Hi Gouzhang, >>>>> >>>>> this felt like these questions are supposed to be answered by me. >>>>> I do not understand the first one. I don't understand why the user >>>>> shouldn't be able to specify a suffix for the topic name. >>>>> >>>>> For the third question I am not 100% familiar if the Produced class >>>>> came to existence >>>>> at all. I remember proposing it somewhere in our redo DSL discussion that >>>>> I dropped out of later. Finally any call that does: >>>>> >>>>> 1. create the internal topic >>>>> 2. register sink >>>>> 3. register source >>>>> >>>>> will always get the work done. If we have a Produced like class. putting >>>>> all the parameters >>>>> in there make sense. (Partitioner, serde, PartitionHint, internal, name >>>>> ... ) >>>>> >>>>> Hope this helps? >>>>> >>>>> >>>>> On 10.11.2017 07:54, Guozhang Wang wrote: >>>>>> A few clarification questions on the proposal details. >>>>>> >>>>>> 1. API: although the repartition only happens at the final stateful >>>>>> operations like agg / join, the repartition flag info was actually >>>> passed >>>>>> from an earlier operator like map / groupBy. So what should be the new >>>>>> API >>>>>> look like? For example, if we do >>>>>> >>>>>> stream.groupBy().through("topic-name", Produced..).aggregate >>>>>> >>>>>> This would be add a bunch of APIs to GroupedKStream/KTable >>>>>> >>>>>> 2. Semantics: as Matthias mentioned, today any topics defined in >>>>>> "through()" call is considered a user topic, and hence users are >>>>>> responsible for managing them, including the topic name. For this KIP's >>>>>> purpose, though, users would not care about the topic name. I.e. as a >>>>>> user >>>>>> I still want to make it be an internal topic so that I do not need to >>>>>> worry >>>>>> about it at all, but only specify num.partitions. >>>>>> >>>>>> 3. Details: in Produced we do not have specs for specifying the >>>>>> num.partitions or should we repartition or not. So it is still not >>>>>> clear to >>>>>> me how we would make use of that to achieve what's in the old >>>>>> proposal's RepartitionHint class. >>>>>> >>>>>> >>>>>> >>>>>> Guozhang >>>>>> >>>>>> >>>>>> On Mon, Nov 6, 2017 at 1:21 PM, Ted Yu <yuzhih...@gmail.com> wrote: >>>>>> >>>>>>> bq. enlarge the score of through() >>>>>>> >>>>>>> I guess you meant scope. >>>>>>> >>>>>>> On Mon, Nov 6, 2017 at 1:15 PM, Jeyhun Karimov <je.kari...@gmail.com> >>>>>>> wrote: >>>>>>> >>>>>>>> Hi, >>>>>>>> >>>>>>>> Sorry for the late reply. I am convinced that we should enlarge the >>>>>>>> score >>>>>>>> of through() (add more overloads) instead of introducing a separate >>>> set >>>>>>> of >>>>>>>> overloads to other methods. >>>>>>>> I will update the KIP soon based on the discussion and inform. >>>>>>>> >>>>>>>> >>>>>>>> Cheers, >>>>>>>> Jeyhun >>>>>>>> >>>>>>>> On Mon, Nov 6, 2017 at 9:18 PM Jan Filipiak <jan.filip...@trivago.com >>>>> >>>>>>>> wrote: >>>>>>>> >>>>>>>>> Sorry for not beeing 100% up to date. >>>>>>>>> Back then we had the discussion that when an operation puts a >Sink< >>>>>>>>> into the topology, a >Produced< >>>>>>>>> parameter is added. This produced parameter could have internal or >>>>>>>>> external. If internal I think the name would still make >>>>>>>>> a great suffix for the topic name >>>>>>>>> >>>>>>>>> Is this plan still around? Otherwise having the name as suffix is >>>>>>>>> probably always good it can help the user quicker to identify hot >>>>>>> topics >>>>>>>>> that need more >>>>>>>>> partitions if he has many of these internal repartitions >>>>>>>>> >>>>>>>>> Best Jan >>>>>>>>> >>>>>>>>> >>>>>>>>> On 06.11.2017 20:13, Matthias J. Sax wrote: >>>>>>>>>> I absolute agree with what you say. It's not a requirement to >>>>>>> specify a >>>>>>>>>> topic name -- and this was the idea -- if user does specify a name, >>>>>>> we >>>>>>>>>> treat as is -- if users does not specify a name, Streams create an >>>>>>>>>> internal topic. >>>>>>>>>> >>>>>>>>>> The goal of the Jira is to allow a simplified way to control >>>>>>>>>> repartitioning (atm, user needs to manually create a topic and use >>>>>>> via >>>>>>>>>> through()). >>>>>>>>>> >>>>>>>>>> Thus, the idea is to make the topic name parameter of through >>>>>>> optional. >>>>>>>>>> It's of course just an idea. Happy do have a other API design. The >>>>>>> goal >>>>>>>>>> was, to avoid to many new overloads. >>>>>>>>>> >>>>>>>>>>>> Could you clarify exactly what you mean by keeping the current >>>>>>>>> distinction? >>>>>>>>>> Current distinction is: user topics are created manually and user >>>>>>>>>> specifies the name -- internal topics are created by Kafka Streams >>>>>>> and >>>>>>>>>> an name is generated automatically. >>>>>>>>>> >>>>>>>>>> -> through("user-topic") >>>>>>>>>> -> through(TopicConfig.withNumberOfPartitions(5)) // Streams creates >>>>>>>> an >>>>>>>>>> internal topic >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> -Matthias >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On 11/6/17 6:56 PM, Thomas Becker wrote: >>>>>>>>>>> Could you clarify exactly what you mean by keeping the current >>>>>>>>> distinction? >>>>>>>>>>> Actually, re-reading the KIP and JIRA, it's not clear that being >>>>>>> able >>>>>>>>> to specify a custom name is actually a requirement. If the goal is to >>>>>>>>> control repartitioning and tune parallelism, maybe we can just >>>>>>>>> sidestep >>>>>>>>> this issue altogether by removing the ability to set a different >>>> name. >>>>>>>>>>> On Mon, 2017-11-06 at 16:51 +0100, Matthias J. Sax wrote: >>>>>>>>>>> >>>>>>>>>>> That's a good point. In current design, we strictly distinguish >>>>>>> both. >>>>>>>>>>> For example, the reset tools deletes internal topics (starting with >>>>>>>>>>> prefix `<application.id>-` and ending with either `-repartition` >>>> or >>>>>>>>>>> `-changelog`. >>>>>>>>>>> >>>>>>>>>>> Thus, from my point of view, it would make sense to keep the >>>> current >>>>>>>>>>> distinction. >>>>>>>>>>> >>>>>>>>>>> -Matthias >>>>>>>>>>> >>>>>>>>>>> On 11/6/17 4:45 PM, Thomas Becker wrote: >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> I think this sounds good as well. It's worth clarifying whether >>>>>>> topics >>>>>>>>> that are named by the user but created by streams are considered >>>>>>>> "internal" >>>>>>>>> topics also. >>>>>>>>>>> On Sun, 2017-11-05 at 23:02 +0100, Matthias J. Sax wrote: >>>>>>>>>>> >>>>>>>>>>> My idea was, to relax the requirement for through() that a topic >>>>>>> must >>>>>>>> be >>>>>>>>>>> created manually before startup. >>>>>>>>>>> >>>>>>>>>>> Thus, if no through() call is made, a (internal) topic is created >>>>>>> the >>>>>>>>>>> same way we do it currently. >>>>>>>>>>> >>>>>>>>>>> If one uses `through(String topicName)` we keep the current >>>> behavior >>>>>>>> and >>>>>>>>>>> require users to create the topic manually. >>>>>>>>>>> >>>>>>>>>>> The reasoning is as follows: if a user creates a topic manually, a >>>>>>>> user >>>>>>>>>>> can just use it for repartitioning. As the topic is already there, >>>>>>>> there >>>>>>>>>>> is no need to specify any topic configs. >>>>>>>>>>> >>>>>>>>>>> We add a new `through()` overload (details TBD) that allows to >>>>>>> specify >>>>>>>>>>> topic configs and Streams create the topic with those configs. >>>>>>>>>>> >>>>>>>>>>> Reasoning: user don't want to manage topic manually, thus, it's >>>>>>> still >>>>>>>> an >>>>>>>>>>> internal topic and Streams create the topic name automatically as >>>>>>> for >>>>>>>>>>> all other internal topics. However, users gets some more control >>>>>>> about >>>>>>>>>>> topic parameters like number of partitions (we should discuss what >>>>>>>> other >>>>>>>>>>> configs would be useful). >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Does this make sense? >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> -Matthias >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On 11/5/17 1:21 AM, Jan Filipiak wrote: >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Hi. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Im not 100 % up to date what version 1.0 DSL looks like ATM. >>>>>>>>>>> I just would argue that repartitioning should be an own API call >>>>>>> like >>>>>>>>>>> through or something. >>>>>>>>>>> One can use through or to already to get this. I would argue one >>>>>>>> should >>>>>>>>>>> look there instead of overloads >>>>>>>>>>> >>>>>>>>>>> Best Jan >>>>>>>>>>> >>>>>>>>>>> On 04.11.2017 16:01, Jeyhun Karimov wrote: >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Dear community, >>>>>>>>>>> >>>>>>>>>>> I would like to initiate discussion on KIP-221 [1] based on issue >>>>>>> [2]. >>>>>>>>>>> Please feel free to comment. >>>>>>>>>>> >>>>>>>>>>> [1] >>>>>>>>>>> >>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP- >>>>>>>> 221%3A+Repartition+Topic+Hints+in+Streams >>>>>>>>>>> [2] https://issues.apache.org/jira/browse/KAFKA-6037 >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Cheers, >>>>>>>>>>> Jeyhun >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> ________________________________ >>>>>>>>>>> >>>>>>>>>>> This email and any attachments may contain confidential and >>>>>>> privileged >>>>>>>>> material for the sole use of the intended recipient. Any review, >>>>>>> copying, >>>>>>>>> or distribution of this email (or any attachments) by others is >>>>>>>> prohibited. >>>>>>>>> If you are not the intended recipient, please contact the sender >>>>>>>>> immediately and permanently delete this email and any attachments. No >>>>>>>>> employee or agent of TiVo Inc. is authorized to conclude any binding >>>>>>>>> agreement on behalf of TiVo Inc. by email. Binding agreements with >>>>>>>>> TiVo >>>>>>>>> Inc. may only be made by a signed written agreement. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> ________________________________ >>>>>>>>>>> >>>>>>>>>>> This email and any attachments may contain confidential and >>>>>>> privileged >>>>>>>>> material for the sole use of the intended recipient. Any review, >>>>>>> copying, >>>>>>>>> or distribution of this email (or any attachments) by others is >>>>>>>> prohibited. >>>>>>>>> If you are not the intended recipient, please contact the sender >>>>>>>>> immediately and permanently delete this email and any attachments. No >>>>>>>>> employee or agent of TiVo Inc. is authorized to conclude any binding >>>>>>>>> agreement on behalf of TiVo Inc. by email. Binding agreements with >>>>>>>>> TiVo >>>>>>>>> Inc. may only be made by a signed written agreement. >>>>>>>>> >>>>>> >>>>>> >>>>> >>>> >>>> >>> >> >
signature.asc
Description: OpenPGP digital signature