Hi Matthias, all,

Currently, I am not able to complete this KIP. Please accept my apologies for that.

Cheers,
Jeyhun

On Mon, Jun 11, 2018 at 2:25 AM Matthias J. Sax <matth...@confluent.io> wrote:

> What is the status of this KIP?
>
> -Matthias
>
>
> On 2/13/18 1:43 PM, Matthias J. Sax wrote:
> > Is there any update for this KIP?
> >
> >
> > -Matthias
> >
> > On 12/4/17 2:08 PM, Matthias J. Sax wrote:
> >> Jeyhun,
> >>
> >> thanks for updating the KIP.
> >>
> >> I am wondering if you intend to add a new class `Produced`? There is
> >> already `org.apache.kafka.streams.kstream.Produced`. So if we want to
> >> add a new class, it must have a different name -- or we might be able to
> >> merge both into one?
> >>
> >> Also, for the KStream overloads of `through()` and `to()`, can you add
> >> the different behavior using different overloads? It's not clear from
> >> the KIP what the semantics are.
> >>
> >>
> >> -Matthias
> >>
> >> On 11/17/17 3:27 PM, Jeyhun Karimov wrote:
> >>> Hi,
> >>>
> >>> Thanks for your comments. I agree with Matthias partially.
> >>> I think we should relax some requirements related to the to() and
> >>> through() methods.
> >>> IMHO, the Produced class can cover (existing/to be created) topic
> >>> information, which will ease our effort:
> >>>
> >>>     KStream.to(Produced topicInfo)
> >>>     KStream.through(Produced topicInfo)
> >>>
> >>> This will decrease the number of overloads, but we will perhaps need to
> >>> deprecate the existing to() and through() methods.
> >>> I updated the KIP accordingly.
> >>>
> >>>
> >>> Cheers,
> >>> Jeyhun
> >>>
> >>> On Thu, Nov 16, 2017 at 10:21 PM Matthias J. Sax <matth...@confluent.io>
> >>> wrote:
> >>>
> >>>> @Jan:
> >>>>
> >>>> The `Produced` class was introduced in 1.0 to specify key and value
> >>>> Serdes (and partitioner) if data is written into a topic.
> >>>>
> >>>> Old API:
> >>>>
> >>>>     KStream#to("topic", keySerde, valueSerde);
> >>>>
> >>>> New API:
> >>>>
> >>>>     KStream#to("topic", Produced.with(keySerde, valueSerde));
> >>>>
> >>>>
> >>>> This allows us to reduce the number of overloads for `to()` (and
> >>>> `through()`, which follows the same pattern) -- the second parameter is
> >>>> used to cover all different variations of optional parameters users can
> >>>> specify, while we only have 2 overloads for `to()` itself.
> >>>>
> >>>> What is still unclear to me is what you mean by this topic prefix
> >>>> thing. Either a user cares about the topic name and thus must create
> >>>> and manage it manually, or the user does not care and Streams creates
> >>>> it. How would this prefix idea fit in here?
> >>>>
> >>>>
> >>>>
> >>>> @Guozhang:
> >>>>
> >>>> My idea was to extend `Produced` with the hint we want to give for
> >>>> creating the internal topic and pass an optional `Produced` parameter.
> >>>> There are multiple things we can do here:
> >>>>
> >>>> 1) stream.through(null, Produced...).groupBy().aggregate()
> >>>> -> just allow for a `null` topic name indicating that Streams should
> >>>> create an internal topic
> >>>>
> >>>> 2) stream.through(Produced...).groupBy().aggregate()
> >>>> -> add one overload taking a mandatory `Produced`
> >>>>
> >>>> We use `Serialized` to piggyback the information
> >>>>
> >>>> 3) stream.groupBy(Serialized...).aggregate()
> >>>> and stream.groupByKey(Serialized...).aggregate()
> >>>> -> we don't need new top level overloads
> >>>>
> >>>>
> >>>> There are different trade-offs for those alternatives and maybe there
> >>>> are other ways to change the API. It's just to push the discussion
> >>>> further.
> >>>>
> >>>>
> >>>> -Matthias
> >>>>
> >>>> On 11/12/17 1:22 PM, Jan Filipiak wrote:
> >>>>> Hi Guozhang,
> >>>>>
> >>>>> this felt like these questions are supposed to be answered by me.
> >>>>> I do not understand the first one.
> >>>>> I don't understand why the user
> >>>>> shouldn't be able to specify a suffix for the topic name.
> >>>>>
> >>>>> For the third question, I am not 100% sure whether the Produced class
> >>>>> came into existence at all. I remember proposing it somewhere in our
> >>>>> redo DSL discussion that I dropped out of later. Finally, any call
> >>>>> that does:
> >>>>>
> >>>>> 1. create the internal topic
> >>>>> 2. register sink
> >>>>> 3. register source
> >>>>>
> >>>>> will always get the work done. If we have a Produced-like class,
> >>>>> putting all the parameters in there makes sense. (Partitioner, serde,
> >>>>> PartitionHint, internal, name ... )
> >>>>>
> >>>>> Hope this helps?
> >>>>>
> >>>>>
> >>>>> On 10.11.2017 07:54, Guozhang Wang wrote:
> >>>>>> A few clarification questions on the proposal details.
> >>>>>>
> >>>>>> 1. API: although the repartition only happens at the final stateful
> >>>>>> operations like agg / join, the repartition flag info was actually
> >>>>>> passed from an earlier operator like map / groupBy. So what should
> >>>>>> the new API look like? For example, if we do
> >>>>>>
> >>>>>>     stream.groupBy().through("topic-name", Produced..).aggregate
> >>>>>>
> >>>>>> This would add a bunch of APIs to GroupedKStream/KTable
> >>>>>>
> >>>>>> 2. Semantics: as Matthias mentioned, today any topic defined in a
> >>>>>> "through()" call is considered a user topic, and hence users are
> >>>>>> responsible for managing them, including the topic name. For this
> >>>>>> KIP's purpose, though, users would not care about the topic name.
> >>>>>> I.e. as a user I still want to make it be an internal topic so that
> >>>>>> I do not need to worry about it at all, but only specify
> >>>>>> num.partitions.
> >>>>>>
> >>>>>> 3. Details: in Produced we do not have specs for specifying the
> >>>>>> num.partitions or whether we should repartition or not.
> >>>>>> So it is still not clear to
> >>>>>> me how we would make use of that to achieve what's in the old
> >>>>>> proposal's RepartitionHint class.
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> Guozhang
> >>>>>>
> >>>>>>
> >>>>>> On Mon, Nov 6, 2017 at 1:21 PM, Ted Yu <yuzhih...@gmail.com> wrote:
> >>>>>>
> >>>>>>> bq. enlarge the score of through()
> >>>>>>>
> >>>>>>> I guess you meant scope.
> >>>>>>>
> >>>>>>> On Mon, Nov 6, 2017 at 1:15 PM, Jeyhun Karimov <je.kari...@gmail.com>
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>>> Hi,
> >>>>>>>>
> >>>>>>>> Sorry for the late reply. I am convinced that we should enlarge the
> >>>>>>>> score of through() (add more overloads) instead of introducing a
> >>>>>>>> separate set of overloads to other methods.
> >>>>>>>> I will update the KIP soon based on the discussion and let you know.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Cheers,
> >>>>>>>> Jeyhun
> >>>>>>>>
> >>>>>>>> On Mon, Nov 6, 2017 at 9:18 PM Jan Filipiak <jan.filip...@trivago.com>
> >>>>>>>> wrote:
> >>>>>>>>
> >>>>>>>>> Sorry for not being 100% up to date.
> >>>>>>>>> Back then we had the discussion that when an operation puts a
> >>>>>>>>> `Sink` into the topology, a `Produced` parameter is added. This
> >>>>>>>>> `Produced` parameter could be marked internal or external. If
> >>>>>>>>> internal, I think the name would still make a great suffix for
> >>>>>>>>> the topic name.
> >>>>>>>>>
> >>>>>>>>> Is this plan still around? Otherwise, having the name as a suffix
> >>>>>>>>> is probably always good: it can help users more quickly identify
> >>>>>>>>> hot topics that need more partitions if they have many of these
> >>>>>>>>> internal repartitions.
> >>>>>>>>>
> >>>>>>>>> Best Jan
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On 06.11.2017 20:13, Matthias J. Sax wrote:
> >>>>>>>>>> I absolutely agree with what you say.
> >>>>>>>>>> It's not a requirement to specify a
> >>>>>>>>>> topic name -- and this was the idea -- if the user does specify a
> >>>>>>>>>> name, we treat it as is -- if the user does not specify a name,
> >>>>>>>>>> Streams creates an internal topic.
> >>>>>>>>>>
> >>>>>>>>>> The goal of the Jira is to allow a simplified way to control
> >>>>>>>>>> repartitioning (atm, the user needs to manually create a topic and
> >>>>>>>>>> use it via through()).
> >>>>>>>>>>
> >>>>>>>>>> Thus, the idea is to make the topic name parameter of through()
> >>>>>>>>>> optional. It's of course just an idea. Happy to have another API
> >>>>>>>>>> design. The goal was to avoid too many new overloads.
> >>>>>>>>>>
> >>>>>>>>>>>> Could you clarify exactly what you mean by keeping the current
> >>>>>>>>>>>> distinction?
> >>>>>>>>>>
> >>>>>>>>>> Current distinction is: user topics are created manually and the
> >>>>>>>>>> user specifies the name -- internal topics are created by Kafka
> >>>>>>>>>> Streams and a name is generated automatically.
> >>>>>>>>>>
> >>>>>>>>>> -> through("user-topic")
> >>>>>>>>>> -> through(TopicConfig.withNumberOfPartitions(5)) // Streams creates an
> >>>>>>>>>> internal topic
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> -Matthias
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> On 11/6/17 6:56 PM, Thomas Becker wrote:
> >>>>>>>>>>> Could you clarify exactly what you mean by keeping the current
> >>>>>>>>>>> distinction?
> >>>>>>>>>>>
> >>>>>>>>>>> Actually, re-reading the KIP and JIRA, it's not clear that being
> >>>>>>>>>>> able to specify a custom name is actually a requirement. If the
> >>>>>>>>>>> goal is to control repartitioning and tune parallelism, maybe we
> >>>>>>>>>>> can just sidestep this issue altogether by removing the ability
> >>>>>>>>>>> to set a different name.
> >>>>>>>>>>>
> >>>>>>>>>>> On Mon, 2017-11-06 at 16:51 +0100, Matthias J. Sax wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> That's a good point.
> >>>>>>>>>>> In the current design, we strictly distinguish both.
> >>>>>>>>>>> For example, the reset tool deletes internal topics (starting with
> >>>>>>>>>>> prefix `<application.id>-` and ending with either `-repartition` or
> >>>>>>>>>>> `-changelog`).
> >>>>>>>>>>>
> >>>>>>>>>>> Thus, from my point of view, it would make sense to keep the current
> >>>>>>>>>>> distinction.
> >>>>>>>>>>>
> >>>>>>>>>>> -Matthias
> >>>>>>>>>>>
> >>>>>>>>>>> On 11/6/17 4:45 PM, Thomas Becker wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> I think this sounds good as well. It's worth clarifying whether
> >>>>>>>>>>> topics that are named by the user but created by Streams are
> >>>>>>>>>>> considered "internal" topics also.
> >>>>>>>>>>>
> >>>>>>>>>>> On Sun, 2017-11-05 at 23:02 +0100, Matthias J. Sax wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> My idea was to relax the requirement for through() that a topic
> >>>>>>>>>>> must be created manually before startup.
> >>>>>>>>>>>
> >>>>>>>>>>> Thus, if no through() call is made, an (internal) topic is created
> >>>>>>>>>>> the same way we do it currently.
> >>>>>>>>>>>
> >>>>>>>>>>> If one uses `through(String topicName)` we keep the current behavior
> >>>>>>>>>>> and require users to create the topic manually.
> >>>>>>>>>>>
> >>>>>>>>>>> The reasoning is as follows: if a user creates a topic manually, the
> >>>>>>>>>>> user can just use it for repartitioning. As the topic is already
> >>>>>>>>>>> there, there is no need to specify any topic configs.
> >>>>>>>>>>>
> >>>>>>>>>>> We add a new `through()` overload (details TBD) that allows users to
> >>>>>>>>>>> specify topic configs, and Streams creates the topic with those
> >>>>>>>>>>> configs.
> >>>>>>>>>>>
> >>>>>>>>>>> Reasoning: the user doesn't want to manage the topic manually; thus,
> >>>>>>>>>>> it's still an internal topic and Streams creates the topic name
> >>>>>>>>>>> automatically as for all other internal topics. However, the user
> >>>>>>>>>>> gets some more control over topic parameters like the number of
> >>>>>>>>>>> partitions (we should discuss what other configs would be useful).
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> Does this make sense?
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> -Matthias
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> On 11/5/17 1:21 AM, Jan Filipiak wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> Hi.
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> I'm not 100% up to date on what the version 1.0 DSL looks like ATM.
> >>>>>>>>>>> I just would argue that repartitioning should be its own API call,
> >>>>>>>>>>> like through or something.
> >>>>>>>>>>> One can already use through or to in order to get this. I would
> >>>>>>>>>>> argue one should look there instead of adding overloads.
> >>>>>>>>>>>
> >>>>>>>>>>> Best Jan
> >>>>>>>>>>>
> >>>>>>>>>>> On 04.11.2017 16:01, Jeyhun Karimov wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> Dear community,
> >>>>>>>>>>>
> >>>>>>>>>>> I would like to initiate discussion on KIP-221 [1] based on issue [2].
> >>>>>>>>>>> Please feel free to comment.
> >>>>>>>>>>>
> >>>>>>>>>>> [1] https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Repartition+Topic+Hints+in+Streams
> >>>>>>>>>>> [2] https://issues.apache.org/jira/browse/KAFKA-6037
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> Cheers,
> >>>>>>>>>>> Jeyhun
> >>>>>>>>>>>
> >>>>>>>>>>> ________________________________
> >>>>>>>>>>>
> >>>>>>>>>>> This email and any attachments may contain confidential and
> >>>>>>>>>>> privileged material for the sole use of the intended recipient.
> >>>>>>>>>>> Any review, copying, or distribution of this email (or any
> >>>>>>>>>>> attachments) by others is prohibited. If you are not the intended
> >>>>>>>>>>> recipient, please contact the sender immediately and permanently
> >>>>>>>>>>> delete this email and any attachments. No employee or agent of
> >>>>>>>>>>> TiVo Inc. is authorized to conclude any binding agreement on
> >>>>>>>>>>> behalf of TiVo Inc. by email. Binding agreements with TiVo Inc.
> >>>>>>>>>>> may only be made by a signed written agreement.
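
For readers skimming the thread: the user-topic vs. internal-topic distinction discussed above rests on a naming convention (prefix `<application.id>-`, suffix `-repartition` or `-changelog`). The following is only a minimal sketch of that convention as described in the thread; the class and method names are hypothetical and are not part of Kafka Streams or its reset tool.

```java
// Hypothetical helper, NOT part of Kafka Streams: it only illustrates the
// internal-topic naming convention described in the thread.
final class InternalTopicCheck {

    private InternalTopicCheck() {}

    // Returns true if the topic name starts with "<application.id>-" and
    // ends with "-repartition" or "-changelog".
    static boolean isInternalTopic(String applicationId, String topic) {
        String prefix = applicationId + "-";
        return topic.startsWith(prefix)
                && (topic.endsWith("-repartition") || topic.endsWith("-changelog"));
    }

    public static void main(String[] args) {
        // A Streams-managed repartition topic for application "my-app".
        System.out.println(isInternalTopic("my-app", "my-app-agg-repartition"));
        // A manually created user topic, as passed to through("user-topic").
        System.out.println(isInternalTopic("my-app", "user-topic"));
    }
}
```

Under this convention, a topic that the user names explicitly (for example via `through("user-topic")`) would not match, which is why the thread treats it as user-managed.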