Hi,

Thanks for your comments. I partially agree with Matthias.

I think we should relax some of the requirements on the to() and through() methods. IMHO, the Produced class can carry the topic information (for an existing topic or one still to be created), which will ease our effort:

KStream.to(Produced topicInfo)
KStream.through(Produced topicInfo)

This will decrease the number of overloads, but we will perhaps need to deprecate the existing to() and through() methods.

I updated the KIP accordingly.

Cheers,
Jeyhun

On Thu, Nov 16, 2017 at 10:21 PM Matthias J. Sax <matth...@confluent.io> wrote:

> @Jan:
>
> The `Produced` class was introduced in 1.0 to specify key and value
> Serdes (and partitioner) if data is written into a topic.
>
> Old API:
>
> KStream#to("topic", keySerde, valueSerde);
>
> New API:
>
> KStream#to("topic", Produced.with(keySerde, valueSerde));
>
>
> This allows us to reduce the number of overloads for `to()` (and
> `through()`, which follows the same pattern) -- the second parameter is
> used to cover all the different variations of optional parameters users
> can specify, while we only have 2 overloads for `to()` itself.
>
> What is still unclear to me is what you mean by this topic prefix
> thing. Either a user cares about the topic name and thus must create
> and manage it manually, or the user does not care and Streams creates
> it. How would this prefix idea fit in here?
>
>
>
> @Guozhang:
>
> My idea was to extend `Produced` with the hint we want to give for
> creating the internal topic and pass an optional `Produced` parameter.
> There are multiple things we can do here:
>
> 1) stream.through(null, Produced...).groupBy().aggregate()
> -> just allow a `null` topic name, indicating that Streams should
> create an internal topic
>
> 2) stream.through(Produced...).groupBy().aggregate()
> -> add one overload taking a mandatory `Produced`
>
> We use `Serialized` to piggyback the information
>
> 3) stream.groupBy(Serialized...).aggregate()
> and stream.groupByKey(Serialized...).aggregate()
> -> we don't need new top-level overloads
>
>
> There are different trade-offs for those alternatives and maybe there
> are other ways to change the API. It's just to push the discussion further.
>
>
> -Matthias
>
> On 11/12/17 1:22 PM, Jan Filipiak wrote:
> > Hi Guozhang,
> >
> > it felt like these questions were supposed to be answered by me.
> > I do not understand the first one. I don't understand why the user
> > shouldn't be able to specify a suffix for the topic name.
> >
> > For the third question, I am not 100% sure whether the Produced class
> > came into existence at all. I remember proposing it somewhere in our
> > redo-DSL discussion that I dropped out of later. Finally, any call that does:
> >
> > 1. create the internal topic
> > 2. register sink
> > 3. register source
> >
> > will always get the work done. If we have a Produced-like class, putting
> > all the parameters in there makes sense (Partitioner, serde,
> > PartitionHint, internal, name ...).
> >
> > Hope this helps?
> >
> >
> > On 10.11.2017 07:54, Guozhang Wang wrote:
> >> A few clarification questions on the proposal details.
> >>
> >> 1. API: although the repartition only happens at the final stateful
> >> operations like agg / join, the repartition flag info was actually
> >> passed from an earlier operator like map / groupBy. So what should the
> >> new API look like? For example, if we do
> >>
> >> stream.groupBy().through("topic-name", Produced..).aggregate
> >>
> >> This would add a bunch of APIs to GroupedKStream/KTable
> >>
> >> 2. Semantics: as Matthias mentioned, today any topic defined in a
> >> "through()" call is considered a user topic, and hence users are
> >> responsible for managing them, including the topic name. For this KIP's
> >> purpose, though, users would not care about the topic name. I.e., as a
> >> user I still want to make it an internal topic so that I do not need to
> >> worry about it at all, but only specify num.partitions.
> >>
> >> 3. Details: in Produced we do not have specs for specifying the
> >> num.partitions or whether we should repartition or not.
> >> So it is still not clear to me how we would make use of that to
> >> achieve what's in the old proposal's RepartitionHint class.
> >>
> >>
> >>
> >> Guozhang
> >>
> >>
> >> On Mon, Nov 6, 2017 at 1:21 PM, Ted Yu <yuzhih...@gmail.com> wrote:
> >>
> >>> bq. enlarge the score of through()
> >>>
> >>> I guess you meant scope.
> >>>
> >>> On Mon, Nov 6, 2017 at 1:15 PM, Jeyhun Karimov <je.kari...@gmail.com>
> >>> wrote:
> >>>
> >>>> Hi,
> >>>>
> >>>> Sorry for the late reply. I am convinced that we should enlarge the
> >>>> score of through() (add more overloads) instead of introducing a
> >>>> separate set of overloads to other methods.
> >>>> I will update the KIP soon based on the discussion and inform.
> >>>>
> >>>>
> >>>> Cheers,
> >>>> Jeyhun
> >>>>
> >>>> On Mon, Nov 6, 2017 at 9:18 PM Jan Filipiak <jan.filip...@trivago.com>
> >>>> wrote:
> >>>>
> >>>>> Sorry for not being 100% up to date.
> >>>>> Back then we had the discussion that when an operation puts a >Sink<
> >>>>> into the topology, a >Produced< parameter is added. This Produced
> >>>>> parameter could be internal or external. If internal, I think the
> >>>>> name would still make a great suffix for the topic name.
> >>>>>
> >>>>> Is this plan still around? Otherwise, having the name as a suffix is
> >>>>> probably always good: it can help the user identify hot topics that
> >>>>> need more partitions more quickly if he has many of these internal
> >>>>> repartitions.
> >>>>>
> >>>>> Best Jan
> >>>>>
> >>>>>
> >>>>> On 06.11.2017 20:13, Matthias J. Sax wrote:
> >>>>>> I absolutely agree with what you say. It's not a requirement to
> >>>>>> specify a topic name -- and this was the idea -- if the user does
> >>>>>> specify a name, we treat it as is -- if the user does not specify a
> >>>>>> name, Streams creates an internal topic.
> >>>>>>
> >>>>>> The goal of the Jira is to allow a simplified way to control
> >>>>>> repartitioning (atm, the user needs to manually create a topic and
> >>>>>> use it via through()).
> >>>>>>
> >>>>>> Thus, the idea is to make the topic name parameter of through
> >>>>>> optional. It's of course just an idea. Happy to have another API
> >>>>>> design. The goal was to avoid too many new overloads.
> >>>>>>
> >>>>>>>> Could you clarify exactly what you mean by keeping the current
> >>>>>>>> distinction?
> >>>>>>
> >>>>>> Current distinction is: user topics are created manually and the
> >>>>>> user specifies the name -- internal topics are created by Kafka
> >>>>>> Streams and a name is generated automatically.
> >>>>>>
> >>>>>> -> through("user-topic")
> >>>>>> -> through(TopicConfig.withNumberOfPartitions(5)) // Streams creates an internal topic
> >>>>>>
> >>>>>>
> >>>>>> -Matthias
> >>>>>>
> >>>>>>
> >>>>>> On 11/6/17 6:56 PM, Thomas Becker wrote:
> >>>>>>> Could you clarify exactly what you mean by keeping the current
> >>>>>>> distinction?
> >>>>>>> Actually, re-reading the KIP and JIRA, it's not clear that being
> >>>>>>> able to specify a custom name is actually a requirement. If the
> >>>>>>> goal is to control repartitioning and tune parallelism, maybe we
> >>>>>>> can just sidestep this issue altogether by removing the ability to
> >>>>>>> set a different name.
> >>>>>>> On Mon, 2017-11-06 at 16:51 +0100, Matthias J. Sax wrote:
> >>>>>>>
> >>>>>>> That's a good point. In the current design, we strictly distinguish
> >>>>>>> both. For example, the reset tool deletes internal topics (starting
> >>>>>>> with prefix `<application.id>-` and ending with either
> >>>>>>> `-repartition` or `-changelog`).
> >>>>>>>
> >>>>>>> Thus, from my point of view, it would make sense to keep the
> >>>>>>> current distinction.
> >>>>>>>
> >>>>>>> -Matthias
> >>>>>>>
> >>>>>>> On 11/6/17 4:45 PM, Thomas Becker wrote:
> >>>>>>>
> >>>>>>>
> >>>>>>> I think this sounds good as well. It's worth clarifying whether
> >>>>>>> topics that are named by the user but created by streams are
> >>>>>>> considered "internal" topics also.
> >>>>>>> On Sun, 2017-11-05 at 23:02 +0100, Matthias J. Sax wrote:
> >>>>>>>
> >>>>>>> My idea was to relax the requirement for through() that a topic
> >>>>>>> must be created manually before startup.
> >>>>>>>
> >>>>>>> Thus, if no through() call is made, an (internal) topic is created
> >>>>>>> the same way we do it currently.
> >>>>>>>
> >>>>>>> If one uses `through(String topicName)`, we keep the current
> >>>>>>> behavior and require users to create the topic manually.
> >>>>>>>
> >>>>>>> The reasoning is as follows: if a user creates a topic manually, a
> >>>>>>> user can just use it for repartitioning. As the topic is already
> >>>>>>> there, there is no need to specify any topic configs.
> >>>>>>>
> >>>>>>> We add a new `through()` overload (details TBD) that allows
> >>>>>>> specifying topic configs, and Streams creates the topic with those
> >>>>>>> configs.
> >>>>>>>
> >>>>>>> Reasoning: users don't want to manage the topic manually; thus,
> >>>>>>> it's still an internal topic and Streams creates the topic name
> >>>>>>> automatically, as for all other internal topics. However, users get
> >>>>>>> some more control over topic parameters like the number of
> >>>>>>> partitions (we should discuss what other configs would be useful).
> >>>>>>>
> >>>>>>>
> >>>>>>> Does this make sense?
> >>>>>>>
> >>>>>>>
> >>>>>>> -Matthias
> >>>>>>>
> >>>>>>>
> >>>>>>> On 11/5/17 1:21 AM, Jan Filipiak wrote:
> >>>>>>>
> >>>>>>>
> >>>>>>> Hi.
> >>>>>>>
> >>>>>>>
> >>>>>>> I'm not 100% up to date on what the version 1.0 DSL looks like ATM.
> >>>>>>> I would just argue that repartitioning should be its own API call,
> >>>>>>> like through or something.
> >>>>>>> One can already use through or to to get this. I would argue one
> >>>>>>> should look there instead of at overloads.
> >>>>>>>
> >>>>>>> Best Jan
> >>>>>>>
> >>>>>>> On 04.11.2017 16:01, Jeyhun Karimov wrote:
> >>>>>>>
> >>>>>>>
> >>>>>>> Dear community,
> >>>>>>>
> >>>>>>> I would like to initiate discussion on KIP-221 [1] based on issue
> >>>>>>> [2]. Please feel free to comment.
> >>>>>>>
> >>>>>>> [1] https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Repartition+Topic+Hints+in+Streams
> >>>>>>> [2] https://issues.apache.org/jira/browse/KAFKA-6037
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> Cheers,
> >>>>>>> Jeyhun
> >>>>>>>
> >>>>>>> ________________________________
> >>>>>>>
> >>>>>>> This email and any attachments may contain confidential and
> >>>>>>> privileged material for the sole use of the intended recipient. Any
> >>>>>>> review, copying, or distribution of this email (or any attachments)
> >>>>>>> by others is prohibited. If you are not the intended recipient,
> >>>>>>> please contact the sender immediately and permanently delete this
> >>>>>>> email and any attachments. No employee or agent of TiVo Inc. is
> >>>>>>> authorized to conclude any binding agreement on behalf of TiVo Inc.
> >>>>>>> by email. Binding agreements with TiVo Inc. may only be made by a
> >>>>>>> signed written agreement.
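
The distinction the thread keeps returning to (a user-named topic is used as is, while an unnamed one becomes an internal topic whose name Streams generates) can be sketched in a few lines. This is only a rough illustration under stated assumptions, not the Kafka Streams API: `RepartitionTopicSketch`, `TopicInfo`, `userTopic`, `internalWithPartitions` and `resolveTopicName` are all hypothetical names, and the operator-name segment in the generated name is an assumption. Only the `<application.id>-` prefix and the `-repartition` suffix come from the thread.

```java
import java.util.Objects;

/**
 * Illustrative sketch only. These types are hypothetical stand-ins and are
 * NOT part of the Kafka Streams API.
 */
public final class RepartitionTopicSketch {

    /** Hypothetical option object, loosely playing the role discussed for `Produced`. */
    public static final class TopicInfo {
        private final String topicName;      // null means: internal topic, name is generated
        private final Integer numPartitions; // null means: no partition hint was given

        private TopicInfo(String topicName, Integer numPartitions) {
            this.topicName = topicName;
            this.numPartitions = numPartitions;
        }

        /** User topic: the caller names it and is responsible for creating it. */
        public static TopicInfo userTopic(String topicName) {
            return new TopicInfo(Objects.requireNonNull(topicName, "topicName"), null);
        }

        /** Internal topic: no name is given, only a partition-count hint. */
        public static TopicInfo internalWithPartitions(int numPartitions) {
            if (numPartitions <= 0) {
                throw new IllegalArgumentException("numPartitions must be positive");
            }
            return new TopicInfo(null, numPartitions);
        }

        public boolean isInternal() {
            return topicName == null;
        }

        public Integer numPartitions() {
            return numPartitions;
        }
    }

    /**
     * Returns the user-supplied name unchanged, or generates an internal name
     * of the form "<application.id>-<operatorName>-repartition".
     * The operatorName segment is an assumption made for this sketch.
     */
    public static String resolveTopicName(String applicationId, String operatorName, TopicInfo info) {
        if (!info.isInternal()) {
            return info.topicName;
        }
        return applicationId + "-" + operatorName + "-repartition";
    }

    public static void main(String[] args) {
        System.out.println(resolveTopicName("my-app", "aggregate-1", TopicInfo.userTopic("user-topic")));
        System.out.println(resolveTopicName("my-app", "aggregate-1", TopicInfo.internalWithPartitions(5)));
    }
}
```

Keeping the name optional inside a single option object is what lets one overload cover both cases, which is the trade-off the thread weighs against adding further overloads.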