Hi, I would like to trigger this discussion again. It seems that the naming question is rather subjective and both main alternatives (w/ or w/o the word "Topology" in the name) have pros/cons.
If you have any further thoughts, please share them. At the moment I still propose `StreamsBuilder` in the KIP.

I also want to point out that the VOTE thread was already started. So if you like the current KIP, please cast your vote there.

Thanks a lot!

-Matthias

On 3/23/17 3:38 PM, Matthias J. Sax wrote:
> Jay,
>
> about the naming scheme:
>
>>> 1. "kstreams" - the DSL
>>> 2. "processor api" - the lower level callback/topology api
>>> 3. KStream/KTable - entities in the kstreams dsl
>>> 4. "Kafka Streams" - General name for stream processing stuff in Kafka,
>>> including both kstreams and the processor API plus the underlying
>>> implementation.
>
> I think this terminology has some issues... To me, `kstreams` was
> always nothing more than an abbreviation for `Kafka Streams` -- thus (1) and
> (4) kinda collide here. Following questions on the mailing list etc., I
> often see people using kstreams or kstream exactly as an abbreviation for
> "Kafka Streams".
>
>> I think referring to the dsl as "kstreams" is cute and mnemonic and not
>> particularly confusing.
>
> I disagree here. There is only a very subtle difference between `kstreams` and
> `KStream` -- just singular/plural, thus (1) and (3) also "collide" --
> they are just too close to each other.
>
> Thus, I really think it's a good idea to get a new name for the DSL to
> get a better separation of the 4 concepts.
>
> Furthermore, we use the term "Streams API". Thus, I think
> `StreamsBuilder` (or `StreamsTopologyBuilder`) are both very good names.
>
>
> Thus, I prefer to keep the KIP as is (suggesting `StreamsBuilder`).
>
> I will start a VOTE thread. Of course, we can still discuss the naming
> issue. :)
>
>
>
> -Matthias
>
>
> On 3/22/17 8:53 PM, Jay Kreps wrote:
>> I don't feel strongly on this, so I'm happy with whatever everyone else
>> wants.
>>
>> Michael, I'm not arguing that people don't need to understand topologies, I
>> just think it is like RocksDB: you need to understand it when
>> debugging/operating but not in the initial coding, since the metaphor we're
>> providing at this layer isn't a topology of processors but rather something
>> like the collections api. Anyhow it won't hurt people to have it there.
>>
>> For the original KStreamBuilder thing, I think that came from the naming we
>> discussed originally:
>>
>> 1. "kstreams" - the DSL
>> 2. "processor api" - the lower level callback/topology api
>> 3. KStream/KTable - entities in the kstreams dsl
>> 4. "Kafka Streams" - General name for stream processing stuff in Kafka,
>> including both kstreams and the processor API plus the underlying
>> implementation.
>>
>> I think referring to the dsl as "kstreams" is cute and mnemonic and not
>> particularly confusing. Just like referring to the "java collections
>> library" isn't confusing even though it contains the Iterator interface
>> which is not actually itself a collection.
>>
>> So I think KStreamBuilder should technically have been KstreamsBuilder and
>> is intended not to be a builder of a KStream but rather the builder for the
>> kstreams DSL. Okay, yes, that *is* slightly confusing. :-)
>>
>> -Jay
>>
>> On Wed, Mar 22, 2017 at 11:25 AM, Guozhang Wang <wangg...@gmail.com> wrote:
>>
>>> Regarding the naming of `StreamsTopologyBuilder` vs. `StreamsBuilder` that
>>> are going to be used in the DSL, I agree both have their arguments:
>>>
>>> 1. On one side, people using the DSL layer probably do not need to be aware
>>> of (or rather, "learn about") the "topology" concept, although this concept
>>> is a publicly exposed one in Kafka Streams.
>>>
>>> 2. On the other side, StreamsBuilder#build() returning a Topology object
>>> sounds a little weird, at least to me (admittedly a subjective matter).
>>>
>>>
>>> Since the second bullet point seems to be more "subjective" and many people
>>> are not worried about it, I'm OK to go with the other option.
>>>
>>>
>>> Guozhang
>>>
>>>
>>> On Wed, Mar 22, 2017 at 8:58 AM, Michael Noll <mich...@confluent.io> wrote:
>>>
>>>> Forwarding to kafka-user.
>>>>
>>>>
>>>> ---------- Forwarded message ----------
>>>> From: Michael Noll <mich...@confluent.io>
>>>> Date: Wed, Mar 22, 2017 at 8:48 AM
>>>> Subject: Re: [DISCUSS] KIP-120: Cleanup Kafka Streams builder API
>>>> To: d...@kafka.apache.org
>>>>
>>>>
>>>> Matthias,
>>>>
>>>>> @Michael:
>>>>>
>>>>> You seemed to agree with Jay about not exposing the `Topology` concept
>>>>> in our main entry class (ie, current KStreamBuilder), thus, I
>>>>> interpreted that you do not want `Topology` in the name either (I am a
>>>>> little surprised by your last response, that goes the opposite direction).
>>>>
>>>> Oh, sorry for not being clear.
>>>>
>>>> What I wanted to say in my earlier email was the following: Yes, I do
>>>> agree with most of Jay's reasoning, notably about carefully deciding how
>>>> much and which parts of the API/concept "surface" we expose to users of the
>>>> DSL. However, and this is perhaps where I wasn't very clear, I disagree on
>>>> the particular opinion about not exposing the topology concept to DSL
>>>> users. Instead, I think the concept of a topology is important to
>>>> understand even for DSL users -- particularly because of the way the DSL is
>>>> currently wiring your processing logic via the builder pattern. (As I
>>>> noted, e.g. Akka uses a different approach where you might be able to get
>>>> away with not exposing the "topology" concept, but even in Akka there's the
>>>> notion of graphs and flows.)
>>>>
>>>>
>>>>>> StreamsBuilder builder = new StreamsBuilder();
>>>>>>
>>>>>> // And here you'd define your...well, what actually?
>>>>>> // Ah right, you are composing a topology here, though you are not
>>>>>> aware of it.
>>>>>
>>>>> Yes. You are not aware of it -- that's the whole point about it -- don't
>>>>> put the Topology concept in the focus...
>>>>
>>>> Let me turn this around, because that was my point: it's confusing to have
>>>> a name "StreamsBuilder" if that thing isn't building streams, and it is
>>>> not.
>>>>
>>>> As I mentioned before, I do think it is a benefit to make it clear to DSL
>>>> users that there are two aspects at play: (1) defining the logic/plan of
>>>> your processing, and (2) the execution of that plan. I have a less strong
>>>> opinion whether or not having "topology" in the names would help to
>>>> communicate this separation as well as combination of (1) and (2) to make
>>>> your app work as expected.
>>>>
>>>> If we stick with `KafkaStreams` for (2) *and* don't like having "topology"
>>>> in the name, then perhaps we should rename `KStreamBuilder` to
>>>> `KafkaStreamsBuilder`. That at least gives some illusion of a combo of (1)
>>>> and (2). IMHO, `KafkaStreamsBuilder` highlights better that "it is a
>>>> builder/helper for the Kafka Streams API", rather than "a builder for
>>>> streams".
>>>>
>>>> Also, I think some of the naming challenges we're discussing here are
>>>> caused by having this builder pattern in the first place. If the Streams
>>>> API was implemented in Scala, for example, we could use implicits for
>>>> helping us to "stitch streams/tables together to build the full topology",
>>>> thus using a different (better?) approach to composing your topologies than
>>>> through a builder pattern. So: perhaps there's a better way than the
>>>> builder, and that way would also be clearer on terminology? That said,
>>>> this might take this KIP off-scope.
>>>>
>>>> -Michael
>>>>
>>>>
>>>>
>>>>
>>>> On Wed, Mar 22, 2017 at 12:33 AM, Matthias J. Sax <matth...@confluent.io>
>>>> wrote:
>>>>
>>>>> @Guozhang:
>>>>>
>>>>> I recognized that you want to have `Topology` in the name. But it seems
>>>>> that more people preferred to not have it (Jay, Ram, Michael [?], myself).
>>>>>
>>>>> @Michael:
>>>>>
>>>>> You seemed to agree with Jay about not exposing the `Topology` concept
>>>>> in our main entry class (ie, current KStreamBuilder), thus, I
>>>>> interpreted that you do not want `Topology` in the name either (I am a
>>>>> little surprised by your last response, that goes the opposite direction).
>>>>>
>>>>>> StreamsBuilder builder = new StreamsBuilder();
>>>>>>
>>>>>> // And here you'd define your...well, what actually?
>>>>>> // Ah right, you are composing a topology here, though you are not
>>>>>> aware of it.
>>>>>
>>>>> Yes. You are not aware of it -- that's the whole point about it -- don't
>>>>> put the Topology concept in the focus...
>>>>>
>>>>> Furthermore,
>>>>>
>>>>>>>> So what are you building here with StreamsBuilder? Streams (hint: No)?
>>>>>>>> And what about tables -- is there a TableBuilder (hint: No)?
>>>>>
>>>>> I am not sure if this is too much of a concern. In contrast to
>>>>> `KStreamBuilder` (singular), which contains `KStream` and thus puts the
>>>>> KStream concept in focus and degrades `KTable`, `StreamsBuilder`
>>>>> (plural) focuses on the "Streams API". IMHO, it does not put focus on
>>>>> KStream. It's just a builder from the Streams API -- you don't need to
>>>>> worry what you are building -- and you don't need to think about the
>>>>> `Topology` concept (of course, you see that .build() returns a Topology).
>>>>>
>>>>>
>>>>> Personally, I see pros and cons for both `StreamsBuilder` and
>>>>> `StreamsTopologyBuilder` and thus, I am fine either way. Maybe Jay and
>>>>> Ram can follow up and share their thoughts?
>>>>>
>>>>> It would also help a lot if other people put their vote for a name, too.
>>>>>
>>>>>
>>>>>
>>>>> -Matthias
>>>>>
>>>>>
>>>>>
>>>>> On 3/21/17 2:11 PM, Guozhang Wang wrote:
>>>>>> Just to clarify, I did want to have the term `Topology` as part of the
>>>>>> class name, for the reasons above. I'm not too worried about being
>>>>>> consistent with the previous names, but I feel `XXTopologyBuilder` is
>>>>>> better than `XXStreamsBuilder` since its build() function returns a
>>>>>> Topology object.
>>>>>>
>>>>>>
>>>>>> Guozhang
>>>>>>
>>>>>>
>>>>>> On Mon, Mar 20, 2017 at 12:53 PM, Michael Noll <mich...@confluent.io>
>>>>>> wrote:
>>>>>>
>>>>>>> Hmm, I must admit I don't like this last update all too much.
>>>>>>>
>>>>>>> Basically we would have:
>>>>>>>
>>>>>>>     StreamsBuilder builder = new StreamsBuilder();
>>>>>>>
>>>>>>>     // And here you'd define your...well, what actually?
>>>>>>>     // Ah right, you are composing a topology here, though you are not
>>>>>>>     // aware of it.
>>>>>>>
>>>>>>>     KafkaStreams streams = new KafkaStreams(builder.build(),
>>>>>>>         streamsConfiguration);
>>>>>>>
>>>>>>> So what are you building here with StreamsBuilder? Streams (hint: No)?
>>>>>>> And what about tables -- is there a TableBuilder (hint: No)?
>>>>>>>
>>>>>>> I also interpret Guozhang's last response as that he'd prefer to have
>>>>>>> "Topology" in the class/interface names. I am aware that we shouldn't
>>>>>>> necessarily use the status quo to make decisions about future changes, but
>>>>>>> the very first concept we explain in the Kafka Streams documentation is
>>>>>>> "Stream Processing Topology":
>>>>>>> https://kafka.apache.org/0102/documentation/streams#streams_concepts
>>>>>>>
>>>>>>> -Michael
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Mar 20, 2017 at 7:55 PM, Matthias J. Sax <matth...@confluent.io>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> \cc users list
>>>>>>>>
>>>>>>>>
>>>>>>>> -------- Forwarded Message --------
>>>>>>>> Subject: Re: [DISCUSS] KIP-120: Cleanup Kafka Streams builder API
>>>>>>>> Date: Mon, 20 Mar 2017 11:51:01 -0700
>>>>>>>> From: Matthias J. Sax <matth...@confluent.io>
>>>>>>>> Organization: Confluent Inc
>>>>>>>> To: d...@kafka.apache.org
>>>>>>>>
>>>>>>>> I want to push this discussion further.
>>>>>>>>
>>>>>>>> Guozhang's argument about "exposing" the Topology class is valid. It's a
>>>>>>>> public class anyway, so it's not an issue. However, I think the question
>>>>>>>> is not so much about exposing but about "advertising" (ie, putting it
>>>>>>>> into the focus) or not at DSL level.
>>>>>>>>
>>>>>>>>
>>>>>>>> If I interpret the last replies correctly, it seems that we could agree
>>>>>>>> on "StreamsBuilder" as name. I did update the KIP accordingly. Please
>>>>>>>> correct me, if I got this wrong.
>>>>>>>>
>>>>>>>>
>>>>>>>> If there are no other objections -- this naming discussion was the last
>>>>>>>> open point so far -- I would like to start the VOTE thread.
>>>>>>>>
>>>>>>>>
>>>>>>>> -Matthias
>>>>>>>>
>>>>>>>>
>>>>>>>> On 3/14/17 2:37 PM, Guozhang Wang wrote:
>>>>>>>>> I'd like to keep the term "Topology" inside the builder class since, as
>>>>>>>>> Matthias mentioned, this builder#build() function returns a "Topology"
>>>>>>>>> object, whose type is a public class anyways. Although you can argue to let
>>>>>>>>> users always call
>>>>>>>>>
>>>>>>>>> "new KafkaStreams(builder.build())"
>>>>>>>>>
>>>>>>>>> I think it is still more beneficial to expose this concept.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Guozhang
>>>>>>>>>
>>>>>>>>> On Tue, Mar 14, 2017 at 10:43 AM, Matthias J. Sax <matth...@confluent.io>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Thanks for your input Michael.
>>>>>>>>>>
>>>>>>>>>>>> - KafkaStreams as the new name for the builder that creates the logical
>>>>>>>>>>>> plan, with e.g. `KafkaStreams.stream("input-topic")` and
>>>>>>>>>>>> `KafkaStreams.table("input-topic")`.
>>>>>>>>>>
>>>>>>>>>> I don't think this is a good idea, for multiple reasons:
>>>>>>>>>>
>>>>>>>>>> (1) We would reuse a name for a completely different purpose. The same
>>>>>>>>>> argument applies for not renaming KStreamBuilder to TopologyBuilder. The
>>>>>>>>>> confusion would just be too large.
>>>>>>>>>>
>>>>>>>>>> So if we would start from scratch, it might be ok to do so, but now we
>>>>>>>>>> cannot make this move, IMHO.
>>>>>>>>>>
>>>>>>>>>> Also a clarification question: do you suggest to have static methods
>>>>>>>>>> #stream and #table -- I am not sure if this would work?
>>>>>>>>>> (or was your code snippet just a simplification?)
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> (2) Kafka Streams is basically a "processing client" next to consumer
>>>>>>>>>> and producer client. Thus, the name KafkaStreams aligns to the naming
>>>>>>>>>> scheme of KafkaConsumer and KafkaProducer. I am not sure if it would be
>>>>>>>>>> a good choice to "break" this naming scheme.
>>>>>>>>>>
>>>>>>>>>> Btw: this is also the reason why we have KafkaStreams#close() -- and
>>>>>>>>>> not KafkaStreams#stop() -- because #close() aligns with consumer and
>>>>>>>>>> producer client.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> (3) One more argument against using KafkaStreams as DSL entry class would
>>>>>>>>>> be that it would need to create a Topology that can be given to the
>>>>>>>>>> "runner/processing-client". Thus the pattern would be
>>>>>>>>>>
>>>>>>>>>>> Topology topology = streams.build();
>>>>>>>>>>> KafkaStreamsRunner runner = new KafkaStreamsRunner(..., topology)
>>>>>>>>>>
>>>>>>>>>> (or of course as a one liner).
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On the other hand, there was the idea (that we intentionally excluded
>>>>>>>>>> from the KIP), to change the "client instantiation" pattern.
>>>>>>>>>>
>>>>>>>>>> Right now, a new client is actively instantiated (ie, by calling "new")
>>>>>>>>>> and the topology is provided as a constructor argument. However,
>>>>>>>>>> especially for DSL (not sure if it would make sense for PAPI), the DSL
>>>>>>>>>> builder could create the client for the user.
>>>>>>>>>>
>>>>>>>>>> Something like this:
>>>>>>>>>>
>>>>>>>>>>> KStreamBuilder builder = new KStreamBuilder();
>>>>>>>>>>> builder.whatever() // use the builder
>>>>>>>>>>>
>>>>>>>>>>> StreamsConfig config = ....
>>>>>>>>>>> KafkaStreams streams = builder.getKafkaStreams(config);
>>>>>>>>>>
>>>>>>>>>> If we change the pattern like this, the notion of the "DSL builder" would
>>>>>>>>>> change, as it does not create a topology anymore, but it creates the
>>>>>>>>>> "processing client". This would address Jay's concern about "not
>>>>>>>>>> exposing concepts users don't need to understand" and would not require
>>>>>>>>>> including the word "Topology" in the DSL builder class name, because
>>>>>>>>>> the builder does not build a Topology anymore.
>>>>>>>>>>
>>>>>>>>>> I just put some names that came to my mind first hand -- did not think
>>>>>>>>>> about good names. It's just to discuss the pattern.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> -Matthias
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 3/14/17 3:36 AM, Michael Noll wrote:
>>>>>>>>>>> I see Jay's point, and I agree with much of it -- notably about being
>>>>>>>>>>> careful which concepts we do and do not expose, depending on which user
>>>>>>>>>>> group / user type is affected. That said, I'm not sure yet whether or not
>>>>>>>>>>> we should get rid of "Topology" (or a similar term) in the DSL.
>>>>>>>>>>>
>>>>>>>>>>> For what it's worth, here's how related technologies define/name their
>>>>>>>>>>> "topologies" and "builders". Note that, in all cases, it's about
>>>>>>>>>>> constructing a logical processing plan, which then is being executed/run.
>>>>>>>>>>>
>>>>>>>>>>> - `Pipeline` (Google Dataflow/Apache Beam)
>>>>>>>>>>>     - To add a source you first instantiate the Source (e.g.
>>>>>>>>>>>       `TextIO.Read.from("gs://some/inputData.txt")`),
>>>>>>>>>>>       then attach it to your processing plan via `Pipeline#apply(<source>)`.
>>>>>>>>>>>       This setup is a bit different to our DSL because in our DSL the
>>>>>>>>>>>       builder does both, i.e. instantiating + auto-attaching to itself.
>>>>>>>>>>>     - To execute the processing plan you call `Pipeline#execute()`.
>>>>>>>>>>> - `StreamingContext` (Spark): This setup is similar to our DSL.
>>>>>>>>>>>     - To add a source you call e.g.
>>>>>>>>>>>       `StreamingContext#socketTextStream("localhost", 9999)`.
>>>>>>>>>>>     - To execute the processing plan you call `StreamingContext#execute()`.
>>>>>>>>>>> - `StreamExecutionEnvironment` (Flink): This setup is similar to our DSL.
>>>>>>>>>>>     - To add a source you call e.g.
>>>>>>>>>>>       `StreamExecutionEnvironment#socketTextStream("localhost", 9999)`.
>>>>>>>>>>>     - To execute the processing plan you call
>>>>>>>>>>>       `StreamExecutionEnvironment#execute()`.
>>>>>>>>>>> - `Graph`/`Flow` (Akka Streams), as a result of composing Sources (~
>>>>>>>>>>>   `KStreamBuilder.stream()`) and Sinks (~ `KStream#to()`)
>>>>>>>>>>>   into Flows, which are [Runnable]Graphs.
>>>>>>>>>>>     - You instantiate a Source directly, and then compose the Source with
>>>>>>>>>>>       Sinks to create a RunnableGraph:
>>>>>>>>>>>       see signature `Source#to[Mat2](sink: Graph[SinkShape[Out], Mat2]):
>>>>>>>>>>>       RunnableGraph[Mat]`.
>>>>>>>>>>>     - To execute the processing plan you call `Flow#run()`.
>>>>>>>>>>>
>>>>>>>>>>> In our DSL, in comparison, we do:
>>>>>>>>>>>
>>>>>>>>>>> - `KStreamBuilder` (Kafka Streams API)
>>>>>>>>>>>     - To add a source you call e.g. `KStreamBuilder#stream("input-topic")`.
>>>>>>>>>>>     - To execute the processing plan you create a `KafkaStreams` instance
>>>>>>>>>>>       from `KStreamBuilder`
>>>>>>>>>>>       (where the builder will instantiate the topology = processing plan to
>>>>>>>>>>>       be executed), and then
>>>>>>>>>>>       call `KafkaStreams#start()`. Think of `KafkaStreams` as our runner.
>>>>>>>>>>>
>>>>>>>>>>> First, I agree with the sentiment that the current name of `KStreamBuilder`
>>>>>>>>>>> isn't great (which is why we're having this discussion). Also, that
>>>>>>>>>>> finding a good name is tricky. ;-)
>>>>>>>>>>>
>>>>>>>>>>> Second, even though I agree with many of Jay's points I'm not sure whether
>>>>>>>>>>> I like the `StreamsBuilder` suggestion (i.e. any name that does not include
>>>>>>>>>>> "topology" or a similar term) that much more. It still doesn't describe
>>>>>>>>>>> what that class actually does, and what the difference to `KafkaStreams`
>>>>>>>>>>> is. IMHO, the point of `KStreamBuilder` is that it lets you build a
>>>>>>>>>>> logical plan (what we call "topology"), and `KafkaStreams` is the thing
>>>>>>>>>>> that executes that plan. I'm not yet convinced that abstracting these two
>>>>>>>>>>> points away from the user is a good idea if the argument is that it's
>>>>>>>>>>> potentially confusing to beginners (a claim which I am not sure is actually
>>>>>>>>>>> true).
>>>>>>>>>>>
>>>>>>>>>>> That said, if we rather favor "good-sounding but perhaps less technically
>>>>>>>>>>> correct names", I'd argue we should not even use something like "Builder".
>>>>>>>>>>> We could, for example, also pick the following names:
>>>>>>>>>>>
>>>>>>>>>>> - KafkaStreams as the new name for the builder that creates the logical
>>>>>>>>>>>   plan, with e.g. `KafkaStreams.stream("input-topic")` and
>>>>>>>>>>>   `KafkaStreams.table("input-topic")`.
>>>>>>>>>>> - KafkaStreamsRunner as the new name for the executioner of the plan, with
>>>>>>>>>>>   `KafkaStreamsRunner(KafkaStreams).run()`.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Mar 14, 2017 at 5:56 AM, Sriram Subramanian <r...@confluent.io>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> StreamsBuilder would be my vote.
>>>>>>>>>>>>
>>>>>>>>>>>>> On Mar 13, 2017, at 9:42 PM, Jay Kreps <j...@confluent.io> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hey Matthias,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Makes sense, I'm more advocating for removing the word topology than any
>>>>>>>>>>>>> particular new replacement.
>>>>>>>>>>>>>
>>>>>>>>>>>>> -Jay
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, Mar 13, 2017 at 12:30 PM, Matthias J. Sax <matth...@confluent.io>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Jay,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> thanks for your feedback
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> What if instead we called it KStreamsBuilder?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> That's the current name and I personally think it's not the best one.
>>>>>>>>>>>>>> The main reason why I don't like KStreamsBuilder is that we have the
>>>>>>>>>>>>>> concepts of KStreams and KTables, and the builder creates both. However,
>>>>>>>>>>>>>> the name puts the focus on KStream and devalues KTable.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I understand your argument, and I am personally open to removing the
>>>>>>>>>>>>>> "Topology" part, and naming it "StreamsBuilder". Not sure what others
>>>>>>>>>>>>>> think about this.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> About Processor API: I like the idea in general, but I think it's out
>>>>>>>>>>>>>> of scope for this KIP. KIP-120 has the focus on removing leaking
>>>>>>>>>>>>>> internal APIs and doing some cleanup of how our API reflects some concepts.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> However, I added your idea to the API discussion Wiki page and we can take
>>>>>>>>>>>>>> it from there:
>>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Streams+Discussions
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -Matthias
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 3/13/17 11:52 AM, Jay Kreps wrote:
>>>>>>>>>>>>>>> Two things:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 1. This is a minor thing but the proposed new name for KStreamBuilder
>>>>>>>>>>>>>>> is StreamsTopologyBuilder. I actually think we should not put topology in
>>>>>>>>>>>>>>> the name as topology is not a concept you need to understand at the
>>>>>>>>>>>>>>> kstreams layer right now. I'd think of three categories of concepts: (1)
>>>>>>>>>>>>>>> concepts you need to understand to get going even for a simple example, (2)
>>>>>>>>>>>>>>> concepts you need to understand to operate and debug a real production app,
>>>>>>>>>>>>>>> (3) concepts we truly abstract and you don't need to ever understand. I
>>>>>>>>>>>>>>> think in the kstream layer topologies are currently category (2), and this
>>>>>>>>>>>>>>> is where they belong. By introducing the name in even the simplest example
>>>>>>>>>>>>>>> it means the user has to go read about topologies to really understand even
>>>>>>>>>>>>>>> this simple snippet. What if instead we called it KStreamsBuilder?
>>>>>>>>>>>>>>> 2. For the processor api, I think this api is mostly not for end users.
>>>>>>>>>>>>>>> However there are a couple of cases where it might make sense to expose it. I
>>>>>>>>>>>>>>> think users coming from Samza, or JMS's MessageListener (
>>>>>>>>>>>>>>> https://docs.oracle.com/javaee/7/api/javax/jms/MessageListener.html)
>>>>>>>>>>>>>>> understand a simple callback interface for message processing. In fact,
>>>>>>>>>>>>>>> people often ask why Kafka's consumer doesn't provide such an interface.
>>>>>>>>>>>>>>> I'd argue we do, it's KafkaStreams. The only issue is that the processor
>>>>>>>>>>>>>>> API documentation is a bit scary for a person implementing this type of
>>>>>>>>>>>>>>> api. My observation is that people using this style of API don't do a lot
>>>>>>>>>>>>>>> of cross-message operations, they just do single message operations and use
>>>>>>>>>>>>>>> a database for anything that spans messages. They also don't factor their
>>>>>>>>>>>>>>> code into many MessageListeners and compose them, they just have one
>>>>>>>>>>>>>>> listener that has the complete handling logic. Say I am a user who wants to
>>>>>>>>>>>>>>> implement a single Processor in this style. Do we have an easy way to do
>>>>>>>>>>>>>>> that today (either with the .transform/.process methods in kstreams or with
>>>>>>>>>>>>>>> the topology apis)? Is there anything we can do in the way of trivial
>>>>>>>>>>>>>>> helper code to make this better? Also, how can we explain that pattern to
>>>>>>>>>>>>>>> people? I think currently we have pretty in-depth docs on our apis but I
>>>>>>>>>>>>>>> suspect a person trying to figure out how to implement a simple callback
>>>>>>>>>>>>>>> might get a bit lost trying to figure out how to wire it up. A simple five
>>>>>>>>>>>>>>> line example in the docs would probably help a lot. Not sure if this is
>>>>>>>>>>>>>>> best addressed in this KIP or is a side comment.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> -Jay
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Fri, Feb 3, 2017 at 3:33 PM, Matthias J. Sax <matth...@confluent.io>
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I did prepare a KIP to clean up some of Kafka's Streaming API.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Please have a look here:
>>>>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-120%3A+Cleanup+Kafka+Streams+builder+API
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Looking forward to your feedback!
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> -Matthias
>>>
>>> --
>>> -- Guozhang
>>>
>>
>