Jay, about the naming schema:

>> 1. "kstreams" - the DSL
>> 2. "processor api" - the lower level callback/topology api
>> 3. KStream/KTable - entities in the kstreams dsl
>> 4. "Kafka Streams" - General name for stream processing stuff in Kafka,
>> including both kstreams and the processor API plus the underlying
>> implementation.

I think this terminology has some issues... To me, `kstreams` was never
more than an abbreviation for `Kafka Streams` -- thus (1) and (4) kind of
collide here. Following questions on the mailing list etc., I often see
people using kstreams or kstream exactly as an abbreviation for "Kafka
Streams".

> I think referring to the dsl as "kstreams" is cute and mnemonic and not
> particularly confusing.

I disagree here. The difference between `kstreams` and `KStream` is very
subtle -- just singular/plural -- thus (1) and (3) also "collide"; the
names are just too close to each other.

Thus, I really think it's a good idea to get a new name for the DSL to get
a better separation of the 4 concepts.

Furthermore, we use the term "Streams API". Thus, I think `StreamsBuilder`
and `StreamsTopologyBuilder` are both very good names.

Thus, I prefer to keep the KIP as is (suggesting `StreamsBuilder`).

I will start a VOTE thread. Of course, we can still discuss the naming
issue. :)


-Matthias

On 3/22/17 8:53 PM, Jay Kreps wrote:
> I don't feel strongly on this, so I'm happy with whatever everyone else
> wants.
>
> Michael, I'm not arguing that people don't need to understand topologies, I
> just think it is like rocks db, you need to understand it when
> debugging/operating but not in the initial coding since the metaphor we're
> providing at this layer isn't a topology of processors but rather something
> like the collections api. Anyhow it won't hurt people to have it there.
>
> For the original KStreamBuilder thing, I think that came from the naming we
> discussed originally:
>
> 1. "kstreams" - the DSL
> 2. "processor api" - the lower level callback/topology api
> 3. KStream/KTable - entities in the kstreams dsl
> 4. "Kafka Streams" - General name for stream processing stuff in Kafka,
> including both kstreams and the processor API plus the underlying
> implementation.
>
> I think referring to the dsl as "kstreams" is cute and mnemonic and not
> particularly confusing. Just like referring to the "java collections
> library" isn't confusing even though it contains the Iterator interface
> which is not actually itself a collection.
>
> So I think KStreamBuilder should technically have been KstreamsBuilder and
> is intended not to be a builder of a KStream but rather the builder for the
> kstreams DSL. Okay, yes, that *is* slightly confusing. :-)
>
> -Jay
>
> On Wed, Mar 22, 2017 at 11:25 AM, Guozhang Wang <wangg...@gmail.com> wrote:
>
>> Regarding the naming of `StreamsTopologyBuilder` v.s. `StreamsBuilder` that
>> are going to be used in DSL, I agree both have their arguments:
>>
>> 1. On one side, people using the DSL layer probably do not need to be aware
>> (or rather, "learn about") of the "topology" concept, although this concept
>> is a publicly exposed one in Kafka Streams.
>>
>> 2. On the other side, StreamsBuilder#build() returning a Topology object
>> sounds a little weird, at least to me (admittedly subjective matter).
>>
>>
>> Since the second bullet point seems to be more "subjective" and many people
>> are not worried about it, I'm OK to go with the other option.
>>
>>
>> Guozhang
>>
>>
>> On Wed, Mar 22, 2017 at 8:58 AM, Michael Noll <mich...@confluent.io>
>> wrote:
>>
>>> Forwarding to kafka-user.
>>>
>>>
>>> ---------- Forwarded message ----------
>>> From: Michael Noll <mich...@confluent.io>
>>> Date: Wed, Mar 22, 2017 at 8:48 AM
>>> Subject: Re: [DISCUSS] KIP-120: Cleanup Kafka Streams builder API
>>> To: d...@kafka.apache.org
>>>
>>>
>>> Matthias,
>>>
>>>> @Michael:
>>>>
>>>> You seemed to agree with Jay about not exposing the `Topology` concept
>>>> in our main entry class (ie, current KStreamBuilder), thus, I
>>>> interpreted that you do not want `Topology` in the name either (I am a
>>>> little surprised by your last response, that goes the opposite direction).
>>>
>>> Oh, sorry for not being clear.
>>>
>>> What I wanted to say in my earlier email was the following: Yes, I do
>>> agree with most of Jay's reasoning, notably about carefully deciding how
>>> much and which parts of the API/concept "surface" we expose to users of the
>>> DSL. However, and this is perhaps where I wasn't very clear, I disagree on
>>> the particular opinion about not exposing the topology concept to DSL
>>> users. Instead, I think the concept of a topology is important to
>>> understand even for DSL users -- particularly because of the way the DSL is
>>> currently wiring your processing logic via the builder pattern. (As I
>>> noted, e.g. Akka uses a different approach where you might be able to get
>>> away with not exposing the "topology" concept, but even in Akka there's the
>>> notion of graphs and flows.)
>>>
>>>
>>>>> StreamsBuilder builder = new StreamsBuilder();
>>>>>
>>>>> // And here you'd define your...well, what actually?
>>>>> // Ah right, you are composing a topology here, though you are not
>>>>> aware of it.
>>>>
>>>> Yes. You are not aware of it -- that's the whole point about it -- don't
>>>> put the Topology concept in the focus...
>>>
>>> Let me turn this around, because that was my point: it's confusing to have
>>> a name "StreamsBuilder" if that thing isn't building streams, and it is
>>> not.
>>>
>>> As I mentioned before, I do think it is a benefit to make it clear to DSL
>>> users that there are two aspects at play: (1) defining the logic/plan of
>>> your processing, and (2) the execution of that plan. I have a less strong
>>> opinion whether or not having "topology" in the names would help to
>>> communicate this separation as well as combination of (1) and (2) to make
>>> your app work as expected.
>>>
>>> If we stick with `KafkaStreams` for (2) *and* don't like having "topology"
>>> in the name, then perhaps we should rename `KStreamBuilder` to
>>> `KafkaStreamsBuilder`. That at least gives some illusion of a combo of (1)
>>> and (2). IMHO, `KafkaStreamsBuilder` highlights better that "it is a
>>> builder/helper for the Kafka Streams API", rather than "a builder for
>>> streams".
>>>
>>> Also, I think some of the naming challenges we're discussing here are
>>> caused by having this builder pattern in the first place. If the Streams
>>> API was implemented in Scala, for example, we could use implicits for
>>> helping us to "stitch streams/tables together to build the full topology",
>>> thus using a different (better?) approach to composing your topologies than
>>> through a builder pattern. So: perhaps there's a better way than the
>>> builder, and that way would also be clearer on terminology? That said,
>>> this might take this KIP off-scope.
>>>
>>> -Michael
>>>
>>>
>>>
>>>
>>> On Wed, Mar 22, 2017 at 12:33 AM, Matthias J. Sax <matth...@confluent.io>
>>> wrote:
>>>
>>>> @Guozhang:
>>>>
>>>> I recognized that you want to have `Topology` in the name. But it seems
>>>> that more people preferred to not have it (Jay, Ram, Michael [?], myself).
>>>>
>>>> @Michael:
>>>>
>>>> You seemed to agree with Jay about not exposing the `Topology` concept
>>>> in our main entry class (ie, current KStreamBuilder), thus, I
>>>> interpreted that you do not want `Topology` in the name either (I am a
>>>> little surprised by your last response, that goes the opposite direction).
>>>>
>>>>> StreamsBuilder builder = new StreamsBuilder();
>>>>>
>>>>> // And here you'd define your...well, what actually?
>>>>> // Ah right, you are composing a topology here, though you are not
>>>>> aware of it.
>>>>
>>>> Yes. You are not aware of it -- that's the whole point about it -- don't
>>>> put the Topology concept in the focus...
>>>>
>>>> Furthermore,
>>>>
>>>>>>> So what are you building here with StreamsBuilder? Streams (hint: No)?
>>>>>>> And what about tables -- is there a TableBuilder (hint: No)?
>>>>
>>>> I am not sure if this is too much of a concern. In contrast to
>>>> `KStreamBuilder` (singular) that contains `KStream` and thus puts the
>>>> KStream concept in focus and thus degrades `KTable`, `StreamsBuilder`
>>>> (plural) focuses on "Streams API". IMHO, it does not put focus on
>>>> KStream. It's just a builder from the Streams API -- you don't need to
>>>> worry what you are building -- and you don't need to think about the
>>>> `Topology` concept (of course, you see that .build() returns a Topology).
>>>>
>>>>
>>>> Personally, I see pros and cons for both `StreamsBuilder` and
>>>> `StreamsTopologyBuilder` and thus, I am fine either way. Maybe Jay and
>>>> Ram can follow up and share their thoughts?
>>>>
>>>> It would also help a lot if other people put their vote for a name, too.
>>>>
>>>>
>>>>
>>>> -Matthias
>>>>
>>>>
>>>>
>>>> On 3/21/17 2:11 PM, Guozhang Wang wrote:
>>>>> Just to clarify, I did want to have the term `Topology` as part of the
>>>>> class name, for the reasons above. I'm not too worried about being
>>>>> consistent with the previous names, but I feel the `XXTopologyBuilder` is
>>>>> better than `XXStreamsBuilder` since its build() function returns a
>>>>> Topology object.
>>>>>
>>>>>
>>>>> Guozhang
>>>>>
>>>>>
>>>>> On Mon, Mar 20, 2017 at 12:53 PM, Michael Noll <mich...@confluent.io>
>>>>> wrote:
>>>>>
>>>>>> Hmm, I must admit I don't like this last update all too much.
>>>>>>
>>>>>> Basically we would have:
>>>>>>
>>>>>> StreamsBuilder builder = new StreamsBuilder();
>>>>>>
>>>>>> // And here you'd define your...well, what actually?
>>>>>> // Ah right, you are composing a topology here, though you are not
>>>>>> aware of it.
>>>>>>
>>>>>> KafkaStreams streams = new KafkaStreams(builder.build(),
>>>>>> streamsConfiguration);
>>>>>>
>>>>>> So what are you building here with StreamsBuilder? Streams (hint: No)?
>>>>>> And what about tables -- is there a TableBuilder (hint: No)?
>>>>>>
>>>>>> I also interpret Guozhang's last response as that he'd prefer to have
>>>>>> "Topology" in the class/interface names. I am aware that we shouldn't
>>>>>> necessarily use the status quo to make decisions about future changes, but
>>>>>> the very first concept we explain in the Kafka Streams documentation is
>>>>>> "Stream Processing Topology":
>>>>>> https://kafka.apache.org/0102/documentation/streams#streams_concepts
>>>>>>
>>>>>> -Michael
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Mon, Mar 20, 2017 at 7:55 PM, Matthias J. Sax <matth...@confluent.io>
>>>>>> wrote:
>>>>>>
>>>>>>> \cc users list
>>>>>>>
>>>>>>>
>>>>>>> -------- Forwarded Message --------
>>>>>>> Subject: Re: [DISCUSS] KIP-120: Cleanup Kafka Streams builder API
>>>>>>> Date: Mon, 20 Mar 2017 11:51:01 -0700
>>>>>>> From: Matthias J. Sax <matth...@confluent.io>
>>>>>>> Organization: Confluent Inc
>>>>>>> To: d...@kafka.apache.org
>>>>>>>
>>>>>>> I want to push this discussion further.
>>>>>>>
>>>>>>> Guozhang's argument about "exposing" the Topology class is valid. It's a
>>>>>>> public class anyway, so it's not an issue. However, I think the question
>>>>>>> is not too much about exposing but about "advertising" (ie, putting it
>>>>>>> into the focus) or not at DSL level.
>>>>>>>
>>>>>>>
>>>>>>> If I interpret the last replies correctly, it seems that we could agree
>>>>>>> on "StreamsBuilder" as name. I did update the KIP accordingly. Please
>>>>>>> correct me, if I got this wrong.
>>>>>>>
>>>>>>>
>>>>>>> If there are no other objections -- this naming discussion was the last
>>>>>>> open point so far -- I would like to start the VOTE thread.
>>>>>>>
>>>>>>>
>>>>>>> -Matthias
>>>>>>>
>>>>>>>
>>>>>>> On 3/14/17 2:37 PM, Guozhang Wang wrote:
>>>>>>>> I'd like to keep the term "Topology" inside the builder class since, as
>>>>>>>> Matthias mentioned, this builder#build() function returns a "Topology"
>>>>>>>> object, whose type is a public class anyways. Although you can argue to
>>>>>>>> let users always call
>>>>>>>>
>>>>>>>> "new KafkaStreams(builder.build())"
>>>>>>>>
>>>>>>>> I think it is still more benefit to expose this concept.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Guozhang
>>>>>>>>
>>>>>>>> On Tue, Mar 14, 2017 at 10:43 AM, Matthias J. Sax <matth...@confluent.io>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Thanks for your input Michael.
>>>>>>>>>
>>>>>>>>>>> - KafkaStreams as the new name for the builder that creates the
>>>>>>>>>>> logical plan, with e.g. `KafkaStreams.stream("input-topic")` and
>>>>>>>>>>> `KafkaStreams.table("input-topic")`.
>>>>>>>>>
>>>>>>>>> I don't think this is a good idea, for multiple reasons:
>>>>>>>>>
>>>>>>>>> (1) We would reuse a name for a completely different purpose. The same
>>>>>>>>> argument for not renaming KStreamBuilder to TopologyBuilder. The
>>>>>>>>> confusion would just be too large.
>>>>>>>>>
>>>>>>>>> So if we would start from scratch, it might be ok to do so, but now we
>>>>>>>>> cannot make this move, IMHO.
>>>>>>>>>
>>>>>>>>> Also a clarification question: do you suggest to have static methods
>>>>>>>>> #stream and #table -- I am not sure if this would work?
>>>>>>>>> (or was your code snippet just simplification?)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> (2) Kafka Streams is basically a "processing client" next to consumer
>>>>>>>>> and producer client. Thus, the name KafkaStreams aligns to the naming
>>>>>>>>> schema of KafkaConsumer and KafkaProducer. I am not sure if it would be
>>>>>>>>> a good choice to "break" this naming scheme.
>>>>>>>>>
>>>>>>>>> Btw: this is also the reason, why we have KafkaStreams#close() -- and
>>>>>>>>> not KafkaStreams#stop() -- because #close() aligns with consumer and
>>>>>>>>> producer client.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> (3) One more argument against using KafkaStreams as DSL entry class would
>>>>>>>>> be, that it would need to create a Topology that can be given to the
>>>>>>>>> "runner/processing-client". Thus the pattern would be
>>>>>>>>>
>>>>>>>>>> Topology topology = streams.build();
>>>>>>>>>> KafkaStreamsRunner runner = new KafkaStreamsRunner(..., topology)
>>>>>>>>>
>>>>>>>>> (or of course as a one liner).
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On the other hand, there was the idea (that we intentionally excluded
>>>>>>>>> from the KIP), to change the "client instantiation" pattern.
>>>>>>>>>
>>>>>>>>> Right now, a new client is actively instantiated (ie, by calling "new")
>>>>>>>>> and the topology is provided as a constructor argument. However,
>>>>>>>>> especially for DSL (not sure if it would make sense for PAPI), the DSL
>>>>>>>>> builder could create the client for the user.
>>>>>>>>>
>>>>>>>>> Something like this:
>>>>>>>>>
>>>>>>>>>> KStreamBuilder builder = new KStreamBuilder();
>>>>>>>>>> builder.whatever() // use the builder
>>>>>>>>>>
>>>>>>>>>> StreamsConfig config = ....
>>>>>>>>>> KafkaStreams streams = builder.getKafkaStreams(config);
>>>>>>>>>
>>>>>>>>> If we change the pattern like this, the notion of the "DSL builder" would
>>>>>>>>> change, as it does not create a topology anymore, but it creates the
>>>>>>>>> "processing client". This would address Jay's concern about "not
>>>>>>>>> exposing concepts users don't need to understand" and would not require
>>>>>>>>> to include the word "Topology" in the DSL builder class name, because
>>>>>>>>> the builder does not build a Topology anymore.
>>>>>>>>>
>>>>>>>>> I just put some names that came to my mind first hand -- did not think
>>>>>>>>> about good names. It's just to discuss the pattern.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> -Matthias
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 3/14/17 3:36 AM, Michael Noll wrote:
>>>>>>>>>> I see Jay's point, and I agree with much of it -- notably about being
>>>>>>>>>> careful which concepts we do and do not expose, depending on which user
>>>>>>>>>> group / user type is affected. That said, I'm not sure yet whether or not
>>>>>>>>>> we should get rid of "Topology" (or a similar term) in the DSL.
>>>>>>>>>>
>>>>>>>>>> For what it's worth, here's how related technologies define/name their
>>>>>>>>>> "topologies" and "builders". Note that, in all cases, it's about
>>>>>>>>>> constructing a logical processing plan, which then is being executed/run.
>>>>>>>>>>
>>>>>>>>>> - `Pipeline` (Google Dataflow/Apache Beam)
>>>>>>>>>>     - To add a source you first instantiate the Source (e.g.
>>>>>>>>>>       `TextIO.Read.from("gs://some/inputData.txt")`),
>>>>>>>>>>       then attach it to your processing plan via `Pipeline#apply(<source>)`.
>>>>>>>>>>       This setup is a bit different to our DSL because in our DSL the
>>>>>>>>>>       builder does both, i.e. instantiating + auto-attaching to itself.
>>>>>>>>>>     - To execute the processing plan you call `Pipeline#execute()`.
>>>>>>>>>> - `StreamingContext` (Spark): This setup is similar to our DSL.
>>>>>>>>>>     - To add a source you call e.g.
>>>>>>>>>>       `StreamingContext#socketTextStream("localhost", 9999)`.
>>>>>>>>>>     - To execute the processing plan you call `StreamingContext#execute()`.
>>>>>>>>>> - `StreamExecutionEnvironment` (Flink): This setup is similar to our DSL.
>>>>>>>>>>     - To add a source you call e.g.
>>>>>>>>>>       `StreamExecutionEnvironment#socketTextStream("localhost", 9999)`.
>>>>>>>>>>     - To execute the processing plan you call
>>>>>>>>>>       `StreamExecutionEnvironment#execute()`.
>>>>>>>>>> - `Graph`/`Flow` (Akka Streams), as a result of composing Sources (~
>>>>>>>>>>   `KStreamBuilder.stream()`) and Sinks (~ `KStream#to()`)
>>>>>>>>>>   into Flows, which are [Runnable]Graphs.
>>>>>>>>>>     - You instantiate a Source directly, and then compose the Source with
>>>>>>>>>>       Sinks to create a RunnableGraph:
>>>>>>>>>>       see signature `Source#to[Mat2](sink: Graph[SinkShape[Out], Mat2]):
>>>>>>>>>>       RunnableGraph[Mat]`.
>>>>>>>>>>     - To execute the processing plan you call `Flow#run()`.
>>>>>>>>>>
>>>>>>>>>> In our DSL, in comparison, we do:
>>>>>>>>>>
>>>>>>>>>> - `KStreamBuilder` (Kafka Streams API)
>>>>>>>>>>     - To add a source you call e.g. `KStreamBuilder#stream("input-topic")`.
>>>>>>>>>>     - To execute the processing plan you create a `KafkaStreams` instance
>>>>>>>>>>       from `KStreamBuilder`
>>>>>>>>>>       (where the builder will instantiate the topology = processing plan to
>>>>>>>>>>       be executed), and then
>>>>>>>>>>       call `KafkaStreams#start()`. Think of `KafkaStreams` as our runner.
>>>>>>>>>>
>>>>>>>>>> First, I agree with the sentiment that the current name of `KStreamBuilder`
>>>>>>>>>> isn't great (which is why we're having this discussion). Also, that
>>>>>>>>>> finding a good name is tricky. ;-)
>>>>>>>>>>
>>>>>>>>>> Second, even though I agree with many of Jay's points I'm not sure whether
>>>>>>>>>> I like the `StreamsBuilder` suggestion (i.e. any name that does not include
>>>>>>>>>> "topology" or a similar term) that much more. It still doesn't describe
>>>>>>>>>> what that class actually does, and what the difference to `KafkaStreams`
>>>>>>>>>> is. IMHO, the point of `KStreamBuilder` is that it lets you build a
>>>>>>>>>> logical plan (what we call "topology"), and `KafkaStreams` is the thing
>>>>>>>>>> that executes that plan. I'm not yet convinced that abstracting these two
>>>>>>>>>> points away from the user is a good idea if the argument is that it's
>>>>>>>>>> potentially confusing to beginners (a claim which I am not sure is actually
>>>>>>>>>> true).
>>>>>>>>>>
>>>>>>>>>> That said, if we rather favor "good-sounding but perhaps less technically
>>>>>>>>>> correct names", I'd argue we should not even use something like "Builder".
>>>>>>>>>> We could, for example, also pick the following names:
>>>>>>>>>>
>>>>>>>>>> - KafkaStreams as the new name for the builder that creates the logical
>>>>>>>>>>   plan, with e.g. `KafkaStreams.stream("input-topic")` and
>>>>>>>>>>   `KafkaStreams.table("input-topic")`.
>>>>>>>>>> - KafkaStreamsRunner as the new name for the executioner of the plan, with
>>>>>>>>>>   `KafkaStreamsRunner(KafkaStreams).run()`.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Tue, Mar 14, 2017 at 5:56 AM, Sriram Subramanian <r...@confluent.io>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> StreamsBuilder would be my vote.
>>>>>>>>>>>
>>>>>>>>>>>> On Mar 13, 2017, at 9:42 PM, Jay Kreps <j...@confluent.io> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Hey Matthias,
>>>>>>>>>>>>
>>>>>>>>>>>> Make sense, I'm more advocating for removing the word topology than any
>>>>>>>>>>>> particular new replacement.
>>>>>>>>>>>>
>>>>>>>>>>>> -Jay
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Mar 13, 2017 at 12:30 PM, Matthias J. Sax <matth...@confluent.io>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Jay,
>>>>>>>>>>>>>
>>>>>>>>>>>>> thanks for your feedback
>>>>>>>>>>>>>
>>>>>>>>>>>>>> What if instead we called it KStreamsBuilder?
>>>>>>>>>>>>>
>>>>>>>>>>>>> That's the current name and I personally think it's not the best one.
>>>>>>>>>>>>> The main reason why I don't like KStreamsBuilder is, that we have the
>>>>>>>>>>>>> concepts of KStreams and KTables, and the builder creates both. However,
>>>>>>>>>>>>> the name puts the focus on KStream and devalues KTable.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I understand your argument, and I am personally open to remove the
>>>>>>>>>>>>> "Topology" part, and name it "StreamsBuilder". Not sure what others
>>>>>>>>>>>>> think about this.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> About Processor API: I like the idea in general, but I think it's out
>>>>>>>>>>>>> of scope for this KIP. KIP-120 has the focus on removing leaking
>>>>>>>>>>>>> internal APIs and do some cleanup how our API reflects some concepts.
>>>>>>>>>>>>>
>>>>>>>>>>>>> However, I added your idea to API discussion Wiki page and we take it
>>>>>>>>>>>>> from there:
>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Streams+Discussions
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> -Matthias
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 3/13/17 11:52 AM, Jay Kreps wrote:
>>>>>>>>>>>>>> Two things:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 1. This is a minor thing but the proposed new name for KStreamBuilder
>>>>>>>>>>>>>> is StreamsTopologyBuilder. I actually think we should not put topology in
>>>>>>>>>>>>>> the name as topology is not a concept you need to understand at the
>>>>>>>>>>>>>> kstreams layer right now. I'd think of three categories of concepts: (1)
>>>>>>>>>>>>>> concepts you need to understand to get going even for a simple example, (2)
>>>>>>>>>>>>>> concepts you need to understand to operate and debug a real production app,
>>>>>>>>>>>>>> (3) concepts we truly abstract and you don't need to ever understand. I
>>>>>>>>>>>>>> think in the kstream layer topologies are currently category (2), and this
>>>>>>>>>>>>>> is where they belong. By introducing the name in even the simplest example
>>>>>>>>>>>>>> it means the user has to go read about topologies to really understand even
>>>>>>>>>>>>>> this simple snippet. What if instead we called it KStreamsBuilder?
>>>>>>>>>>>>>> 2. For the processor api, I think this api is mostly not for end users.
>>>>>>>>>>>>>> However there are a couple of cases where it might make sense to expose it. I
>>>>>>>>>>>>>> think users coming from Samza, or JMS's MessageListener (
>>>>>>>>>>>>>> https://docs.oracle.com/javaee/7/api/javax/jms/MessageListener.html)
>>>>>>>>>>>>>> understand a simple callback interface for message processing. In fact,
>>>>>>>>>>>>>> people often ask why Kafka's consumer doesn't provide such an interface.
>>>>>>>>>>>>>> I'd argue we do, it's KafkaStreams. The only issue is that the processor
>>>>>>>>>>>>>> API documentation is a bit scary for a person implementing this type of
>>>>>>>>>>>>>> api. My observation is that people using this style of API don't do a lot
>>>>>>>>>>>>>> of cross-message operations, they just do single message operations and use
>>>>>>>>>>>>>> a database for anything that spans messages. They also don't factor their
>>>>>>>>>>>>>> code into many MessageListeners and compose them, they just have one
>>>>>>>>>>>>>> listener that has the complete handling logic. Say I am a user who wants to
>>>>>>>>>>>>>> implement a single Processor in this style. Do we have an easy way to do
>>>>>>>>>>>>>> that today (either with the .transform/.process methods in kstreams or with
>>>>>>>>>>>>>> the topology apis)? Is there anything we can do in the way of trivial
>>>>>>>>>>>>>> helper code to make this better? Also, how can we explain that pattern to
>>>>>>>>>>>>>> people? I think currently we have pretty in-depth docs on our apis but I
>>>>>>>>>>>>>> suspect a person trying to figure out how to implement a simple callback
>>>>>>>>>>>>>> might get a bit lost trying to figure out how to wire it up. A simple five
>>>>>>>>>>>>>> line example in the docs would probably help a lot. Not sure if this is
>>>>>>>>>>>>>> best addressed in this KIP or is a side comment.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -Jay
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Fri, Feb 3, 2017 at 3:33 PM, Matthias J. Sax <matth...@confluent.io>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I did prepare a KIP to do some cleanup of Kafka's Streaming API.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Please have a look here:
>>>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-120%3A+Cleanup+Kafka+Streams+builder+API
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Looking forward to your feedback!
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> -Matthias
>>
>> --
>> -- Guozhang
>>