Just to clarify, I did want to have the term `Topology` as part of the class name, for the reasons above. I'm not too worried about to be consistent with the previous names, but I feel the `XXTopologyBuilder` is better than `XXStreamsBuilder` since it's build() function returns a Topology object.
Guozhang On Mon, Mar 20, 2017 at 12:53 PM, Michael Noll <mich...@confluent.io> wrote: > Hmm, I must admit I don't like this last update all too much. > > Basically we would have: > > StreamsBuilder builder = new StreamsBuilder(); > > // And here you'd define your...well, what actually? > // Ah right, you are composing a topology here, though you are not > aware of it. > > KafkaStreams streams = new KafkaStreams(builder.build(), > streamsConfiguration); > > So what are you building here with StreamsBuilder? Streams (hint: No)? > And what about tables -- is there a TableBuilder (hint: No)? > > I also interpret Guozhang's last response as that he'd prefer to have > "Topology" in the class/interface names. I am aware that we shouldn't > necessarily use the status quo to make decisions about future changes, but > the very first concept we explain in the Kafka Streams documentation is > "Stream Processing Topology": > https://kafka.apache.org/0102/documentation/streams#streams_concepts > > -Michael > > > > On Mon, Mar 20, 2017 at 7:55 PM, Matthias J. Sax <matth...@confluent.io> > wrote: > > > \cc users list > > > > > > -------- Forwarded Message -------- > > Subject: Re: [DISCUSS] KIP-120: Cleanup Kafka Streams builder API > > Date: Mon, 20 Mar 2017 11:51:01 -0700 > > From: Matthias J. Sax <matth...@confluent.io> > > Organization: Confluent Inc > > To: d...@kafka.apache.org > > > > I want to push this discussion further. > > > > Guozhang's argument about "exposing" the Topology class is valid. It's a > > public class anyway, so it's not as issue. However, I think the question > > is not too much about exposing but about "advertising" (ie, putting it > > into the focus) or not at DSL level. > > > > > > If I interpret the last replies correctly, it seems that we could agree > > on "StreamsBuilder" as name. I did update the KIP accordingly. Please > > correct me, if I got this wrong. > > > > > > If there are not other objects -- this naming discussion was the last > > open point to far -- I would like the start the VOTE thread. > > > > > > -Matthias > > > > > > On 3/14/17 2:37 PM, Guozhang Wang wrote: > > > I'd like to keep the term "Topology" inside the builder class since, as > > > Matthias mentioned, this builder#build() function returns a "Topology" > > > object, whose type is a public class anyways. Although you can argue to > > let > > > users always call > > > > > > "new KafkaStreams(builder.build())" > > > > > > I think it is still more benefit to expose this concept. > > > > > > > > > > > > Guozhang > > > > > > On Tue, Mar 14, 2017 at 10:43 AM, Matthias J. Sax < > matth...@confluent.io > > > > > > wrote: > > > > > >> Thanks for your input Michael. > > >> > > >>>> - KafkaStreams as the new name for the builder that creates the > > logical > > >>>> plan, with e.g. `KafkaStreams.stream("intput-topic")` and > > >>>> `KafkaStreams.table("input-topic")`. > > >> > > >> I don't thinks this is a good idea, for multiple reasons: > > >> > > >> (1) We would reuse a name for a completely different purpose. The same > > >> argument for not renaming KStreamBuilder to TopologyBuilder. The > > >> confusion would just be too large. > > >> > > >> So if we would start from scratch, it might be ok to do so, but now we > > >> cannot make this move, IMHO. > > >> > > >> Also a clarification question: do you suggest to have static methods > > >> #stream and #table -- I am not sure if this would work? > > >> (or was you code snippet just simplification?) > > >> > > >> > > >> (2) Kafka Streams is basically a "processing client" next to consumer > > >> and producer client. Thus, the name KafkaStreams aligns to the naming > > >> schema of KafkaConsumer and KafkaProducer. I am not sure if it would > be > > >> a good choice to "break" this naming scheme. > > >> > > >> Btw: this is also the reason, why we have KafkaStreams#close() -- and > > >> not KafkaStreams#stop() -- because #close() aligns with consumer and > > >> producer client. > > >> > > >> > > >> (3) On more argument against using KafkaStreams as DSL entry class > would > > >> be, that it would need to create a Topology that can be given to the > > >> "runner/processing-client". Thus the pattern would be > > >> > > >>> Topology topology = streams.build(); > > >>> KafkaStramsRunner runner = new KafkaStreamsRunner(..., topology) > > >> > > >> (or of course as a one liner). > > >> > > >> > > >> > > >> On the other hand, there was the idea (that we intentionally excluded > > >> from the KIP), to change the "client instantiation" pattern. > > >> > > >> Right now, a new client in actively instantiated (ie, by calling > "new") > > >> and the topology if provided as a constructor argument. However, > > >> especially for DSL (not sure if it would make sense for PAPI), the DSL > > >> builder could create the client for the user. > > >> > > >> Something like this: > > >> > > >>> KStreamBuilder builder = new KStreamBuilder(); > > >>> builder.whatever() // use the builder > > >>> > > >>> StreamsConfig config = .... > > >>> KafkaStreams streams = builder.getKafkaStreams(config); > > >> > > >> If we change the patter like this, the notion a the "DSL builder" > would > > >> change, as it does not create a topology anymore, but it creates the > > >> "processing client". This would address Jay's concern about "not > > >> exposing concept users don't need the understand" and would not > require > > >> to include the word "Topology" in the DSL builder class name, because > > >> the builder does not build a Topology anymore. > > >> > > >> I just put some names that came to my mind first hand -- did not think > > >> about good names. It's just to discuss the pattern. > > >> > > >> > > >> > > >> -Matthias > > >> > > >> > > >> > > >> > > >> > > >> On 3/14/17 3:36 AM, Michael Noll wrote: > > >>> I see Jay's point, and I agree with much of it -- notably about being > > >>> careful which concepts we do and do not expose, depending on which > user > > >>> group / user type is affected. That said, I'm not sure yet whether > or > > >> not > > >>> we should get rid of "Topology" (or a similar term) in the DSL. > > >>> > > >>> For what it's worth, here's how related technologies define/name > their > > >>> "topologies" and "builders". Note that, in all cases, it's about > > >>> constructing a logical processing plan, which then is being > > executed/run. > > >>> > > >>> - `Pipeline` (Google Dataflow/Apache Beam) > > >>> - To add a source you first instantiate the Source (e.g. > > >>> `TextIO.Read.from("gs://some/inputData.txt")`), > > >>> then attach it to your processing plan via > > >> `Pipeline#apply(<source>)`. > > >>> This setup is a bit different to our DSL because in our DSL the > > >>> builder does both, i.e. > > >>> instantiating + auto-attaching to itself. > > >>> - To execute the processing plan you call `Pipeline#execute()`. > > >>> - `StreamingContext`` (Spark): This setup is similar to our DSL. > > >>> - To add a source you call e.g. > > >>> `StreamingContext#socketTextStream("localhost", 9999)`. > > >>> - To execute the processing plan you call > > >> `StreamingContext#execute()`. > > >>> - `StreamExecutionEnvironment` (Flink): This setup is similar to our > > DSL. > > >>> - To add a source you call e.g. > > >>> `StreamExecutionEnvironment#socketTextStream("localhost", 9999)`. > > >>> - To execute the processing plan you call > > >>> `StreamExecutionEnvironment#execute()`. > > >>> - `Graph`/`Flow` (Akka Streams), as a result of composing Sources (~ > > >>> `KStreamBuilder.stream()`) and Sinks (~ `KStream#to()`) > > >>> into Flows, which are [Runnable]Graphs. > > >>> - You instantiate a Source directly, and then compose the Source > > with > > >>> Sinks to create a RunnableGraph: > > >>> see signature `Source#to[Mat2](sink: Graph[SinkShape[Out], > > Mat2]): > > >>> RunnableGraph[Mat]`. > > >>> - To execute the processing plan you call `Flow#run()`. > > >>> > > >>> In our DSL, in comparison, we do: > > >>> > > >>> - `KStreamBuilder` (Kafka Streams API) > > >>> - To add a source you call e.g. `KStreamBuilder#stream("input- > > >> topic")`. > > >>> - To execute the processing plan you create a `KafkaStreams` > > instance > > >>> from `KStreamBuilder` > > >>> (where the builder will instantiate the topology = processing > > plan > > >> to > > >>> be executed), and then > > >>> call `KafkaStreams#start()`. Think of `KafkaStreams` as our > > >> runner. > > >>> > > >>> First, I agree with the sentiment that the current name of > > >> `KStreamBuilder` > > >>> isn't great (which is why we're having this discussion). Also, that > > >>> finding a good name is tricky. ;-) > > >>> > > >>> Second, even though I agree with many of Jay's points I'm not sure > > >> whether > > >>> I like the `StreamsBuilder` suggestion (i.e. any name that does not > > >> include > > >>> "topology" or a similar term) that much more. It still doesn't > > describe > > >>> what that class actually does, and what the difference to > > `KafkaStreams` > > >>> is. IMHO, the point of `KStreamBuilder` is that it lets you build a > > >>> logical plan (what we call "topology"), and `KafkaStreams` is the > thing > > >>> that executes that plan. I'm not yet convinced that abstracting > these > > >> two > > >>> points away from the user is a good idea if the argument is that it's > > >>> potentially confusing to beginners (a claim which I am not sure is > > >> actually > > >>> true). > > >>> > > >>> That said, if we rather favor "good-sounding but perhaps less > > technically > > >>> correct names", I'd argue we should not even use something like > > >> "Builder". > > >>> We could, for example, also pick the following names: > > >>> > > >>> - KafkaStreams as the new name for the builder that creates the > logical > > >>> plan, with e.g. `KafkaStreams.stream("intput-topic")` and > > >>> `KafkaStreams.table("input-topic")`. > > >>> - KafkaStreamsRunner as the new name for the executioner of the plan, > > >> with > > >>> `KafkaStreamsRunner(KafkaStreams).run()`. > > >>> > > >>> > > >>> > > >>> On Tue, Mar 14, 2017 at 5:56 AM, Sriram Subramanian < > r...@confluent.io> > > >>> wrote: > > >>> > > >>>> StreamsBuilder would be my vote. > > >>>> > > >>>>> On Mar 13, 2017, at 9:42 PM, Jay Kreps <j...@confluent.io> wrote: > > >>>>> > > >>>>> Hey Matthias, > > >>>>> > > >>>>> Make sense, I'm more advocating for removing the word topology than > > any > > >>>>> particular new replacement. > > >>>>> > > >>>>> -Jay > > >>>>> > > >>>>> On Mon, Mar 13, 2017 at 12:30 PM, Matthias J. Sax < > > >> matth...@confluent.io > > >>>>> > > >>>>> wrote: > > >>>>> > > >>>>>> Jay, > > >>>>>> > > >>>>>> thanks for your feedback > > >>>>>> > > >>>>>>> What if instead we called it KStreamsBuilder? > > >>>>>> > > >>>>>> That's the current name and I personally think it's not the best > > one. > > >>>>>> The main reason why I don't like KStreamsBuilder is, that we have > > the > > >>>>>> concepts of KStreams and KTables, and the builder creates both. > > >> However, > > >>>>>> the name puts he focus on KStream and devalues KTable. > > >>>>>> > > >>>>>> I understand your argument, and I am personally open the remove > the > > >>>>>> "Topology" part, and name it "StreamsBuilder". Not sure what > others > > >>>>>> think about this. > > >>>>>> > > >>>>>> > > >>>>>> About Processor API: I like the idea in general, but I thinks it's > > out > > >>>>>> of scope for this KIP. KIP-120 has the focus on removing leaking > > >>>>>> internal APIs and do some cleanup how our API reflects some > > concepts. > > >>>>>> > > >>>>>> However, I added your idea to API discussion Wiki page and we take > > if > > >>>>>> from there: > > >>>>>> https://cwiki.apache.org/confluence/display/KAFKA/ > > >>>>>> Kafka+Streams+Discussions > > >>>>>> > > >>>>>> > > >>>>>> > > >>>>>> -Matthias > > >>>>>> > > >>>>>> > > >>>>>>> On 3/13/17 11:52 AM, Jay Kreps wrote: > > >>>>>>> Two things: > > >>>>>>> > > >>>>>>> 1. This is a minor thing but the proposed new name for > > >> KStreamBuilder > > >>>>>>> is StreamsTopologyBuilder. I actually think we should not put > > >>>>>> topology in > > >>>>>>> the name as topology is not a concept you need to understand at > > the > > >>>>>>> kstreams layer right now. I'd think of three categories of > > >> concepts: > > >>>>>> (1) > > >>>>>>> concepts you need to understand to get going even for a simple > > >>>>>> example, (2) > > >>>>>>> concepts you need to understand to operate and debug a real > > >>>>>> production app, > > >>>>>>> (3) concepts we truly abstract and you don't need to ever > > >> understand. > > >>>>>> I > > >>>>>>> think in the kstream layer topologies are currently category > (2), > > >> and > > >>>>>> this > > >>>>>>> is where they belong. By introducing the name in even the > > simplest > > >>>>>> example > > >>>>>>> it means the user has to go read about toplogies to really > > >> understand > > >>>>>> even > > >>>>>>> this simple snippet. What if instead we called it > > KStreamsBuilder? > > >>>>>>> 2. For the processor api, I think this api is mostly not for > end > > >>>>>> users. > > >>>>>>> However this are a couple cases where it might make sense to > > expose > > >>>>>> it. I > > >>>>>>> think users coming from Samza, or JMS's MessageListener ( > > >>>>>>> https://docs.oracle.com/javaee/7/api/javax/jms/ > > >> MessageListener.html) > > >>>>>>> understand a simple callback interface for message processing. > In > > >>>>>> fact, > > >>>>>>> people often ask why Kafka's consumer doesn't provide such an > > >>>>>> interface. > > >>>>>>> I'd argue we do, it's KafkaStreams. The only issue is that the > > >>>>>> processor > > >>>>>>> API documentation is a bit scary for a person implementing this > > >> type > > >>>>>> of > > >>>>>>> api. My observation is that people using this style of API > don't > > >> do a > > >>>>>> lot > > >>>>>>> of cross-message operations, then just do single message > > operations > > >>>>>> and use > > >>>>>>> a database for anything that spans messages. They also don't > > factor > > >>>>>> their > > >>>>>>> code into many MessageListeners and compose them, they just > have > > >> one > > >>>>>>> listener that has the complete handling logic. Say I am a user > > who > > >>>>>> wants to > > >>>>>>> implement a single Processor in this style. Do we have an easy > > way > > >> to > > >>>>>> do > > >>>>>>> that today (either with the .transform/.process methods in > > kstreams > > >>>>>> or with > > >>>>>>> the topology apis)? Is there anything we can do in the way of > > >> trivial > > >>>>>>> helper code to make this better? Also, how can we explain that > > >>>>>> pattern to > > >>>>>>> people? I think currently we have pretty in-depth docs on our > > apis > > >>>>>> but I > > >>>>>>> suspect a person trying to figure out how to implement a simple > > >>>>>> callback > > >>>>>>> might get a bit lost trying to figure out how to wire it up. A > > >> simple > > >>>>>> five > > >>>>>>> line example in the docs would probably help a lot. Not sure if > > >> this > > >>>>>> is > > >>>>>>> best addressed in this KIP or is a side comment. > > >>>>>>> > > >>>>>>> Cheers, > > >>>>>>> > > >>>>>>> -Jay > > >>>>>>> > > >>>>>>> On Fri, Feb 3, 2017 at 3:33 PM, Matthias J. Sax < > > >> matth...@confluent.io > > >>>>> > > >>>>>>> wrote: > > >>>>>>> > > >>>>>>>> Hi All, > > >>>>>>>> > > >>>>>>>> I did prepare a KIP to do some cleanup some of Kafka's Streaming > > >> API. > > >>>>>>>> > > >>>>>>>> Please have a look here: > > >>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP- > > >>>>>>>> 120%3A+Cleanup+Kafka+Streams+builder+API > > >>>>>>>> > > >>>>>>>> Looking forward to your feedback! > > >>>>>>>> > > >>>>>>>> > > >>>>>>>> -Matthias > > >>>>>> > > >>>>>> > > >>>> > > >>> > > >> > > >> > > > > > > > > > > > > > > > -- -- Guozhang