I'd like to keep the term "Topology" inside the builder class since, as Matthias mentioned, this builder#build() function returns a "Topology" object, whose type is a public class anyways. Although you can argue to let users always call
"new KafkaStreams(builder.build())" I think it is still more benefit to expose this concept. Guozhang On Tue, Mar 14, 2017 at 10:43 AM, Matthias J. Sax <matth...@confluent.io> wrote: > Thanks for your input Michael. > > >> - KafkaStreams as the new name for the builder that creates the logical > >> plan, with e.g. `KafkaStreams.stream("intput-topic")` and > >> `KafkaStreams.table("input-topic")`. > > I don't thinks this is a good idea, for multiple reasons: > > (1) We would reuse a name for a completely different purpose. The same > argument for not renaming KStreamBuilder to TopologyBuilder. The > confusion would just be too large. > > So if we would start from scratch, it might be ok to do so, but now we > cannot make this move, IMHO. > > Also a clarification question: do you suggest to have static methods > #stream and #table -- I am not sure if this would work? > (or was you code snippet just simplification?) > > > (2) Kafka Streams is basically a "processing client" next to consumer > and producer client. Thus, the name KafkaStreams aligns to the naming > schema of KafkaConsumer and KafkaProducer. I am not sure if it would be > a good choice to "break" this naming scheme. > > Btw: this is also the reason, why we have KafkaStreams#close() -- and > not KafkaStreams#stop() -- because #close() aligns with consumer and > producer client. > > > (3) On more argument against using KafkaStreams as DSL entry class would > be, that it would need to create a Topology that can be given to the > "runner/processing-client". Thus the pattern would be > > > Topology topology = streams.build(); > > KafkaStramsRunner runner = new KafkaStreamsRunner(..., topology) > > (or of course as a one liner). > > > > On the other hand, there was the idea (that we intentionally excluded > from the KIP), to change the "client instantiation" pattern. > > Right now, a new client in actively instantiated (ie, by calling "new") > and the topology if provided as a constructor argument. However, > especially for DSL (not sure if it would make sense for PAPI), the DSL > builder could create the client for the user. > > Something like this: > > > KStreamBuilder builder = new KStreamBuilder(); > > builder.whatever() // use the builder > > > > StreamsConfig config = .... > > KafkaStreams streams = builder.getKafkaStreams(config); > > If we change the patter like this, the notion a the "DSL builder" would > change, as it does not create a topology anymore, but it creates the > "processing client". This would address Jay's concern about "not > exposing concept users don't need the understand" and would not require > to include the word "Topology" in the DSL builder class name, because > the builder does not build a Topology anymore. > > I just put some names that came to my mind first hand -- did not think > about good names. It's just to discuss the pattern. > > > > -Matthias > > > > > > On 3/14/17 3:36 AM, Michael Noll wrote: > > I see Jay's point, and I agree with much of it -- notably about being > > careful which concepts we do and do not expose, depending on which user > > group / user type is affected. That said, I'm not sure yet whether or > not > > we should get rid of "Topology" (or a similar term) in the DSL. > > > > For what it's worth, here's how related technologies define/name their > > "topologies" and "builders". Note that, in all cases, it's about > > constructing a logical processing plan, which then is being executed/run. > > > > - `Pipeline` (Google Dataflow/Apache Beam) > > - To add a source you first instantiate the Source (e.g. > > `TextIO.Read.from("gs://some/inputData.txt")`), > > then attach it to your processing plan via > `Pipeline#apply(<source>)`. > > This setup is a bit different to our DSL because in our DSL the > > builder does both, i.e. > > instantiating + auto-attaching to itself. > > - To execute the processing plan you call `Pipeline#execute()`. > > - `StreamingContext`` (Spark): This setup is similar to our DSL. > > - To add a source you call e.g. > > `StreamingContext#socketTextStream("localhost", 9999)`. > > - To execute the processing plan you call > `StreamingContext#execute()`. > > - `StreamExecutionEnvironment` (Flink): This setup is similar to our DSL. > > - To add a source you call e.g. > > `StreamExecutionEnvironment#socketTextStream("localhost", 9999)`. > > - To execute the processing plan you call > > `StreamExecutionEnvironment#execute()`. > > - `Graph`/`Flow` (Akka Streams), as a result of composing Sources (~ > > `KStreamBuilder.stream()`) and Sinks (~ `KStream#to()`) > > into Flows, which are [Runnable]Graphs. > > - You instantiate a Source directly, and then compose the Source with > > Sinks to create a RunnableGraph: > > see signature `Source#to[Mat2](sink: Graph[SinkShape[Out], Mat2]): > > RunnableGraph[Mat]`. > > - To execute the processing plan you call `Flow#run()`. > > > > In our DSL, in comparison, we do: > > > > - `KStreamBuilder` (Kafka Streams API) > > - To add a source you call e.g. `KStreamBuilder#stream("input- > topic")`. > > - To execute the processing plan you create a `KafkaStreams` instance > > from `KStreamBuilder` > > (where the builder will instantiate the topology = processing plan > to > > be executed), and then > > call `KafkaStreams#start()`. Think of `KafkaStreams` as our > runner. > > > > First, I agree with the sentiment that the current name of > `KStreamBuilder` > > isn't great (which is why we're having this discussion). Also, that > > finding a good name is tricky. ;-) > > > > Second, even though I agree with many of Jay's points I'm not sure > whether > > I like the `StreamsBuilder` suggestion (i.e. any name that does not > include > > "topology" or a similar term) that much more. It still doesn't describe > > what that class actually does, and what the difference to `KafkaStreams` > > is. IMHO, the point of `KStreamBuilder` is that it lets you build a > > logical plan (what we call "topology"), and `KafkaStreams` is the thing > > that executes that plan. I'm not yet convinced that abstracting these > two > > points away from the user is a good idea if the argument is that it's > > potentially confusing to beginners (a claim which I am not sure is > actually > > true). > > > > That said, if we rather favor "good-sounding but perhaps less technically > > correct names", I'd argue we should not even use something like > "Builder". > > We could, for example, also pick the following names: > > > > - KafkaStreams as the new name for the builder that creates the logical > > plan, with e.g. `KafkaStreams.stream("intput-topic")` and > > `KafkaStreams.table("input-topic")`. > > - KafkaStreamsRunner as the new name for the executioner of the plan, > with > > `KafkaStreamsRunner(KafkaStreams).run()`. > > > > > > > > On Tue, Mar 14, 2017 at 5:56 AM, Sriram Subramanian <r...@confluent.io> > > wrote: > > > >> StreamsBuilder would be my vote. > >> > >>> On Mar 13, 2017, at 9:42 PM, Jay Kreps <j...@confluent.io> wrote: > >>> > >>> Hey Matthias, > >>> > >>> Make sense, I'm more advocating for removing the word topology than any > >>> particular new replacement. > >>> > >>> -Jay > >>> > >>> On Mon, Mar 13, 2017 at 12:30 PM, Matthias J. Sax < > matth...@confluent.io > >>> > >>> wrote: > >>> > >>>> Jay, > >>>> > >>>> thanks for your feedback > >>>> > >>>>> What if instead we called it KStreamsBuilder? > >>>> > >>>> That's the current name and I personally think it's not the best one. > >>>> The main reason why I don't like KStreamsBuilder is, that we have the > >>>> concepts of KStreams and KTables, and the builder creates both. > However, > >>>> the name puts he focus on KStream and devalues KTable. > >>>> > >>>> I understand your argument, and I am personally open the remove the > >>>> "Topology" part, and name it "StreamsBuilder". Not sure what others > >>>> think about this. > >>>> > >>>> > >>>> About Processor API: I like the idea in general, but I thinks it's out > >>>> of scope for this KIP. KIP-120 has the focus on removing leaking > >>>> internal APIs and do some cleanup how our API reflects some concepts. > >>>> > >>>> However, I added your idea to API discussion Wiki page and we take if > >>>> from there: > >>>> https://cwiki.apache.org/confluence/display/KAFKA/ > >>>> Kafka+Streams+Discussions > >>>> > >>>> > >>>> > >>>> -Matthias > >>>> > >>>> > >>>>> On 3/13/17 11:52 AM, Jay Kreps wrote: > >>>>> Two things: > >>>>> > >>>>> 1. This is a minor thing but the proposed new name for > KStreamBuilder > >>>>> is StreamsTopologyBuilder. I actually think we should not put > >>>> topology in > >>>>> the name as topology is not a concept you need to understand at the > >>>>> kstreams layer right now. I'd think of three categories of > concepts: > >>>> (1) > >>>>> concepts you need to understand to get going even for a simple > >>>> example, (2) > >>>>> concepts you need to understand to operate and debug a real > >>>> production app, > >>>>> (3) concepts we truly abstract and you don't need to ever > understand. > >>>> I > >>>>> think in the kstream layer topologies are currently category (2), > and > >>>> this > >>>>> is where they belong. By introducing the name in even the simplest > >>>> example > >>>>> it means the user has to go read about toplogies to really > understand > >>>> even > >>>>> this simple snippet. What if instead we called it KStreamsBuilder? > >>>>> 2. For the processor api, I think this api is mostly not for end > >>>> users. > >>>>> However this are a couple cases where it might make sense to expose > >>>> it. I > >>>>> think users coming from Samza, or JMS's MessageListener ( > >>>>> https://docs.oracle.com/javaee/7/api/javax/jms/ > MessageListener.html) > >>>>> understand a simple callback interface for message processing. In > >>>> fact, > >>>>> people often ask why Kafka's consumer doesn't provide such an > >>>> interface. > >>>>> I'd argue we do, it's KafkaStreams. The only issue is that the > >>>> processor > >>>>> API documentation is a bit scary for a person implementing this > type > >>>> of > >>>>> api. My observation is that people using this style of API don't > do a > >>>> lot > >>>>> of cross-message operations, then just do single message operations > >>>> and use > >>>>> a database for anything that spans messages. They also don't factor > >>>> their > >>>>> code into many MessageListeners and compose them, they just have > one > >>>>> listener that has the complete handling logic. Say I am a user who > >>>> wants to > >>>>> implement a single Processor in this style. Do we have an easy way > to > >>>> do > >>>>> that today (either with the .transform/.process methods in kstreams > >>>> or with > >>>>> the topology apis)? Is there anything we can do in the way of > trivial > >>>>> helper code to make this better? Also, how can we explain that > >>>> pattern to > >>>>> people? I think currently we have pretty in-depth docs on our apis > >>>> but I > >>>>> suspect a person trying to figure out how to implement a simple > >>>> callback > >>>>> might get a bit lost trying to figure out how to wire it up. A > simple > >>>> five > >>>>> line example in the docs would probably help a lot. Not sure if > this > >>>> is > >>>>> best addressed in this KIP or is a side comment. > >>>>> > >>>>> Cheers, > >>>>> > >>>>> -Jay > >>>>> > >>>>> On Fri, Feb 3, 2017 at 3:33 PM, Matthias J. Sax < > matth...@confluent.io > >>> > >>>>> wrote: > >>>>> > >>>>>> Hi All, > >>>>>> > >>>>>> I did prepare a KIP to do some cleanup some of Kafka's Streaming > API. > >>>>>> > >>>>>> Please have a look here: > >>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP- > >>>>>> 120%3A+Cleanup+Kafka+Streams+builder+API > >>>>>> > >>>>>> Looking forward to your feedback! > >>>>>> > >>>>>> > >>>>>> -Matthias > >>>> > >>>> > >> > > > > -- -- Guozhang