I see Jay's point, and I agree with much of it -- notably about being careful which concepts we do and do not expose, depending on which user group / user type is affected. That said, I'm not sure yet whether or not we should get rid of "Topology" (or a similar term) in the DSL.
For what it's worth, here's how related technologies define/name their "topologies" and "builders". Note that, in all cases, it's about constructing a logical processing plan, which is then executed/run.

- `Pipeline` (Google Dataflow/Apache Beam)
    - To add a source you first instantiate the Source (e.g. `TextIO.Read.from("gs://some/inputData.txt")`), then attach it to your processing plan via `Pipeline#apply(<source>)`. This setup is a bit different from our DSL because in our DSL the builder does both, i.e. instantiating + auto-attaching to itself.
    - To execute the processing plan you call `Pipeline#run()`.
- `StreamingContext` (Spark): This setup is similar to our DSL.
    - To add a source you call e.g. `StreamingContext#socketTextStream("localhost", 9999)`.
    - To execute the processing plan you call `StreamingContext#start()`.
- `StreamExecutionEnvironment` (Flink): This setup is similar to our DSL.
    - To add a source you call e.g. `StreamExecutionEnvironment#socketTextStream("localhost", 9999)`.
    - To execute the processing plan you call `StreamExecutionEnvironment#execute()`.
- `Graph`/`Flow` (Akka Streams), as a result of composing Sources (~ `KStreamBuilder.stream()`) and Sinks (~ `KStream#to()`) into Flows, which are [Runnable]Graphs.
    - You instantiate a Source directly, and then compose the Source with Sinks to create a RunnableGraph: see signature `Source#to[Mat2](sink: Graph[SinkShape[Out], Mat2]): RunnableGraph[Mat]`.
    - To execute the processing plan you call `RunnableGraph#run()`.

In our DSL, in comparison, we do:

- `KStreamBuilder` (Kafka Streams API)
    - To add a source you call e.g. `KStreamBuilder#stream("input-topic")`.
    - To execute the processing plan you create a `KafkaStreams` instance from `KStreamBuilder` (where the builder will instantiate the topology = processing plan to be executed), and then call `KafkaStreams#start()`. Think of `KafkaStreams` as our runner.
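To make the "builder describes the plan, runner executes it" split concrete, here is a minimal sketch in plain Java. The classes `Plan`, `PlanBuilder`, and `Runner` are hypothetical stand-ins invented for illustration; they are not part of the Kafka Streams API (or any of the other APIs listed above), and the in-memory `run(...)` method only mimics what a real runner would do against topics.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

// Hypothetical stand-ins, NOT the Kafka Streams API: they only illustrate
// the separation between building a logical plan and executing it.
public class PlanVsRunnerSketch {

    /** The logical plan: a read-only description of what to compute. */
    static final class Plan {
        final String sourceTopic;
        final List<Function<String, String>> steps;

        Plan(String sourceTopic, List<Function<String, String>> steps) {
            this.sourceTopic = sourceTopic;
            this.steps = Collections.unmodifiableList(new ArrayList<>(steps));
        }
    }

    /** Plays the role of KStreamBuilder: records a source and steps, processes nothing. */
    static final class PlanBuilder {
        private String sourceTopic;
        private final List<Function<String, String>> steps = new ArrayList<>();

        PlanBuilder stream(String topic) {
            this.sourceTopic = topic;
            return this;
        }

        PlanBuilder mapValues(Function<String, String> step) {
            steps.add(step);
            return this;
        }

        Plan build() {
            return new Plan(sourceTopic, steps);
        }
    }

    /** Plays the role of KafkaStreams: takes a finished plan and executes it. */
    static final class Runner {
        private final Plan plan;

        Runner(Plan plan) {
            this.plan = plan;
        }

        // Applies every step of the plan, in order, to each input record.
        List<String> run(List<String> records) {
            List<String> out = new ArrayList<>();
            for (String record : records) {
                String value = record;
                for (Function<String, String> step : plan.steps) {
                    value = step.apply(value);
                }
                out.add(value);
            }
            return out;
        }
    }

    public static void main(String[] args) {
        // Building the plan does not process any data.
        Plan plan = new PlanBuilder()
                .stream("input-topic")
                .mapValues(String::toUpperCase)
                .mapValues(v -> v + "!")
                .build();

        // Only the runner actually executes the plan.
        List<String> result = new Runner(plan).run(Arrays.asList("a", "b"));
        System.out.println(result); // prints [A!, B!]
    }
}
```

The point of the sketch is only that the two roles are distinct objects with distinct lifecycles, which is the distinction the naming discussion is about.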
First, I agree with the sentiment that the current name of `KStreamBuilder` isn't great (which is why we're having this discussion). Also, that finding a good name is tricky. ;-)

Second, even though I agree with many of Jay's points, I'm not sure whether I like the `StreamsBuilder` suggestion (i.e. any name that does not include "topology" or a similar term) that much more. It still doesn't describe what that class actually does, or how it differs from `KafkaStreams`. IMHO, the point of `KStreamBuilder` is that it lets you build a logical plan (what we call "topology"), and `KafkaStreams` is the thing that executes that plan. I'm not yet convinced that abstracting these two points away from the user is a good idea if the argument is that it's potentially confusing to beginners (a claim which I am not sure is actually true).

That said, if we rather favor "good-sounding but perhaps less technically correct names", I'd argue we should not even use something like "Builder". We could, for example, also pick the following names:

- `KafkaStreams` as the new name for the builder that creates the logical plan, with e.g. `KafkaStreams.stream("input-topic")` and `KafkaStreams.table("input-topic")`.
- `KafkaStreamsRunner` as the new name for the executor of the plan, with `KafkaStreamsRunner(KafkaStreams).run()`.

On Tue, Mar 14, 2017 at 5:56 AM, Sriram Subramanian <r...@confluent.io> wrote:

> StreamsBuilder would be my vote.
>
> > On Mar 13, 2017, at 9:42 PM, Jay Kreps <j...@confluent.io> wrote:
> >
> > Hey Matthias,
> >
> > Makes sense, I'm more advocating for removing the word topology than any
> > particular new replacement.
> >
> > -Jay
> >
> > On Mon, Mar 13, 2017 at 12:30 PM, Matthias J. Sax <matth...@confluent.io>
> > wrote:
> >
> >> Jay,
> >>
> >> thanks for your feedback
> >>
> >>> What if instead we called it KStreamsBuilder?
> >>
> >> That's the current name and I personally think it's not the best one.
> >> The main reason why I don't like KStreamsBuilder is that we have the
> >> concepts of KStreams and KTables, and the builder creates both. However,
> >> the name puts the focus on KStream and devalues KTable.
> >>
> >> I understand your argument, and I am personally open to removing the
> >> "Topology" part and naming it "StreamsBuilder". Not sure what others
> >> think about this.
> >>
> >>
> >> About Processor API: I like the idea in general, but I think it's out
> >> of scope for this KIP. KIP-120 focuses on removing leaking
> >> internal APIs and cleaning up how our API reflects some concepts.
> >>
> >> However, I added your idea to the API discussion wiki page and we can
> >> take it from there:
> >> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Streams+Discussions
> >>
> >>
> >>
> >> -Matthias
> >>
> >>
> >>> On 3/13/17 11:52 AM, Jay Kreps wrote:
> >>> Two things:
> >>>
> >>> 1. This is a minor thing but the proposed new name for KStreamBuilder
> >>> is StreamsTopologyBuilder. I actually think we should not put topology in
> >>> the name as topology is not a concept you need to understand at the
> >>> kstreams layer right now. I'd think of three categories of concepts: (1)
> >>> concepts you need to understand to get going even for a simple example, (2)
> >>> concepts you need to understand to operate and debug a real production app,
> >>> (3) concepts we truly abstract and you don't need to ever understand. I
> >>> think in the kstream layer topologies are currently category (2), and this
> >>> is where they belong. By introducing the name in even the simplest example
> >>> it means the user has to go read about topologies to really understand even
> >>> this simple snippet. What if instead we called it KStreamsBuilder?
> >>> 2. For the processor api, I think this api is mostly not for end users.
> >>> However there are a couple of cases where it might make sense to expose it.
> >>> I think users coming from Samza, or JMS's MessageListener (
> >>> https://docs.oracle.com/javaee/7/api/javax/jms/MessageListener.html)
> >>> understand a simple callback interface for message processing. In fact,
> >>> people often ask why Kafka's consumer doesn't provide such an interface.
> >>> I'd argue we do, it's KafkaStreams. The only issue is that the processor
> >>> API documentation is a bit scary for a person implementing this type of
> >>> api. My observation is that people using this style of API don't do a lot
> >>> of cross-message operations, they just do single message operations and use
> >>> a database for anything that spans messages. They also don't factor their
> >>> code into many MessageListeners and compose them, they just have one
> >>> listener that has the complete handling logic. Say I am a user who wants to
> >>> implement a single Processor in this style. Do we have an easy way to do
> >>> that today (either with the .transform/.process methods in kstreams or with
> >>> the topology apis)? Is there anything we can do in the way of trivial
> >>> helper code to make this better? Also, how can we explain that pattern to
> >>> people? I think currently we have pretty in-depth docs on our apis but I
> >>> suspect a person trying to figure out how to implement a simple callback
> >>> might get a bit lost trying to figure out how to wire it up. A simple five
> >>> line example in the docs would probably help a lot. Not sure if this is
> >>> best addressed in this KIP or is a side comment.
> >>>
> >>> Cheers,
> >>>
> >>> -Jay
> >>>
> >>> On Fri, Feb 3, 2017 at 3:33 PM, Matthias J. Sax <matth...@confluent.io>
> >>> wrote:
> >>>
> >>>> Hi All,
> >>>>
> >>>> I prepared a KIP to clean up some of Kafka's Streaming API.
> >>>>
> >>>> Please have a look here:
> >>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-120%3A+Cleanup+Kafka+Streams+builder+API
> >>>>
> >>>> Looking forward to your feedback!
> >>>>
> >>>>
> >>>> -Matthias