Re: [DISCUSS] KIP-120: Cleanup Kafka Streams builder API

Matthias J. Sax Mon, 27 Mar 2017 16:27:48 -0700

Hi,

I would like to trigger this discussion again. It seems that the naming
question is rather subjective and both main alternatives (w/ or w/o the
word "Topology" in the name) have pros/cons.


If you have any further thoughts, please share them. At the moment I still
propose `StreamsBuilder` in the KIP.
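
For anyone joining the thread late, here is a minimal, self-contained sketch of the usage pattern the two names are competing over: a builder composes the logical plan, and a separate client runs it. The `Sketch*` classes are hypothetical stand-ins written only for this mail, not the Kafka Streams classes, and they assume that `build()` returns the plan object as currently proposed in the KIP.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-ins for illustration only; NOT the Kafka Streams API.

/** Stand-in for the logical plan ("topology"): just a list of sources. */
final class SketchTopology {
    private final List<String> sources;

    SketchTopology(List<String> sources) {
        // Defensive copy so later builder calls cannot change a built plan.
        this.sources = Collections.unmodifiableList(new ArrayList<>(sources));
    }

    List<String> sources() {
        return sources;
    }
}

/** Stand-in for the DSL entry class, named without the word "Topology". */
final class SketchStreamsBuilder {
    private final List<String> sources = new ArrayList<>();

    // Simplification: the real DSL would hand back a stream/table handle here.
    void stream(String topic) {
        sources.add("stream:" + topic);
    }

    void table(String topic) {
        sources.add("table:" + topic);
    }

    // The point under discussion: build() still returns the plan object.
    SketchTopology build() {
        return new SketchTopology(sources);
    }
}

/** Stand-in for the processing client that executes a plan. */
final class SketchRunner {
    private final SketchTopology topology;
    private boolean running;

    SketchRunner(SketchTopology topology) {
        this.topology = topology;
    }

    void start() {
        running = true;
    }

    // Named close() to mirror the consumer/producer client convention.
    void close() {
        running = false;
    }

    boolean isRunning() {
        return running;
    }

    SketchTopology topology() {
        return topology;
    }
}

final class SketchDemo {
    public static void main(String[] args) {
        SketchStreamsBuilder builder = new SketchStreamsBuilder();
        builder.stream("input-topic");
        builder.table("other-topic");

        SketchRunner runner = new SketchRunner(builder.build());
        runner.start();
        System.out.println(runner.topology().sources());
        runner.close();
    }
}
```

Whichever name we pick, the caller still sees the plan object returned by `build()` and hands it to the client; the name only changes how prominently that concept is advertised.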

I also want to point out that the VOTE thread has already been started. So
if you like the current KIP, please cast your vote there.


Thanks a lot!


-Matthias


On 3/23/17 3:38 PM, Matthias J. Sax wrote:
> Jay,
> 
> about the naming schema:
> 
>>>    1. "kstreams" - the DSL
>>>    2. "processor api" - the lower level callback/topology api
>>>    3. KStream/KTable - entities in the kstreams dsl
>>>    4. "Kafka Streams" - General name for stream processing stuff in Kafka,
>>>    including both kstreams and the processor API plus the underlying
>>>    implementation.
> 
> I think this terminology has some issues... To me, `kstreams` was
> never more than an abbreviation for `Kafka Streams` -- thus (1) and
> (4) kinda collide here. Following questions on the mailing list etc., I
> often see people using kstreams or kstream exactly as an abbreviation
> for "Kafka Streams".
> 
>> I think referring to the dsl as "kstreams" is cute and mnemonic and not
>> particularly confusing.
> 
> I disagree here. It's a very subtle difference between `kstreams` and
> `KStream` -- just singular/plural, thus (1) and (3) also "collide" --
> it's just too close to each other.
> 
> Thus, I really think it's a good idea to get a new name for the DSL to
> get a better separation of the 4 concepts.
> 
> Furthermore, we use the term "Streams API". Thus, I think
> `StreamsBuilder` (or `StreamsTopologyBuilder`) are both very good names.
> 
> 
> Thus, I prefer to keep the KIP as is (suggesting `StreamsBuilder`).
> 
> I will start a VOTE thread. Of course, we can still discuss the naming
> issue. :)
> 
> 
> 
> -Matthias
> 
> 
> On 3/22/17 8:53 PM, Jay Kreps wrote:
>> I don't feel strongly on this, so I'm happy with whatever everyone else
>> wants.
>>
>> Michael, I'm not arguing that people don't need to understand topologies, I
>> just think it is like rocks db, you need to understand it when
>> debugging/operating but not in the initial coding since the metaphor we're
>> providing at this layer isn't a topology of processors but rather something
>> like the collections api. Anyhow it won't hurt people to have it there.
>>
>> For the original KStreamBuilder thing, I think that came from the naming we
>> discussed originally:
>>
>>    1. "kstreams" - the DSL
>>    2. "processor api" - the lower level callback/topology api
>>    3. KStream/KTable - entities in the kstreams dsl
>>    4. "Kafka Streams" - General name for stream processing stuff in Kafka,
>>    including both kstreams and the processor API plus the underlying
>>    implementation.
>>
>> I think referring to the dsl as "kstreams" is cute and mnemonic and not
>> particularly confusing. Just like referring to the "java collections
>> library" isn't confusing even though it contains the Iterator interface
>> which is not actually itself a collection.
>>
>> So I think KStreamBuilder should technically have been KstreamsBuilder and
>> is intended not to be a builder of a KStream but rather the builder for the
>> kstreams DSL. Okay, yes, that *is* slightly confusing. :-)
>>
>> -Jay
>>
>> On Wed, Mar 22, 2017 at 11:25 AM, Guozhang Wang <wangg...@gmail.com> wrote:
>>
>>> Regarding the naming of `StreamsTopologyBuilder` vs. `StreamsBuilder` that
>>> are going to be used in the DSL, I agree both have their arguments:
>>>
>>> 1. On one side, people using the DSL layer probably do not need to be aware
>>> of (or rather, "learn about") the "topology" concept, although this concept
>>> is a publicly exposed one in Kafka Streams.
>>>
>>> 2. On the other side, StreamsBuilder#build() returning a Topology object
>>> sounds a little weird, at least to me (admittedly subjective matter).
>>>
>>>
>>> Since the second bullet point seems to be more "subjective" and many people
>>> are not worried about it, I'm OK to go with the other option.
>>>
>>>
>>> Guozhang
>>>
>>>
>>> On Wed, Mar 22, 2017 at 8:58 AM, Michael Noll <mich...@confluent.io>
>>> wrote:
>>>
>>>> Forwarding to kafka-user.
>>>>
>>>>
>>>> ---------- Forwarded message ----------
>>>> From: Michael Noll <mich...@confluent.io>
>>>> Date: Wed, Mar 22, 2017 at 8:48 AM
>>>> Subject: Re: [DISCUSS] KIP-120: Cleanup Kafka Streams builder API
>>>> To: d...@kafka.apache.org
>>>>
>>>>
>>>> Matthias,
>>>>
>>>>> @Michael:
>>>>>
>>>>> You seemed to agree with Jay about not exposing the `Topology` concept
>>>>> in our main entry class (ie, current KStreamBuilder), thus, I
>>>>> interpreted that you do not want `Topology` in the name either (I am a
>>>>> little surprised by your last response, that goes the opposite
>>>>> direction).
>>>>
>>>> Oh, sorry for not being clear.
>>>>
>>>> What I wanted to say in my earlier email was the following:  Yes, I do
>>>> agree with most of Jay's reasoning, notably about carefully deciding how
>>>> much and which parts of the API/concept "surface" we expose to users of
>>>> the DSL.  However, and this is perhaps where I wasn't very clear, I
>>>> disagree on the particular opinion about not exposing the topology
>>>> concept to DSL users.  Instead, I think the concept of a topology is
>>>> important to understand even for DSL users -- particularly because of the
>>>> way the DSL is currently wiring your processing logic via the builder
>>>> pattern.  (As I noted, e.g. Akka uses a different approach where you
>>>> might be able to get away with not exposing the "topology" concept, but
>>>> even in Akka there's the notion of graphs and flows.)
>>>>
>>>>
>>>>>>     StreamsBuilder builder = new StreamsBuilder();
>>>>>>
>>>>>>     // And here you'd define your...well, what actually?
>>>>>>     // Ah right, you are composing a topology here, though you are
>>>>>>     // not aware of it.
>>>>>
>>>>> Yes. You are not aware of it -- that's the whole point about it --
>>>>> don't put the Topology concept in the focus...
>>>>
>>>> Let me turn this around, because that was my point: it's confusing to
>>>> have a name "StreamsBuilder" if that thing isn't building streams, and it
>>>> is not.
>>>>
>>>> As I mentioned before, I do think it is a benefit to make it clear to DSL
>>>> users that there are two aspects at play: (1) defining the logic/plan of
>>>> your processing, and (2) the execution of that plan.  I have a less
>>>> strong opinion whether or not having "topology" in the names would help
>>>> to communicate this separation as well as combination of (1) and (2) to
>>>> make your app work as expected.
>>>>
>>>> If we stick with `KafkaStreams` for (2) *and* don't like having
>>>> "topology" in the name, then perhaps we should rename `KStreamBuilder` to
>>>> `KafkaStreamsBuilder`.  That at least gives some illusion of a combo of
>>>> (1) and (2).  IMHO, `KafkaStreamsBuilder` highlights better that "it is a
>>>> builder/helper for the Kafka Streams API", rather than "a builder for
>>>> streams".
>>>>
>>>> Also, I think some of the naming challenges we're discussing here are
>>>> caused by having this builder pattern in the first place.  If the Streams
>>>> API was implemented in Scala, for example, we could use implicits for
>>>> helping us to "stitch streams/tables together to build the full
>>>> topology", thus using a different (better?) approach to composing your
>>>> topologies than through a builder pattern.  So: perhaps there's a better
>>>> way than the builder, and that way would also be clearer on terminology?
>>>> That said, this might take this KIP off-scope.
>>>>
>>>> -Michael
>>>>
>>>>
>>>>
>>>>
>>>> On Wed, Mar 22, 2017 at 12:33 AM, Matthias J. Sax <matth...@confluent.io>
>>>> wrote:
>>>>
>>>>> @Guozhang:
>>>>>
>>>>> I recognized that you want to have `Topology` in the name. But it seems
>>>>> that more people preferred to not have it (Jay, Ram, Michael [?],
>>>>> myself).
>>>>>
>>>>> @Michael:
>>>>>
>>>>> You seemed to agree with Jay about not exposing the `Topology` concept
>>>>> in our main entry class (ie, current KStreamBuilder), thus, I
>>>>> interpreted that you do not want `Topology` in the name either (I am a
>>>>> little surprised by your last response, that goes the opposite
>>>>> direction).
>>>>>
>>>>>>     StreamsBuilder builder = new StreamsBuilder();
>>>>>>
>>>>>>     // And here you'd define your...well, what actually?
>>>>>>     // Ah right, you are composing a topology here, though you are
>>>>>>     // not aware of it.
>>>>>
>>>>> Yes. You are not aware of it -- that's the whole point about it --
>>>>> don't put the Topology concept in the focus...
>>>>>
>>>>> Furthermore,
>>>>>
>>>>>>>> So what are you building here with StreamsBuilder?  Streams (hint:
>>>>>>>> No)?
>>>>>>>> And what about tables -- is there a TableBuilder (hint: No)?
>>>>>
>>>>> I am not sure if this is too much of a concern. In contrast to
>>>>> `KStreamBuilder` (singular), which contains `KStream` and thus puts the
>>>>> KStream concept in focus and degrades `KTable`, `StreamsBuilder`
>>>>> (plural) focuses on the "Streams API". IMHO, it does not put focus on
>>>>> KStream. It's just a builder from the Streams API -- you don't need to
>>>>> worry what you are building -- and you don't need to think about the
>>>>> `Topology` concept (of course, you see that .build() returns a
>>>>> Topology).
>>>>>
>>>>>
>>>>> Personally, I see pros and cons for both `StreamsBuilder` and
>>>>> `StreamsTopologyBuilder` and thus, I am fine either way. Maybe Jay and
>>>>> Ram can follow up and share their thoughts?
>>>>>
>>>>> It would also help a lot if other people cast their vote for a name, too.
>>>>>
>>>>>
>>>>>
>>>>> -Matthias
>>>>>
>>>>>
>>>>>
>>>>> On 3/21/17 2:11 PM, Guozhang Wang wrote:
>>>>>> Just to clarify, I did want to have the term `Topology` as part of the
>>>>>> class name, for the reasons above. I'm not too worried about being
>>>>>> consistent with the previous names, but I feel `XXTopologyBuilder` is
>>>>>> better than `XXStreamsBuilder` since its build() function returns a
>>>>>> Topology object.
>>>>>>
>>>>>>
>>>>>> Guozhang
>>>>>>
>>>>>>
>>>>>> On Mon, Mar 20, 2017 at 12:53 PM, Michael Noll <mich...@confluent.io>
>>>>>> wrote:
>>>>>>
>>>>>>> Hmm, I must admit I don't like this last update all too much.
>>>>>>>
>>>>>>> Basically we would have:
>>>>>>>
>>>>>>>     StreamsBuilder builder = new StreamsBuilder();
>>>>>>>
>>>>>>>     // And here you'd define your...well, what actually?
>>>>>>>     // Ah right, you are composing a topology here, though you are
>>>>>>>     // not aware of it.
>>>>>>>
>>>>>>>     KafkaStreams streams = new KafkaStreams(builder.build(),
>>>>>>>         streamsConfiguration);
>>>>>>>
>>>>>>> So what are you building here with StreamsBuilder?  Streams (hint:
>>>>>>> No)?
>>>>>>> And what about tables -- is there a TableBuilder (hint: No)?
>>>>>>>
>>>>>>> I also interpret Guozhang's last response as that he'd prefer to have
>>>>>>> "Topology" in the class/interface names.  I am aware that we shouldn't
>>>>>>> necessarily use the status quo to make decisions about future changes,
>>>>>>> but the very first concept we explain in the Kafka Streams
>>>>>>> documentation is "Stream Processing Topology":
>>>>>>> https://kafka.apache.org/0102/documentation/streams#streams_concepts
>>>>>>>
>>>>>>> -Michael
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Mar 20, 2017 at 7:55 PM, Matthias J. Sax <matth...@confluent.io>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> \cc users list
>>>>>>>>
>>>>>>>>
>>>>>>>> -------- Forwarded Message --------
>>>>>>>> Subject: Re: [DISCUSS] KIP-120: Cleanup Kafka Streams builder API
>>>>>>>> Date: Mon, 20 Mar 2017 11:51:01 -0700
>>>>>>>> From: Matthias J. Sax <matth...@confluent.io>
>>>>>>>> Organization: Confluent Inc
>>>>>>>> To: d...@kafka.apache.org
>>>>>>>>
>>>>>>>> I want to push this discussion further.
>>>>>>>>
>>>>>>>> Guozhang's argument about "exposing" the Topology class is valid. It's
>>>>>>>> a public class anyway, so it's not an issue. However, I think the
>>>>>>>> question is not so much about exposing but about "advertising" (ie,
>>>>>>>> putting it into the focus) or not at DSL level.
>>>>>>>>
>>>>>>>>
>>>>>>>> If I interpret the last replies correctly, it seems that we could
>>>>>>>> agree on "StreamsBuilder" as name. I did update the KIP accordingly.
>>>>>>>> Please correct me, if I got this wrong.
>>>>>>>>
>>>>>>>>
>>>>>>>> If there are no other objections -- this naming discussion was the
>>>>>>>> last open point so far -- I would like to start the VOTE thread.
>>>>>>>>
>>>>>>>>
>>>>>>>> -Matthias
>>>>>>>>
>>>>>>>>
>>>>>>>> On 3/14/17 2:37 PM, Guozhang Wang wrote:
>>>>>>>>> I'd like to keep the term "Topology" inside the builder class since,
>>>>>>>>> as Matthias mentioned, this builder#build() function returns a
>>>>>>>>> "Topology" object, whose type is a public class anyways. Although you
>>>>>>>>> can argue to let users always call
>>>>>>>>>
>>>>>>>>> "new KafkaStreams(builder.build())"
>>>>>>>>>
>>>>>>>>> I think it is still more beneficial to expose this concept.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Guozhang
>>>>>>>>>
>>>>>>>>> On Tue, Mar 14, 2017 at 10:43 AM, Matthias J. Sax <matth...@confluent.io>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Thanks for your input Michael.
>>>>>>>>>>
>>>>>>>>>>>> - KafkaStreams as the new name for the builder that creates the
>>>>>>>>>>>> logical plan, with e.g. `KafkaStreams.stream("input-topic")` and
>>>>>>>>>>>> `KafkaStreams.table("input-topic")`.
>>>>>>>>>>
>>>>>>>>>> I don't think this is a good idea, for multiple reasons:
>>>>>>>>>>
>>>>>>>>>> (1) We would reuse a name for a completely different purpose. The
>>>>>>>>>> same argument applies for not renaming KStreamBuilder to
>>>>>>>>>> TopologyBuilder. The confusion would just be too large.
>>>>>>>>>>
>>>>>>>>>> So if we would start from scratch, it might be ok to do so, but now
>>>>>>>>>> we cannot make this move, IMHO.
>>>>>>>>>>
>>>>>>>>>> Also a clarification question: do you suggest to have static methods
>>>>>>>>>> #stream and #table -- I am not sure if this would work?
>>>>>>>>>> (or was your code snippet just a simplification?)
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> (2) Kafka Streams is basically a "processing client" next to the
>>>>>>>>>> consumer and producer clients. Thus, the name KafkaStreams aligns
>>>>>>>>>> with the naming scheme of KafkaConsumer and KafkaProducer. I am not
>>>>>>>>>> sure if it would be a good choice to "break" this naming scheme.
>>>>>>>>>>
>>>>>>>>>> Btw: this is also the reason why we have KafkaStreams#close() -- and
>>>>>>>>>> not KafkaStreams#stop() -- because #close() aligns with the consumer
>>>>>>>>>> and producer clients.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> (3) One more argument against using KafkaStreams as DSL entry class
>>>>>>>>>> would be that it would need to create a Topology that can be given
>>>>>>>>>> to the "runner/processing-client". Thus the pattern would be
>>>>>>>>>>
>>>>>>>>>>> Topology topology = streams.build();
>>>>>>>>>>> KafkaStreamsRunner runner = new KafkaStreamsRunner(..., topology)
>>>>>>>>>>
>>>>>>>>>> (or of course as a one liner).
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On the other hand, there was the idea (that we intentionally
>>>>>>>>>> excluded from the KIP) to change the "client instantiation" pattern.
>>>>>>>>>>
>>>>>>>>>> Right now, a new client is actively instantiated (ie, by calling
>>>>>>>>>> "new") and the topology is provided as a constructor argument.
>>>>>>>>>> However, especially for the DSL (not sure if it would make sense for
>>>>>>>>>> PAPI), the DSL builder could create the client for the user.
>>>>>>>>>>
>>>>>>>>>> Something like this:
>>>>>>>>>>
>>>>>>>>>>> KStreamBuilder builder = new KStreamBuilder();
>>>>>>>>>>> builder.whatever() // use the builder
>>>>>>>>>>>
>>>>>>>>>>> StreamsConfig config = ....
>>>>>>>>>>> KafkaStreams streams = builder.getKafkaStreams(config);
>>>>>>>>>>
>>>>>>>>>> If we change the pattern like this, the notion of the "DSL builder"
>>>>>>>>>> would change, as it does not create a topology anymore, but it
>>>>>>>>>> creates the "processing client". This would address Jay's concern
>>>>>>>>>> about "not exposing concepts users don't need to understand" and
>>>>>>>>>> would not require including the word "Topology" in the DSL builder
>>>>>>>>>> class name, because the builder does not build a Topology anymore.
>>>>>>>>>>
>>>>>>>>>> I just put some names that came to my mind first hand -- did not
>>>>>>>>>> think about good names. It's just to discuss the pattern.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> -Matthias
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 3/14/17 3:36 AM, Michael Noll wrote:
>>>>>>>>>>> I see Jay's point, and I agree with much of it -- notably about
>>>>>>>>>>> being careful which concepts we do and do not expose, depending on
>>>>>>>>>>> which user group / user type is affected.  That said, I'm not sure
>>>>>>>>>>> yet whether or not we should get rid of "Topology" (or a similar
>>>>>>>>>>> term) in the DSL.
>>>>>>>>>>>
>>>>>>>>>>> For what it's worth, here's how related technologies define/name
>>>>>>>>>>> their "topologies" and "builders".  Note that, in all cases, it's
>>>>>>>>>>> about constructing a logical processing plan, which then is being
>>>>>>>>>>> executed/run.
>>>>>>>>>>>
>>>>>>>>>>> - `Pipeline` (Google Dataflow/Apache Beam)
>>>>>>>>>>>     - To add a source you first instantiate the Source (e.g.
>>>>>>>>>>>       `TextIO.Read.from("gs://some/inputData.txt")`),
>>>>>>>>>>>       then attach it to your processing plan via
>>>>>>>>>>>       `Pipeline#apply(<source>)`.
>>>>>>>>>>>       This setup is a bit different to our DSL because in our DSL
>>>>>>>>>>>       the builder does both, i.e.
>>>>>>>>>>>       instantiating + auto-attaching to itself.
>>>>>>>>>>>     - To execute the processing plan you call `Pipeline#run()`.
>>>>>>>>>>> - `StreamingContext` (Spark): This setup is similar to our DSL.
>>>>>>>>>>>     - To add a source you call e.g.
>>>>>>>>>>>       `StreamingContext#socketTextStream("localhost", 9999)`.
>>>>>>>>>>>     - To execute the processing plan you call
>>>>>>>>>>>       `StreamingContext#start()`.
>>>>>>>>>>> - `StreamExecutionEnvironment` (Flink): This setup is similar to
>>>>>>>>>>>   our DSL.
>>>>>>>>>>>     - To add a source you call e.g.
>>>>>>>>>>>       `StreamExecutionEnvironment#socketTextStream("localhost", 9999)`.
>>>>>>>>>>>     - To execute the processing plan you call
>>>>>>>>>>>       `StreamExecutionEnvironment#execute()`.
>>>>>>>>>>> - `Graph`/`Flow` (Akka Streams), as a result of composing Sources
>>>>>>>>>>>   (~ `KStreamBuilder.stream()`) and Sinks (~ `KStream#to()`)
>>>>>>>>>>>   into Flows, which are [Runnable]Graphs.
>>>>>>>>>>>     - You instantiate a Source directly, and then compose the
>>>>>>>>>>>       Source with Sinks to create a RunnableGraph:
>>>>>>>>>>>       see signature `Source#to[Mat2](sink: Graph[SinkShape[Out],
>>>>>>>>>>>       Mat2]): RunnableGraph[Mat]`.
>>>>>>>>>>>     - To execute the processing plan you call `Flow#run()`.
>>>>>>>>>>>
>>>>>>>>>>> In our DSL, in comparison, we do:
>>>>>>>>>>>
>>>>>>>>>>> - `KStreamBuilder` (Kafka Streams API)
>>>>>>>>>>>     - To add a source you call e.g.
>>>>>>>>>>>       `KStreamBuilder#stream("input-topic")`.
>>>>>>>>>>>     - To execute the processing plan you create a `KafkaStreams`
>>>>>>>>>>>       instance from `KStreamBuilder`
>>>>>>>>>>>       (where the builder will instantiate the topology = processing
>>>>>>>>>>>       plan to be executed), and then
>>>>>>>>>>>       call `KafkaStreams#start()`.  Think of `KafkaStreams` as our
>>>>>>>>>>>       runner.
>>>>>>>>>>>
>>>>>>>>>>> First, I agree with the sentiment that the current name of
>>>>>>>>>>> `KStreamBuilder` isn't great (which is why we're having this
>>>>>>>>>>> discussion).  Also, that finding a good name is tricky. ;-)
>>>>>>>>>>>
>>>>>>>>>>> Second, even though I agree with many of Jay's points I'm not sure
>>>>>>>>>>> whether I like the `StreamsBuilder` suggestion (i.e. any name that
>>>>>>>>>>> does not include "topology" or a similar term) that much more.  It
>>>>>>>>>>> still doesn't describe what that class actually does, and what the
>>>>>>>>>>> difference to `KafkaStreams` is.  IMHO, the point of
>>>>>>>>>>> `KStreamBuilder` is that it lets you build a logical plan (what we
>>>>>>>>>>> call "topology"), and `KafkaStreams` is the thing that executes
>>>>>>>>>>> that plan.  I'm not yet convinced that abstracting these two points
>>>>>>>>>>> away from the user is a good idea if the argument is that it's
>>>>>>>>>>> potentially confusing to beginners (a claim which I am not sure is
>>>>>>>>>>> actually true).
>>>>>>>>>>>
>>>>>>>>>>> That said, if we rather favor "good-sounding but perhaps less
>>>>>>>>>>> technically correct names", I'd argue we should not even use
>>>>>>>>>>> something like "Builder".
>>>>>>>>>>> We could, for example, also pick the following names:
>>>>>>>>>>>
>>>>>>>>>>> - KafkaStreams as the new name for the builder that creates the
>>>>>>>>>>>   logical plan, with e.g. `KafkaStreams.stream("input-topic")` and
>>>>>>>>>>>   `KafkaStreams.table("input-topic")`.
>>>>>>>>>>> - KafkaStreamsRunner as the new name for the executor of the plan,
>>>>>>>>>>>   with `KafkaStreamsRunner(KafkaStreams).run()`.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Mar 14, 2017 at 5:56 AM, Sriram Subramanian <r...@confluent.io>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> StreamsBuilder would be my vote.
>>>>>>>>>>>>
>>>>>>>>>>>>> On Mar 13, 2017, at 9:42 PM, Jay Kreps <j...@confluent.io> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hey Matthias,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Makes sense, I'm more advocating for removing the word topology
>>>>>>>>>>>>> than any particular new replacement.
>>>>>>>>>>>>>
>>>>>>>>>>>>> -Jay
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, Mar 13, 2017 at 12:30 PM, Matthias J. Sax <matth...@confluent.io>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Jay,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> thanks for your feedback
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> What if instead we called it KStreamsBuilder?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> That's the current name and I personally think it's not the best
>>>>>>>>>>>>>> one. The main reason why I don't like KStreamsBuilder is that we
>>>>>>>>>>>>>> have the concepts of KStreams and KTables, and the builder
>>>>>>>>>>>>>> creates both. However, the name puts the focus on KStream and
>>>>>>>>>>>>>> devalues KTable.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I understand your argument, and I am personally open to removing
>>>>>>>>>>>>>> the "Topology" part, and naming it "StreamsBuilder". Not sure
>>>>>>>>>>>>>> what others think about this.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> About Processor API: I like the idea in general, but I think it's
>>>>>>>>>>>>>> out of scope for this KIP. KIP-120 focuses on removing leaking
>>>>>>>>>>>>>> internal APIs and on cleaning up how our API reflects some
>>>>>>>>>>>>>> concepts.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> However, I added your idea to the API discussion Wiki page and we
>>>>>>>>>>>>>> can take it from there:
>>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Streams+Discussions
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -Matthias
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 3/13/17 11:52 AM, Jay Kreps wrote:
>>>>>>>>>>>>>>> Two things:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>   1. This is a minor thing but the proposed new name for KStreamBuilder
>>>>>>>>>>>>>>>   is StreamsTopologyBuilder. I actually think we should not put topology
>>>>>>>>>>>>>>>   in the name as topology is not a concept you need to understand at the
>>>>>>>>>>>>>>>   kstreams layer right now. I'd think of three categories of concepts:
>>>>>>>>>>>>>>>   (1) concepts you need to understand to get going even for a simple
>>>>>>>>>>>>>>>   example, (2) concepts you need to understand to operate and debug a
>>>>>>>>>>>>>>>   real production app, (3) concepts we truly abstract and you don't need
>>>>>>>>>>>>>>>   to ever understand. I think in the kstream layer topologies are
>>>>>>>>>>>>>>>   currently category (2), and this is where they belong. By introducing
>>>>>>>>>>>>>>>   the name in even the simplest example it means the user has to go read
>>>>>>>>>>>>>>>   about topologies to really understand even this simple snippet. What if
>>>>>>>>>>>>>>>   instead we called it KStreamsBuilder?
>>>>>>>>>>>>>>>   2. For the processor api, I think this api is mostly not for end users.
>>>>>>>>>>>>>>>   However there are a couple cases where it might make sense to expose
>>>>>>>>>>>>>>>   it. I think users coming from Samza, or JMS's MessageListener (
>>>>>>>>>>>>>>>   https://docs.oracle.com/javaee/7/api/javax/jms/MessageListener.html)
>>>>>>>>>>>>>>>   understand a simple callback interface for message processing. In
>>>>>>>>>>>>>>>   fact, people often ask why Kafka's consumer doesn't provide such an
>>>>>>>>>>>>>>>   interface. I'd argue we do, it's KafkaStreams. The only issue is that
>>>>>>>>>>>>>>>   the processor API documentation is a bit scary for a person
>>>>>>>>>>>>>>>   implementing this type of api. My observation is that people using this
>>>>>>>>>>>>>>>   style of API don't do a lot of cross-message operations, they just do
>>>>>>>>>>>>>>>   single message operations and use a database for anything that spans
>>>>>>>>>>>>>>>   messages. They also don't factor their code into many MessageListeners
>>>>>>>>>>>>>>>   and compose them, they just have one listener that has the complete
>>>>>>>>>>>>>>>   handling logic. Say I am a user who wants to implement a single
>>>>>>>>>>>>>>>   Processor in this style. Do we have an easy way to do that today
>>>>>>>>>>>>>>>   (either with the .transform/.process methods in kstreams or with the
>>>>>>>>>>>>>>>   topology apis)? Is there anything we can do in the way of trivial
>>>>>>>>>>>>>>>   helper code to make this better? Also, how can we explain that pattern
>>>>>>>>>>>>>>>   to people? I think currently we have pretty in-depth docs on our apis
>>>>>>>>>>>>>>>   but I suspect a person trying to figure out how to implement a simple
>>>>>>>>>>>>>>>   callback might get a bit lost trying to figure out how to wire it up. A
>>>>>>>>>>>>>>>   simple five line example in the docs would probably help a lot. Not
>>>>>>>>>>>>>>>   sure if this is best addressed in this KIP or is a side comment.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> -Jay
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Fri, Feb 3, 2017 at 3:33 PM, Matthias J. Sax <matth...@confluent.io>
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I did prepare a KIP to clean up some of Kafka's Streaming API.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Please have a look here:
>>>>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-120%3A+Cleanup+Kafka+Streams+builder+API
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Looking forward to your feedback!
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> -Matthias
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> -- Guozhang
>>>
>>
>
