Re: [DISCUSS] KIP-120: Cleanup Kafka Streams builder API

Guozhang Wang Tue, 14 Mar 2017 14:38:38 -0700

I'd like to keep the term "Topology" inside the builder class since, as
Matthias mentioned, this builder#build() function returns a "Topology"
object, whose type is a public class anyways. Although you can argue to let
users always call


"new KafkaStreams(builder.build())"

I think it is still more benefit to expose this concept.



Guozhang

On Tue, Mar 14, 2017 at 10:43 AM, Matthias J. Sax <matth...@confluent.io>
wrote:

> Thanks for your input Michael.
>
> >> - KafkaStreams as the new name for the builder that creates the logical
> >> plan, with e.g. `KafkaStreams.stream("intput-topic")` and
> >> `KafkaStreams.table("input-topic")`.
>
> I don't thinks this is a good idea, for multiple reasons:
>
> (1) We would reuse a name for a completely different purpose. The same
> argument for not renaming KStreamBuilder to TopologyBuilder. The
> confusion would just be too large.
>
> So if we would start from scratch, it might be ok to do so, but now we
> cannot make this move, IMHO.
>
> Also a clarification question: do you suggest to have static methods
> #stream and #table -- I am not sure if this would work?
> (or was you code snippet just simplification?)
>
>
> (2) Kafka Streams is basically a "processing client" next to consumer
> and producer client. Thus, the name KafkaStreams aligns to the naming
> schema of KafkaConsumer and KafkaProducer. I am not sure if it would be
> a good choice to "break" this naming scheme.
>
> Btw: this is also the reason, why we have KafkaStreams#close() -- and
> not KafkaStreams#stop() -- because #close() aligns with consumer and
> producer client.
>
>
> (3) On more argument against using KafkaStreams as DSL entry class would
> be, that it would need to create a Topology that can be given to the
> "runner/processing-client". Thus the pattern would be
>
> > Topology topology = streams.build();
> > KafkaStramsRunner runner = new KafkaStreamsRunner(..., topology)
>
> (or of course as a one liner).
>
>
>
> On the other hand, there was the idea (that we intentionally excluded
> from the KIP), to change the "client instantiation" pattern.
>
> Right now, a new client in actively instantiated (ie, by calling "new")
> and the topology if provided as a constructor argument. However,
> especially for DSL (not sure if it would make sense for PAPI), the DSL
> builder could create the client for the user.
>
> Something like this:
>
> > KStreamBuilder builder = new KStreamBuilder();
> > builder.whatever() // use the builder
> >
> > StreamsConfig config = ....
> > KafkaStreams streams = builder.getKafkaStreams(config);
>
> If we change the patter like this, the notion a the "DSL builder" would
> change, as it does not create a topology anymore, but it creates the
> "processing client". This would address Jay's concern about "not
> exposing concept users don't need the understand" and would not require
> to include the word "Topology" in the DSL builder class name, because
> the builder does not build a Topology anymore.
>
> I just put some names that came to my mind first hand -- did not think
> about good names. It's just to discuss the pattern.
>
>
>
> -Matthias
>
>
>
>
>
> On 3/14/17 3:36 AM, Michael Noll wrote:
> > I see Jay's point, and I agree with much of it -- notably about being
> > careful which concepts we do and do not expose, depending on which user
> > group / user type is affected.  That said, I'm not sure yet whether or
> not
> > we should get rid of "Topology" (or a similar term) in the DSL.
> >
> > For what it's worth, here's how related technologies define/name their
> > "topologies" and "builders".  Note that, in all cases, it's about
> > constructing a logical processing plan, which then is being executed/run.
> >
> > - `Pipeline` (Google Dataflow/Apache Beam)
> >     - To add a source you first instantiate the Source (e.g.
> > `TextIO.Read.from("gs://some/inputData.txt")`),
> >       then attach it to your processing plan via
> `Pipeline#apply(<source>)`.
> >       This setup is a bit different to our DSL because in our DSL the
> > builder does both, i.e.
> >       instantiating + auto-attaching to itself.
> >     - To execute the processing plan you call `Pipeline#execute()`.
> > - `StreamingContext`` (Spark): This setup is similar to our DSL.
> >     - To add a source you call e.g.
> > `StreamingContext#socketTextStream("localhost", 9999)`.
> >     - To execute the processing plan you call
> `StreamingContext#execute()`.
> > - `StreamExecutionEnvironment` (Flink): This setup is similar to our DSL.
> >     - To add a source you call e.g.
> > `StreamExecutionEnvironment#socketTextStream("localhost", 9999)`.
> >     - To execute the processing plan you call
> > `StreamExecutionEnvironment#execute()`.
> > - `Graph`/`Flow` (Akka Streams), as a result of composing Sources (~
> > `KStreamBuilder.stream()`) and Sinks (~ `KStream#to()`)
> >   into Flows, which are [Runnable]Graphs.
> >     - You instantiate a Source directly, and then compose the Source with
> > Sinks to create a RunnableGraph:
> >       see signature `Source#to[Mat2](sink: Graph[SinkShape[Out], Mat2]):
> > RunnableGraph[Mat]`.
> >     - To execute the processing plan you call `Flow#run()`.
> >
> > In our DSL, in comparison, we do:
> >
> > - `KStreamBuilder` (Kafka Streams API)
> >     - To add a source you call e.g. `KStreamBuilder#stream("input-
> topic")`.
> >     - To execute the processing plan you create a `KafkaStreams` instance
> > from `KStreamBuilder`
> >       (where the builder will instantiate the topology = processing plan
> to
> > be executed), and then
> >       call `KafkaStreams#start()`.  Think of `KafkaStreams` as our
> runner.
> >
> > First, I agree with the sentiment that the current name of
> `KStreamBuilder`
> > isn't great (which is why we're having this discussion).  Also, that
> > finding a good name is tricky. ;-)
> >
> > Second, even though I agree with many of Jay's points I'm not sure
> whether
> > I like the `StreamsBuilder` suggestion (i.e. any name that does not
> include
> > "topology" or a similar term) that much more.  It still doesn't describe
> > what that class actually does, and what the difference to `KafkaStreams`
> > is.  IMHO, the point of `KStreamBuilder` is that it lets you build a
> > logical plan (what we call "topology"), and `KafkaStreams` is the thing
> > that executes that plan.  I'm not yet convinced that abstracting these
> two
> > points away from the user is a good idea if the argument is that it's
> > potentially confusing to beginners (a claim which I am not sure is
> actually
> > true).
> >
> > That said, if we rather favor "good-sounding but perhaps less technically
> > correct names", I'd argue we should not even use something like
> "Builder".
> > We could, for example, also pick the following names:
> >
> > - KafkaStreams as the new name for the builder that creates the logical
> > plan, with e.g. `KafkaStreams.stream("intput-topic")` and
> > `KafkaStreams.table("input-topic")`.
> > - KafkaStreamsRunner as the new name for the executioner of the plan,
> with
> > `KafkaStreamsRunner(KafkaStreams).run()`.
> >
> >
> >
> > On Tue, Mar 14, 2017 at 5:56 AM, Sriram Subramanian <r...@confluent.io>
> > wrote:
> >
> >> StreamsBuilder would be my vote.
> >>
> >>> On Mar 13, 2017, at 9:42 PM, Jay Kreps <j...@confluent.io> wrote:
> >>>
> >>> Hey Matthias,
> >>>
> >>> Make sense, I'm more advocating for removing the word topology than any
> >>> particular new replacement.
> >>>
> >>> -Jay
> >>>
> >>> On Mon, Mar 13, 2017 at 12:30 PM, Matthias J. Sax <
> matth...@confluent.io
> >>>
> >>> wrote:
> >>>
> >>>> Jay,
> >>>>
> >>>> thanks for your feedback
> >>>>
> >>>>> What if instead we called it KStreamsBuilder?
> >>>>
> >>>> That's the current name and I personally think it's not the best one.
> >>>> The main reason why I don't like KStreamsBuilder is, that we have the
> >>>> concepts of KStreams and KTables, and the builder creates both.
> However,
> >>>> the name puts he focus on KStream and devalues KTable.
> >>>>
> >>>> I understand your argument, and I am personally open the remove the
> >>>> "Topology" part, and name it "StreamsBuilder". Not sure what others
> >>>> think about this.
> >>>>
> >>>>
> >>>> About Processor API: I like the idea in general, but I thinks it's out
> >>>> of scope for this KIP. KIP-120 has the focus on removing leaking
> >>>> internal APIs and do some cleanup how our API reflects some concepts.
> >>>>
> >>>> However, I added your idea to API discussion Wiki page and we take if
> >>>> from there:
> >>>> https://cwiki.apache.org/confluence/display/KAFKA/
> >>>> Kafka+Streams+Discussions
> >>>>
> >>>>
> >>>>
> >>>> -Matthias
> >>>>
> >>>>
> >>>>> On 3/13/17 11:52 AM, Jay Kreps wrote:
> >>>>> Two things:
> >>>>>
> >>>>>   1. This is a minor thing but the proposed new name for
> KStreamBuilder
> >>>>>   is StreamsTopologyBuilder. I actually think we should not put
> >>>> topology in
> >>>>>   the name as topology is not a concept you need to understand at the
> >>>>>   kstreams layer right now. I'd think of three categories of
> concepts:
> >>>> (1)
> >>>>>   concepts you need to understand to get going even for a simple
> >>>> example, (2)
> >>>>>   concepts you need to understand to operate and debug a real
> >>>> production app,
> >>>>>   (3) concepts we truly abstract and you don't need to ever
> understand.
> >>>> I
> >>>>>   think in the kstream layer topologies are currently category (2),
> and
> >>>> this
> >>>>>   is where they belong. By introducing the name in even the simplest
> >>>> example
> >>>>>   it means the user has to go read about toplogies to really
> understand
> >>>> even
> >>>>>   this simple snippet. What if instead we called it KStreamsBuilder?
> >>>>>   2. For the processor api, I think this api is mostly not for end
> >>>> users.
> >>>>>   However this are a couple cases where it might make sense to expose
> >>>> it. I
> >>>>>   think users coming from Samza, or JMS's MessageListener (
> >>>>>   https://docs.oracle.com/javaee/7/api/javax/jms/
> MessageListener.html)
> >>>>>   understand a simple callback interface for message processing. In
> >>>> fact,
> >>>>>   people often ask why Kafka's consumer doesn't provide such an
> >>>> interface.
> >>>>>   I'd argue we do, it's KafkaStreams. The only issue is that the
> >>>> processor
> >>>>>   API documentation is a bit scary for a person implementing this
> type
> >>>> of
> >>>>>   api. My observation is that people using this style of API don't
> do a
> >>>> lot
> >>>>>   of cross-message operations, then just do single message operations
> >>>> and use
> >>>>>   a database for anything that spans messages. They also don't factor
> >>>> their
> >>>>>   code into many MessageListeners and compose them, they just have
> one
> >>>>>   listener that has the complete handling logic. Say I am a user who
> >>>> wants to
> >>>>>   implement a single Processor in this style. Do we have an easy way
> to
> >>>> do
> >>>>>   that today (either with the .transform/.process methods in kstreams
> >>>> or with
> >>>>>   the topology apis)? Is there anything we can do in the way of
> trivial
> >>>>>   helper code to make this better? Also, how can we explain that
> >>>> pattern to
> >>>>>   people? I think currently we have pretty in-depth docs on our apis
> >>>> but I
> >>>>>   suspect a person trying to figure out how to implement a simple
> >>>> callback
> >>>>>   might get a bit lost trying to figure out how to wire it up. A
> simple
> >>>> five
> >>>>>   line example in the docs would probably help a lot. Not sure if
> this
> >>>> is
> >>>>>   best addressed in this KIP or is a side comment.
> >>>>>
> >>>>> Cheers,
> >>>>>
> >>>>> -Jay
> >>>>>
> >>>>> On Fri, Feb 3, 2017 at 3:33 PM, Matthias J. Sax <
> matth...@confluent.io
> >>>
> >>>>> wrote:
> >>>>>
> >>>>>> Hi All,
> >>>>>>
> >>>>>> I did prepare a KIP to do some cleanup some of Kafka's Streaming
> API.
> >>>>>>
> >>>>>> Please have a look here:
> >>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> >>>>>> 120%3A+Cleanup+Kafka+Streams+builder+API
> >>>>>>
> >>>>>> Looking forward to your feedback!
> >>>>>>
> >>>>>>
> >>>>>> -Matthias
> >>>>
> >>>>
> >>
> >
>
>


-- 
-- Guozhang

Re: [DISCUSS] KIP-120: Cleanup Kafka Streams builder API

Reply via email to