Re: [DISCUSS] KIP-120: Cleanup Kafka Streams builder API

Michael Noll Tue, 14 Mar 2017 03:44:37 -0700

I see Jay's point, and I agree with much of it -- notably about being
careful which concepts we do and do not expose, depending on which user
group / user type is affected.  That said, I'm not sure yet whether or not
we should get rid of "Topology" (or a similar term) in the DSL.


For what it's worth, here's how related technologies define/name their
"topologies" and "builders".  Note that, in all cases, it's about
constructing a logical processing plan, which is then executed/run.

- `Pipeline` (Google Dataflow/Apache Beam)
    - To add a source you first instantiate the Source (e.g.
      `TextIO.Read.from("gs://some/inputData.txt")`), then attach it to your
      processing plan via `Pipeline#apply(<source>)`.  This setup is a bit
      different from our DSL because in our DSL the builder does both, i.e.
      instantiating + auto-attaching to itself.
    - To execute the processing plan you call `Pipeline#run()`.
- `StreamingContext` (Spark): This setup is similar to our DSL.
    - To add a source you call e.g.
      `StreamingContext#socketTextStream("localhost", 9999)`.
    - To execute the processing plan you call `StreamingContext#start()`.
- `StreamExecutionEnvironment` (Flink): This setup is similar to our DSL.
    - To add a source you call e.g.
`StreamExecutionEnvironment#socketTextStream("localhost", 9999)`.
    - To execute the processing plan you call
`StreamExecutionEnvironment#execute()`.
- `Graph`/`Flow` (Akka Streams), as a result of composing Sources (~
  `KStreamBuilder.stream()`) and Sinks (~ `KStream#to()`) into Flows, which
  are [Runnable]Graphs.
    - You instantiate a Source directly, and then compose the Source with
      Sinks to create a RunnableGraph: see signature
      `Source#to[Mat2](sink: Graph[SinkShape[Out], Mat2]): RunnableGraph[Mat]`.
    - To execute the processing plan you call `RunnableGraph#run()`.

In our DSL, in comparison, we do:

- `KStreamBuilder` (Kafka Streams API)
    - To add a source you call e.g. `KStreamBuilder#stream("input-topic")`.
    - To execute the processing plan you create a `KafkaStreams` instance
from `KStreamBuilder`
      (where the builder will instantiate the topology = processing plan to
be executed), and then
      call `KafkaStreams#start()`.  Think of `KafkaStreams` as our runner.
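
To make the difference in how sources get attached a bit more concrete,
here is a minimal sketch in plain Java.  `ToySource` and `ToyPlan` are
hypothetical names I made up for illustration; they are not classes from
Beam, Kafka Streams, or any other library.  The first method mimics the
"instantiate the source, then attach it" style, the second mimics the
"builder instantiates + auto-attaches" style of our DSL.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy types for illustration only; not real Beam or Kafka Streams classes.

/** A source that exists on its own until someone attaches it to a plan. */
final class ToySource {
    final String location;

    ToySource(String location) {
        this.location = location;
    }
}

/** A processing plan that simply remembers which sources were attached to it. */
final class ToyPlan {
    final List<ToySource> sources = new ArrayList<>();

    /** Style 1: the caller instantiates the source, then attaches it to the plan. */
    ToyPlan apply(ToySource source) {
        sources.add(source);
        return this;
    }

    /** Style 2: the plan instantiates the source and attaches it to itself. */
    ToySource stream(String location) {
        ToySource source = new ToySource(location);
        sources.add(source);
        return source;
    }
}
```

With style 1 you would write `plan.apply(new ToySource("input-topic"))`,
with style 2 just `plan.stream("input-topic")`.  Either way the result is
only a description of the plan; nothing is executed yet.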

First, I agree with the sentiment that the current name of `KStreamBuilder`
isn't great (which is why we're having this discussion).  I also agree that
finding a good name is tricky. ;-)

Second, even though I agree with many of Jay's points, I'm not sure whether
I like the `StreamsBuilder` suggestion (i.e. any name that does not include
"topology" or a similar term) that much more.  It still doesn't describe
what that class actually does, or how it differs from `KafkaStreams`.
IMHO, the point of `KStreamBuilder` is that it lets you build a
logical plan (what we call "topology"), and `KafkaStreams` is the thing
that executes that plan.  I'm not yet convinced that abstracting these two
points away from the user is a good idea if the argument is that it's
potentially confusing to beginners (a claim which I am not sure is actually
true).
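
To illustrate the split I have in mind, here is a minimal, self-contained
sketch in plain Java.  All names (`LogicalPlan`, `PlanBuilder`,
`PlanRunner`) are hypothetical and the "topics" are just in-memory lists;
this is not the Kafka Streams API, only a toy model of "the builder
describes the plan, the runner executes it".

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical toy types for illustration only; not the Kafka Streams API.

/** Immutable description of what should happen. Creating it runs nothing. */
final class LogicalPlan {
    final String sourceTopic;
    final List<UnaryOperator<String>> steps;
    final String sinkTopic;

    LogicalPlan(String sourceTopic, List<UnaryOperator<String>> steps, String sinkTopic) {
        this.sourceTopic = sourceTopic;
        this.steps = Collections.unmodifiableList(new ArrayList<>(steps));
        this.sinkTopic = sinkTopic;
    }
}

/** Plays the role of the builder: it only records the plan. */
final class PlanBuilder {
    private String source;
    private final List<UnaryOperator<String>> steps = new ArrayList<>();
    private String sink;

    PlanBuilder stream(String topic) { this.source = topic; return this; }
    PlanBuilder mapValues(UnaryOperator<String> fn) { steps.add(fn); return this; }
    PlanBuilder to(String topic) { this.sink = topic; return this; }
    LogicalPlan build() { return new LogicalPlan(source, steps, sink); }
}

/** Plays the role of the runner: it executes a finished plan. */
final class PlanRunner {
    private final LogicalPlan plan;

    PlanRunner(LogicalPlan plan) { this.plan = plan; }

    /** Reads every record of the source "topic", applies all steps, appends to the sink. */
    void run(Map<String, List<String>> topics) {
        List<String> input = new ArrayList<>(
                topics.getOrDefault(plan.sourceTopic, Collections.<String>emptyList()));
        List<String> output = topics.computeIfAbsent(plan.sinkTopic, k -> new ArrayList<>());
        for (String record : input) {
            String value = record;
            for (UnaryOperator<String> step : plan.steps) {
                value = step.apply(value);
            }
            output.add(value);
        }
    }
}
```

Calling `build()` only yields a description; records move only once
`PlanRunner#run` is invoked.  That is the distinction I would like the
names to convey, whatever we end up calling the two classes.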

That said, if we'd rather favor "good-sounding but perhaps less technically
correct names", I'd argue we should not even use something like "Builder".
We could, for example, also pick the following names:

- KafkaStreams as the new name for the builder that creates the logical
plan, with e.g. `KafkaStreams.stream("input-topic")` and
`KafkaStreams.table("input-topic")`.
- KafkaStreamsRunner as the new name for the executor of the plan, with
`KafkaStreamsRunner(KafkaStreams).run()`.



On Tue, Mar 14, 2017 at 5:56 AM, Sriram Subramanian <r...@confluent.io>
wrote:

> StreamsBuilder would be my vote.
>
> > On Mar 13, 2017, at 9:42 PM, Jay Kreps <j...@confluent.io> wrote:
> >
> > Hey Matthias,
> >
> > Makes sense, I'm more advocating for removing the word topology than any
> > particular new replacement.
> >
> > -Jay
> >
> > On Mon, Mar 13, 2017 at 12:30 PM, Matthias J. Sax <matth...@confluent.io
> >
> > wrote:
> >
> >> Jay,
> >>
> >> thanks for your feedback
> >>
> >>> What if instead we called it KStreamsBuilder?
> >>
> >> That's the current name and I personally think it's not the best one.
> >> The main reason why I don't like KStreamsBuilder is that we have the
> >> concepts of KStreams and KTables, and the builder creates both. However,
> >> the name puts the focus on KStream and devalues KTable.
> >>
> >> I understand your argument, and I am personally open to removing the
> >> "Topology" part and naming it "StreamsBuilder". Not sure what others
> >> think about this.
> >>
> >>
> >> About Processor API: I like the idea in general, but I think it's out
> >> of scope for this KIP. KIP-120 focuses on removing leaking internal
> >> APIs and on cleaning up how our API reflects some concepts.
> >>
> >> However, I added your idea to the API discussion Wiki page and we can
> >> take it from there:
> >> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Streams+Discussions
> >>
> >>
> >>
> >> -Matthias
> >>
> >>
> >>> On 3/13/17 11:52 AM, Jay Kreps wrote:
> >>> Two things:
> >>>
> >>>   1. This is a minor thing but the proposed new name for KStreamBuilder
> >>>   is StreamsTopologyBuilder. I actually think we should not put topology
> >>>   in the name as topology is not a concept you need to understand at the
> >>>   kstreams layer right now. I'd think of three categories of concepts:
> >>>   (1) concepts you need to understand to get going even for a simple
> >>>   example, (2) concepts you need to understand to operate and debug a
> >>>   real production app, (3) concepts we truly abstract and you don't need
> >>>   to ever understand. I think in the kstream layer topologies are
> >>>   currently category (2), and this is where they belong. By introducing
> >>>   the name in even the simplest example it means the user has to go read
> >>>   about topologies to really understand even this simple snippet. What
> >>>   if instead we called it KStreamsBuilder?
> >>>   2. For the processor api, I think this api is mostly not for end
> >>>   users. However there are a couple of cases where it might make sense
> >>>   to expose it. I think users coming from Samza, or JMS's MessageListener
> >>>   (https://docs.oracle.com/javaee/7/api/javax/jms/MessageListener.html)
> >>>   understand a simple callback interface for message processing. In
> >>>   fact, people often ask why Kafka's consumer doesn't provide such an
> >>>   interface. I'd argue we do, it's KafkaStreams. The only issue is that
> >>>   the processor API documentation is a bit scary for a person
> >>>   implementing this type of api. My observation is that people using
> >>>   this style of API don't do a lot of cross-message operations, they
> >>>   just do single message operations and use a database for anything that
> >>>   spans messages. They also don't factor their code into many
> >>>   MessageListeners and compose them, they just have one listener that
> >>>   has the complete handling logic. Say I am a user who wants to
> >>>   implement a single Processor in this style. Do we have an easy way to
> >>>   do that today (either with the .transform/.process methods in kstreams
> >>>   or with the topology apis)? Is there anything we can do in the way of
> >>>   trivial helper code to make this better? Also, how can we explain that
> >>>   pattern to people? I think currently we have pretty in-depth docs on
> >>>   our apis but I suspect a person trying to figure out how to implement
> >>>   a simple callback might get a bit lost trying to figure out how to
> >>>   wire it up. A simple five line example in the docs would probably help
> >>>   a lot. Not sure if this is best addressed in this KIP or is a side
> >>>   comment.
> >>>
> >>> Cheers,
> >>>
> >>> -Jay
> >>>
> >>> On Fri, Feb 3, 2017 at 3:33 PM, Matthias J. Sax <matth...@confluent.io
> >
> >>> wrote:
> >>>
> >>>> Hi All,
> >>>>
> >>>> I prepared a KIP to clean up some of Kafka's Streaming API.
> >>>>
> >>>> Please have a look here:
> >>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-120%3A+Cleanup+Kafka+Streams+builder+API
> >>>>
> >>>> Looking forward to your feedback!
> >>>>
> >>>>
> >>>> -Matthias
> >>
> >>
>
