Hi Guozhang,

I was comparing it with DAG processing in Spark. Spark Streaming is a close competitor to Kafka Streams. One difference that might account for its faster performance is that Spark takes the submitted code and does a bit of code optimization at its end.
Let's consider example code of the form map -> map -> reduce. The map functions will not be executed until reduce executes, since it is a terminal operation, but whenever execution happens Spark will traverse the data only once and may call the map functions one after another. This is the same concept as streams in Java 8. Please refer to the following link, which explains it very well:

https://stackoverflow.com/questions/25836316/how-dag-works-under-the-covers-in-rdd/30685279#30685279

In Kafka Streams we do specify the topology, but I don't think we do this sort of code optimization. As I understand it, my earlier example would traverse the data twice, once for each map phase.

Please excuse the late response; I am operating out of a different geography.

-Sameer.

On Tue, Jul 18, 2017 at 2:44 AM, Guozhang Wang <wangg...@gmail.com> wrote:

> If that is what is meant by DAG processing (we still need to confirm with
> Sameer), then programming-wise I do not see what the difference is with
> Kafka Streams, since inside Streams users are also just specifying the
> topology as a DAG:
>
> https://kafka.apache.org/0110/documentation/streams/core-concepts#streams_topology
>
> What is even better is that for Streams, since we use Kafka as the
> intermediate buffer between connected sub-topologies, users do not need to
> worry about back-pressure at all:
>
> http://docs.confluent.io/current/streams/architecture.html#backpressure
>
> And flexibility-wise, as you mention "it is a bit more flexible than Kafka
> Streams", I also cannot agree with you that it is the case, since with the
> Kafka Streams threading model people can easily have multiple tasks
> representing different connected parts (i.e. sub-topologies) of the DAG,
> which are then hosted by different threads executing at their own pace
> concurrently:
>
> https://kafka.apache.org/0110/documentation/streams/architecture#streams_architecture_threads
>
> Again, this is because different threads never need to talk to each other;
> they just read / write data from / to Kafka topics, which are then the
> persistent buffer of the intermediate streams, so no synchronization
> between threads is needed.
>
> Guozhang
>
> On Mon, Jul 17, 2017 at 10:38 AM, David Garcia <dav...@spiceworks.com>
> wrote:
>
> > On that note, akka streams has Kafka integration. We use it heavily and
> > it is quite a bit more flexible than K-Streams (which we also use, but for
> > simpler applications). Akka-streams-Kafka is particularly good for
> > asynchronous processing:
> > http://doc.akka.io/docs/akka-stream-kafka/current/home.html
> >
> > -David
> >
> > On 7/17/17, 12:35 PM, "David Garcia" <dav...@spiceworks.com> wrote:
> >
> >     I think he means something like Akka Streams:
> >     http://doc.akka.io/docs/akka/2.5.2/java/stream/stream-graphs.html
> >
> >     Directed Acyclic Graphs are trivial to construct in Akka Streams and
> >     use back-pressure to preclude memory issues.
> >
> >     -David
> >
> >     On 7/17/17, 12:20 PM, "Guozhang Wang" <wangg...@gmail.com> wrote:
> >
> >         Sameer,
> >
> >         Could you elaborate a bit more on what you mean by "DAG
> >         processing"?
> >
> >         Guozhang
> >
> >         On Sun, Jul 16, 2017 at 11:58 PM, Sameer Kumar <sam.kum.w...@gmail.com>
> >         wrote:
> >
> >         > Currently, we don't have DAG processing in Kafka Streams. Having
> >         > a DAG has its own share of advantages, in that it can optimize
> >         > code on its own and come up with an optimized execution plan.
> >         >
> >         > Are we exploring in this direction? Do we have this in our
> >         > current roadmap?
> >         >
> >         > -Sameer.
> >
> >         --
> >         -- Guozhang
>
> --
> -- Guozhang
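
For reference, the map -> map -> reduce behaviour described at the top of this message can be observed with plain Java 8 streams. The following is only a minimal sketch: the class name `LazyPipelineDemo`, the `trace` list, and the two arithmetic steps are made up for illustration, and it demonstrates `java.util.stream` only. It says nothing directly about how Spark or Kafka Streams execute a topology.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical demo class, not taken from Spark or Kafka Streams.
class LazyPipelineDemo {
    // Records the order in which the pipeline stages touch each element.
    static final List<String> trace = new ArrayList<>();

    static int run(List<Integer> input) {
        trace.clear();
        // Building the pipeline does not process any element yet:
        // map() is an intermediate (lazy) operation.
        Stream<Integer> pipeline = input.stream()
                .map(x -> { trace.add("map1(" + x + ")"); return x + 1; })
                .map(x -> { trace.add("map2(" + x + ")"); return x * 2; });

        trace.add("-- reduce starts --");
        // reduce() is the terminal operation; only now does data flow,
        // and each element passes through both maps before the next one.
        return pipeline.reduce(0, Integer::sum);
    }

    public static void main(String[] args) {
        int sum = run(Arrays.asList(1, 2, 3));
        System.out.println("sum = " + sum);
        trace.forEach(System.out::println);
    }
}
```

In the printed trace, the marker line comes first and the `map1`/`map2` entries then alternate per element, which is the single-pass behaviour the Stack Overflow answer linked above describes for a pipelined stage.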