Hi Guozhang,

I was comparing it with DAG processing in Spark.  Spark Streaming is a
close competitor to Kafka Streams, one difference which might accounts for
a faster performance was that Spark submits the code to the code and does a
bit of code optimization that its end.

Lets consider an example code which has map-> map-> reduce. The map
functions will not be executed unless reduce executes since its a terminal
operation but whenever execution happens spark will traverse data only once
and may call map functions one after the another. This is same as in Java 8
concept of streams.

Please refer to the following link, that explains it very well
https://stackoverflow.com/questions/25836316/how-dag-works-under-the-covers-in-rdd/30685279#30685279

In Kafka Streams, we do specify the topology here but i dont think we do
some sort of code optimization. My earlier example will traverse the data
twice once for each map phase.

Please excuse with the late response, I am operating out of different
geography.

-Sameer.



On Tue, Jul 18, 2017 at 2:44 AM, Guozhang Wang <wangg...@gmail.com> wrote:

> If that is what it meant for DAG processing (we still need to confirm with
> Sameer), then programming-wise I do not see what's the difference with
> Kafka Streams since inside Streams users is also just specifying the
> topology as a DAG:
>
> https://kafka.apache.org/0110/documentation/streams/core-
> concepts#streams_topology
>
> What is even better is that for Streams since we use Kafka as intermeidate
> buffer between connected sub-topologies user do not need to worry about
> back-pressure at all:
>
> http://docs.confluent.io/current/streams/architecture.html#backpressure
>
>
> And flexibility-wise, as you mention "it is a bit more flexible than Kafka
> Streams", I also cannot agree with you that it is the case, since with
> Kafka Streams threading model people can easily have multiple tasks
> representing different connected parts (i.e. sub-topologies) of the DAG
> which are then hosted by different threads executing on their own pace
> concurrently:
>
> https://kafka.apache.org/0110/documentation/streams/architecture#streams_
> architecture_threads
>
> Again, this is because different threads never need to talk to each other,
> but they just read / write data from / to Kafka topics which are then the
> persistent buffer of the intermeidate streams, no synchronization between
> threads are needed.
>
>
> Guozhang
>
> On Mon, Jul 17, 2017 at 10:38 AM, David Garcia <dav...@spiceworks.com>
> wrote:
>
> > On that note, akka streams has Kafka integration.  We use it heavily and
> > it is quite a bit more flexible than K-Streams (which we also useā€¦but for
> > simpler applications)  Akka-streams-Kafka is particularly good for
> > asynchronous processing: http://doc.akka.io/docs/akka-
> > stream-kafka/current/home.html
> >
> > -David
> >
> > On 7/17/17, 12:35 PM, "David Garcia" <dav...@spiceworks.com> wrote:
> >
> >     I think he means something like Akka Streams:
> > http://doc.akka.io/docs/akka/2.5.2/java/stream/stream-graphs.html
> >
> >     Directed Acyclic Graphs are trivial to construct in Akka Streams and
> > use back-pressure to preclude memory issues.
> >
> >     -David
> >
> >     On 7/17/17, 12:20 PM, "Guozhang Wang" <wangg...@gmail.com> wrote:
> >
> >         Sameer,
> >
> >         Could you elaborate a bit more what do you mean by "DAG
> > processing"?
> >
> >
> >         Guozhang
> >
> >
> >         On Sun, Jul 16, 2017 at 11:58 PM, Sameer Kumar <
> > sam.kum.w...@gmail.com>
> >         wrote:
> >
> >         > Currently, we don't have DAG processing in Kafka Streams.
> Having
> > a DAG has
> >         > its own share of advantages in that, it can optimize code on
> its
> > own and
> >         > come up with a optimized execution plan.
> >         >
> >         > Are we exploring in this direction, do we have this in our
> > current roadmap.
> >         >
> >         >  -Sameer.
> >         >
> >
> >
> >
> >         --
> >         -- Guozhang
> >
> >
> >
> >
> >
>
>
> --
> -- Guozhang
>

Reply via email to