Thanks, David and Matthias, for the replies. To make sure that I understand
correctly:
- Each streams application instance is limited to a single node, and the
whole stream execution DAG is processed on that node.
- If we want to parallelize our application, we start a new instance of the
streams application, which is identical to the previous one (it runs all
DAG operators inside its own node). Once the new instance is added, it
gets a separate partition to process (one that no other instance is
processing).
- There is no way to break up the DAG and the operators (of a streams
application) across separate nodes (of the cluster) to make it look more
"data flow"-ish. Instead, we have "little" end-to-end instances running on
each cluster node.
- Assume we run an aggregate operator with time windows over a finance data
stream. Then we can have only one partition and only one streams
application instance, since increasing the number of partitions and/or
instances could break the logic of that particular use case.
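
As a rough illustration of the second point, here is a minimal Java sketch
of how input partitions might be spread over running instances. It is a
simplified round-robin model, not the assignment logic Kafka Streams
actually uses; the class name PartitionAssignmentSketch and the method
assign are hypothetical, and no Kafka library is involved.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class PartitionAssignmentSketch {

    // Hypothetical helper: spreads partition ids 0..numPartitions-1 over
    // instance ids 0..numInstances-1 in round-robin order.
    // Assumes numInstances >= 1. This is a simplified model only.
    static Map<Integer, List<Integer>> assign(int numPartitions, int numInstances) {
        Map<Integer, List<Integer>> assignment = new TreeMap<>();
        for (int i = 0; i < numInstances; i++) {
            assignment.put(i, new ArrayList<>());
        }
        for (int p = 0; p < numPartitions; p++) {
            // Each partition is owned by exactly one instance.
            assignment.get(p % numInstances).add(p);
        }
        return assignment;
    }

    public static void main(String[] args) {
        // One instance owns every partition.
        System.out.println(assign(4, 1));
        // A second instance takes over half of the partitions.
        System.out.println(assign(4, 2));
        // With more instances than partitions, the extra instance gets nothing.
        System.out.println(assign(4, 5));
    }
}
```

In this model each instance still runs the full DAG, only over its own
partitions, and adding instances beyond the number of partitions leaves the
extra ones without work.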

Please correct me if I am wrong.

Thanks
Davood

On Tue, Jul 26, 2016 at 9:44 PM, Matthias J. Sax <matth...@confluent.io>
wrote:

> David's answer is correct. Just start the same application multiple
> times on different nodes and the library does the rest for you.
>
> Just one addition: as Kafka Streams is for standard application
> development, there is no need to run the application on the same nodes
> as your brokers are running (i.e., application instances could run on any
> machine outside of your broker cluster).
>
> Of course, it is possible to run the application on broker nodes. Just
> wanted to point out that there is no co-location required between
> brokers and app instances.
>
> -Matthias
>
>
> On 07/26/2016 08:08 PM, David Garcia wrote:
> > http://docs.confluent.io/3.0.0/streams/architecture.html#parallelism-model
> >
> > you shouldn’t have to do anything.  Simply starting a new thread will
> > “rebalance” your streaming job.  The job coordinates with tasks through
> > kafka itself.
> >
> >
> >
> > On 7/26/16, 12:42 PM, "Davood Rafiei" <rafieidavo...@gmail.com> wrote:
> >
> >     Hi,
> >
> >     I am newbie in Kafka and Kafka-Streams. I read documentation to get
> >     information how it works in multi-node environment. As a result I
> >     want to run streams library on cluster that consists of more than
> >     one node.
> >     From what I understood, I try to resolve the following conflicts:
> >     - Streams is a standalone java application. So it runs in a single
> >     node, of n-node cluster of kafka.
> >     - However, streams runs on top of kafka, and if we set a
> >     multi-broker kafka cluster, and then run streams library from master
> >     node, then streams library will run in entire cluster.
> >
> >     So, streams library is standalone java application but to force it
> >     to run in multiple nodes, do we need to do something extra (in
> >     configuration for example) if we have already kafka running in
> >     multi-broker mode?
> >
> >
> >     Thanks
> >     Davood
> >
> >
>
>
