See inline. -Matthias
On 07/26/2016 11:10 PM, Davood Rafiei wrote:
> Thanks David and Matthias for the reply. To make sure that I understand
> correctly:
> - Each stream application is limited to only one node. In that node the
> whole stream execution DAG is processed.

An application is not limited to a single node; however, an application
*instance* is. I guess you understood it correctly anyway (just want to
make sure we use the same terminology).

> - If we want to parallelize our application, we start a new instance of
> the streams application, which will be similar to the previous one (it
> will run all DAG operators inside that node), and after adding the new
> instance, it gets a separate partition to process (one that no other
> stream application instance is processing).

Yes.

> - There is no topology to break the DAG and operators (of a stream
> application) into separate nodes (of the cluster) and make it look more
> "data flow"ish. Instead we have "little" end-to-end instances running on
> each cluster node.

Yes. Just want to point out that this is similar to other data flow
systems (eg, Flink). Furthermore, you could accomplish the split manually
by dividing your application into multiple applications (where the output
of the first application part serves as input for the second, and so
on...). However, I personally do not see a big advantage in splitting the
application.

> - Assume we run some aggregate operator with time windows on a finance
> data stream. Then we can have only one partition and only one streams
> application instance, as increasing partitions and/or instances could
> break the logic of this particular use case.

Yes. Kafka Streams parallelism is limited by the number of partitions of
your input topics. Whether you can partition your data depends on your
use case -- and if the use case does not allow data-parallel processing,
you are in a bad position with regard to scaling. (This is a general
limitation and independent of the tool you are using...)
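To make the partition-based parallelism model above concrete, here is a toy simulation (not Kafka code; class and instance names are made up for illustration): input-topic partitions are the unit of parallelism, each running instance gets a disjoint subset of them, and extra instances beyond the partition count simply sit idle. The round-robin split below is a simplification; Kafka's real assignor also considers tasks, standby replicas, and stickiness.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model: partitions are assigned round-robin to app instances.
// Adding or removing an instance triggers a rebalance that simply
// recomputes this mapping.
public class PartitionAssignmentSketch {

    static Map<String, List<Integer>> assign(int numPartitions, List<String> instances) {
        Map<String, List<Integer>> assignment = new TreeMap<>();
        for (String instance : instances) {
            assignment.put(instance, new ArrayList<>());
        }
        for (int p = 0; p < numPartitions; p++) {
            // Partition p is owned by exactly one instance at a time.
            String owner = instances.get(p % instances.size());
            assignment.get(owner).add(p);
        }
        return assignment;
    }

    public static void main(String[] args) {
        // 4 input partitions, 2 instances: each instance processes 2 partitions.
        System.out.println(assign(4, List.of("instance-1", "instance-2")));

        // 6 instances but only 4 partitions: 2 instances stay idle,
        // i.e. parallelism is capped by the partition count.
        Map<String, List<Integer>> six =
                assign(4, List.of("i1", "i2", "i3", "i4", "i5", "i6"));
        long idle = six.values().stream().filter(List::isEmpty).count();
        System.out.println("idle instances: " + idle); // prints "idle instances: 2"
    }
}
```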
>
> Please correct me if I am wrong.
>
> Thanks
> Davood
>
> On Tue, Jul 26, 2016 at 9:44 PM, Matthias J. Sax <matth...@confluent.io>
> wrote:
>
>> David's answer is correct. Just start the same application multiple
>> times on different nodes and the library does the rest for you.
>>
>> Just one addition: as Kafka Streams is for standard application
>> development, there is no need to run the application on the same nodes
>> as your brokers are running (ie, application instances could run on any
>> machine outside of your broker cluster).
>>
>> Of course, it is possible to run the application on broker nodes. Just
>> wanted to point out that there is no co-location required between
>> brokers and app instances.
>>
>> -Matthias
>>
>>
>> On 07/26/2016 08:08 PM, David Garcia wrote:
>>>
>>> http://docs.confluent.io/3.0.0/streams/architecture.html#parallelism-model
>>>
>>> You shouldn't have to do anything. Simply starting a new thread will
>>> "rebalance" your streaming job. The job coordinates its tasks through
>>> Kafka itself.
>>>
>>>
>>> On 7/26/16, 12:42 PM, "Davood Rafiei" <rafieidavo...@gmail.com> wrote:
>>>
>>> Hi,
>>>
>>> I am a newbie with Kafka and Kafka Streams. I read the documentation to
>>> learn how it works in a multi-node environment. As a result, I want to
>>> run the Streams library on a cluster that consists of more than one
>>> node. From what I understood, I am trying to resolve the following
>>> conflicts:
>>> - Streams is a standalone Java application, so it runs on a single
>>> node of an n-node Kafka cluster.
>>> - However, Streams runs on top of Kafka, and if we set up a
>>> multi-broker Kafka cluster and then run the Streams library from the
>>> master node, the Streams library will run on the entire cluster.
>>>
>>> So, the Streams library is a standalone Java application, but to force
>>> it to run on multiple nodes, do we need to do something extra (in the
>>> configuration, for example) if we already have Kafka running in
>>> multi-broker mode?
>>>
>>> Thanks
>>> Davood
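To make the "just start the same application multiple times" point concrete, here is a minimal configuration sketch using only `java.util.Properties` (the application id, topic, and broker address are placeholders): the only thing that ties the instances together is that every one of them uses the same `application.id`, which also serves as the consumer group id used for rebalancing. No extra cluster setup is needed, and the instances do not have to run on broker machines.

```java
import java.util.Properties;

// Minimal config sketch: scaling out Kafka Streams means starting the same
// jar on another machine with the SAME application.id. Instances discover
// each other through the brokers, not through any cluster manager.
public class StreamsScalingConfig {

    static Properties instanceConfig() {
        Properties props = new Properties();
        // Shared across ALL instances -- this groups them into one logical
        // application (it doubles as the consumer group id).
        props.put("application.id", "my-streams-app");
        // Any reachable broker; the app need NOT run on a broker node.
        props.put("bootstrap.servers", "broker1:9092");
        return props;
    }

    public static void main(String[] args) {
        Properties nodeA = instanceConfig(); // started on machine A
        Properties nodeB = instanceConfig(); // started on machine B
        // Identical configs on different machines -> one scaled-out app.
        System.out.println(nodeA.getProperty("application.id")
                .equals(nodeB.getProperty("application.id"))); // prints "true"
    }
}
```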