Re: Running cluster of stream processing application

2017-02-03 Thread Guozhang Wang
The re-discover new consumer member within the group is part of the Consumer Rebalance protocol that Streams simply relies on. More details can be found here: https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Client-side+Assignment+Proposal A one sentence summary is that the new consumer wi

Re: Running cluster of stream processing application

2017-02-03 Thread Sachin Mittal
> Any reason why you don't just let streams create the changelog topic? Yes you should partition it the same as the source topic. Only reason is that I need to use my max.message.bytes and in version 0.10.0.1 configuring the same to state store supplier is not supported. But I understood that numb

Re: Running cluster of stream processing application

2017-02-03 Thread Damian Guy
Hi Sachin, On 3 February 2017 at 09:07, Sachin Mittal wrote: > > 1. Now what I wanted to know is that for separate machines running same > instance of the streams application, my application.id would be same > right. > That is correct. > If yes then how does kafka cluser know which partition

Re: Running cluster of stream processing application

2017-02-03 Thread Sachin Mittal
Hello All, I am revisiting this topic as now I am actually configuring a partitioned topic and would like multiple threads of my streams application running on different instances to process this partitioned topic in parallel. So I have once source topic partitioned into 40 partitions. The message

Re: Running cluster of stream processing application

2016-12-18 Thread Matthias J. Sax
> 1. Do we need to call system exit in UncaughtExceptionHandler. There is no *need* for it. It really depends what you *want* to do. System.exit is quite a hard termination of your application. Usually, if your Streams part of you application dies, you still want to clean up the other parts of you

Re: Running cluster of stream processing application

2016-12-17 Thread Sachin Mittal
Hi, Thanks for the suggestions. Before running the streams application in a standby cluster I was trying to get the graceful shutdown right. I have code something like this streams.setUncaughtExceptionHandler(new Thread.UncaughtExceptionHandler() { public void uncaughtException(

Re: Running cluster of stream processing application

2016-12-12 Thread Damian Guy
In the scenario you mention above about max.poll.interval.ms, yes if the timeout was reached then there would be a rebalance and one of the standby tasks would take over. However the original task may still be processing the data when the rebalance occurs and would throw an exception when it tries

Re: Running cluster of stream processing application

2016-12-12 Thread Sachin Mittal
Understood. Also the line Thanks for the application. It is not clear that clustering depends on how source topics are partitioned. Should be read as Thanks for the explanation. It is now clear that clustering depends on how source topics are partitioned. Apologies for auto-correct. One think I

Re: Running cluster of stream processing application

2016-12-12 Thread Damian Guy
Hi Sachin, The KafkaStreams StreamsPartitionAssignor will take care of assigning the Standby Tasks to the other instances of your Kafka Streams application. The state store updates are all handled by reading from the change-logs and updating local copies, there is no communication required between

Re: Running cluster of stream processing application

2016-12-12 Thread Sachin Mittal
Hi, Thanks for the application. It is not clear that clustering depends on how source topics are partitioned. In our case I guess num.standby.replicas settings is best suited. If say I set this to 2 and run two more same application in two different machines, how would my original instance know in

Re: Running cluster of stream processing application

2016-12-09 Thread Matthias J. Sax
About failure and restart. Kafka Streams does not provide any tooling for this. It's a library. However, because it is a library it is also agnostic to whatever tool you want to use. You can for example you a resource manager like Mesos or YARN, or you containerize your application, or you use too

Re: Running cluster of stream processing application

2016-12-09 Thread Damian Guy
Hi Sachin, What you have suggested will never happen. If there is only 1 partition there will only ever be one consumer of that partition. So if you had 2 instances of your streams application, and only a single input partition, only 1 instance would be processing the data. If you are running like

Re: Running cluster of stream processing application

2016-12-08 Thread Sachin Mittal
Hi, I followed the document and I have few questions. Say I have a single partition input key topic and say I run 2 streams application from machine1 and machine2. Both the application have same application id are have identical code. Say topic1 has messages like (k1, v11) (k1, v12) (k1, v13) (k2,

Re: Running cluster of stream processing application

2016-12-08 Thread Mathieu Fenniak
Hi Sachin, Some quick answers, and a link to some documentation to read more: - If you restart the application, it will start from the point it crashed (possibly reprocessing a small window of records). - You can run more than one instance of the application. They'll coordinate by virtue of bei

Running cluster of stream processing application

2016-12-08 Thread Sachin Mittal
Hi All, We were able to run a stream processing application against a fairly decent load of messages in production environment. To make the system robust say the stream processing application crashes, is there a way to make it auto start from the point when it crashed? Also is there any concept l