Re: Processing millions of messages in milliseconds -- Architecture guide required

2016-04-19 Thread Arkadiusz Bicz
Sorry, I've found one error: If you do NOT need any relational processing of your messages (based on historical data, or joining with other messages) and message processing is quite independent, Kafka plus Spark Streaming could be overkill. On Tue, Apr 19, 2016 at 1:54 PM, Arkadiusz Bicz wrote:

Re: Processing millions of messages in milliseconds -- Architecture guide required

2016-04-19 Thread Arkadiusz Bicz
The requirements look like my previous project for smart metering. We ended up building a custom solution without Spark, Hadoop and Kafka, but that was 4 years ago, when I did not have experience with these technologies (some did not exist or were not mature). If you do need any relational processing of your messa

Re: Processing millions of messages in milliseconds -- Architecture guide required

2016-04-19 Thread Jörn Franke
I do not think there is a simple how-to for this. First you need to be clear about the volumes in storage, in transit and in processing. Then you need to be aware of what kind of queries you want to run. Your assumption of milliseconds for the expected data volumes currently seems unrealistic. Howev

Re: Processing millions of messages in milliseconds -- Architecture guide required

2016-04-19 Thread Alex Kozlov
This is too big of a topic. For starters, what is the latency between when you obtain the data and when the data is available for analysis? Obviously, if this is < 5 minutes, you probably need a streaming solution. How fast do the "micro batches of seconds" need to be available for analysis? Can the problem

Re: Processing millions of messages in milliseconds -- Architecture guide required

2016-04-18 Thread Prashant Sharma
Hello Deepak, It is not clear what you want to do. Are you talking about Spark Streaming? It is possible to process historical data in Spark batch mode too. You can add a timestamp field in the XML/JSON. Spark documentation is at spark.apache.org. Spark has good built-in features to process json and x