Sorry, I've found one error:
If you do NOT need any relational processing of your messages (e.g. based on
historical data, or joins with other messages) and message processing is
fairly independent, Kafka plus Spark Streaming could be overkill.
On Tue, Apr 19, 2016 at 1:54 PM, Arkadiusz Bicz wrote:
The requirements look like my previous project for smart metering. We
finally did a custom solution without Spark, Hadoop and Kafka, but that was
4 years ago, when I did not have experience with these technologies (some
did not exist or were not mature).
If you do need any relational processing of your messages ...
I do not think there is a simple how-to for this. First you need to be clear
about the volumes in storage, in transit and in processing. Then you need to
be aware of what kind of queries you want to run. Your assumption of
milliseconds for the expected data volumes currently seems unrealistic. Howev ...
This is too big a topic. For starters, what is the latency between when you
obtain the data and when the data is available for analysis? Obviously, if this
is < 5 minutes, you probably need a streaming solution. How fast do the
"micro batches of seconds" need to be available for analysis? Can the
problem ...
Hello Deepak,
It is not clear what you want to do. Are you talking about Spark Streaming?
It is possible to process historical data in Spark batch mode too. You
can add a timestamp field in the xml/json. Spark documentation is at
spark.apache.org. Spark has good built-in features to process json and
xml.
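To illustrate the timestamp idea, here is a minimal plain-Python sketch of stamping each json message before it is written out (the field name "event_ts" and the meter-reading message shape are made up for illustration; in a Spark job you would attach the equivalent field when producing or parsing the json):

```python
import json
from datetime import datetime, timezone

def stamp_message(raw: str) -> str:
    """Parse a json message and add an ingestion timestamp.

    The "event_ts" field name is an arbitrary choice for this sketch;
    with such a field present, the same records can later be processed
    either by a streaming job or by a historical batch job.
    """
    msg = json.loads(raw)
    msg["event_ts"] = datetime.now(timezone.utc).isoformat()
    return json.dumps(msg)

# Hypothetical smart-metering message, echoing the project mentioned above.
stamped = stamp_message('{"meter_id": 42, "reading_kwh": 3.7}')
print(stamped)
```

Once every record carries its own timestamp, "historical" simply means filtering or partitioning on that field, so batch and streaming code can share the same schema.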