Hi there.

We are putting together some Big Data components for handling a large amount of
incoming data from different log files and performing some analysis on the data.

All data being fed into the system will go into HDFS. We plan on using
Logstash, Kafka and Flink to bring the data from the log files into HDFS. All
the data in HDFS we will designate as our historic data, and we will use a
batch job (probably Flink, but it could also be Hadoop MapReduce) to create
some aggregate views of it. These views will probably live in HBase or MongoDB.
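
To make the ingestion leg concrete, here is a rough sketch of the kind of
Flink job we have in mind for moving data from Kafka into HDFS (a minimal
sketch, assuming a recent Flink with the universal Kafka connector and the
StreamingFileSink; the broker address, the topic name "log-events" and the
HDFS path are placeholders on our side):

import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringEncoder;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

public class LogIngestJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "kafka-1:9092"); // placeholder broker
        props.setProperty("group.id", "hdfs-ingest");

        // Log lines shipped by Logstash arrive on this (placeholder) topic.
        FlinkKafkaConsumer<String> source =
                new FlinkKafkaConsumer<>("log-events", new SimpleStringSchema(), props);

        // Roll the raw lines into HDFS; this becomes our historic data set.
        StreamingFileSink<String> sink = StreamingFileSink
                .forRowFormat(new Path("hdfs:///data/historic"),
                              new SimpleStringEncoder<String>("UTF-8"))
                .build();

        env.addSource(source).addSink(sink);
        env.execute("log-ingest");
    }
}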

These views of the historic data (also called batch views in the Lambda
Architecture, if any of you are familiar with that) will be used by the live
model in the system. The live model is fed the same data (through Kafka), and
when it detects a certain value in the incoming data, it performs some
analysis against the views of the historic data in HBase/MongoDB.
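
To be concrete about what the live model would do per event, here is a rough
sketch of it written as a Flink function (just a sketch under our assumptions:
the HBase table name "batch_views", the row-key convention and the two helper
methods are hypothetical placeholders for our business logic):

import org.apache.flink.api.common.functions.RichFlatMapFunction;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.util.Collector;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

// Runs on each live event that matched the trigger condition and joins it
// with the corresponding batch view row in HBase.
public class BatchViewLookup extends RichFlatMapFunction<String, String> {

    private transient Connection connection;
    private transient Table views;

    @Override
    public void open(Configuration parameters) throws Exception {
        connection = ConnectionFactory.createConnection(HBaseConfiguration.create());
        views = connection.getTable(TableName.valueOf("batch_views")); // placeholder table
    }

    @Override
    public void flatMap(String event, Collector<String> out) throws Exception {
        // Assumption: the event's key doubles as the row key of its batch view.
        Result view = views.get(new Get(Bytes.toBytes(extractKey(event))));
        if (!view.isEmpty()) {
            out.collect(analyze(event, view)); // our analysis goes here
        }
    }

    @Override
    public void close() throws Exception {
        if (views != null) views.close();
        if (connection != null) connection.close();
    }

    private String extractKey(String event) { /* parse the log line */ return event; }
    private String analyze(String event, Result view) { /* business logic */ return event; }
}

(If the per-event lookups became a bottleneck, the same idea could presumably
be expressed with Flink's AsyncDataStream/AsyncFunction, but the flat map is
the simplest shape for the question.)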

Now, could anyone share some knowledge about where it would be possible to
implement such a live model, given the components we plan on using? Apart from
the business logic that performs the analysis, our live model must at all
times also hold a Java object structure of maybe 5-10 collections (maps,
lists) containing approx. 5 million objects.
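
As far as we understand, one option for that structure would be Flink's
managed keyed state instead of plain heap collections, so it gets checkpointed
and, with the RocksDB state backend, is not limited by a single JVM heap. A
minimal sketch, assuming String keys and values as placeholders for our real
types (MapState requires the function to run on a keyed stream, i.e. after a
keyBy):

import org.apache.flink.api.common.functions.RichFlatMapFunction;
import org.apache.flink.api.common.state.MapState;
import org.apache.flink.api.common.state.MapStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.util.Collector;

// Holds the live model's lookup structure as Flink-managed state rather
// than plain Java maps, so it survives failures via checkpoints.
public class LiveModelFunction extends RichFlatMapFunction<String, String> {

    private transient MapState<String, String> model;

    @Override
    public void open(Configuration parameters) throws Exception {
        model = getRuntimeContext().getMapState(
                new MapStateDescriptor<>("live-model", String.class, String.class));
    }

    @Override
    public void flatMap(String event, Collector<String> out) throws Exception {
        String entry = model.get(event); // placeholder: key by something real
        if (entry != null) {
            out.collect(entry); // a hit in the model triggers downstream analysis
        }
        model.put(event, event); // keep the structure up to date as events arrive
    }
}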

So, where is it possible to implement our live model? Can we do this in Flink?
Can we do it with another component in the Hadoop/Big Data ecosystem?

Thanks.

/Palle
