I see the flow as below:
Logstash -> Log Stream -> Flink -> Kafka -> Live Model
                                                |
                                           Mongo/HBase

The Live Model will again be a Flink job streaming data from Kafka.
There you analyze the incoming stream for the certain value, and once you
find that value, read the historical view and then do the analysis
in Flink itself. A rough sketch follows below.
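A minimal sketch of such a live model as a Flink streaming job (against the
Flink 1.0 / Kafka 0.9 APIs current as of this writing). The topic name, the
trigger condition, and the HistoricalViewClient wrapper around the
HBase/MongoDB views are all placeholders of mine, not real APIs:

import java.util.Properties;

import org.apache.flink.api.common.functions.RichFlatMapFunction;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer09;
import org.apache.flink.streaming.util.serialization.SimpleStringSchema;
import org.apache.flink.util.Collector;

public class LiveModelJob {

    // Hypothetical stand-in for reading the batch views from HBase/MongoDB.
    static class HistoricalViewClient implements java.io.Serializable {
        String lookup(String key) {
            return "historical-view-for-" + key; // real impl would query the store
        }
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "kafka:9092"); // assumed address
        props.setProperty("group.id", "live-model");

        // The same Kafka topic that also feeds HDFS ("events" is a placeholder)
        DataStream<String> events = env.addSource(
                new FlinkKafkaConsumer09<>("events", new SimpleStringSchema(), props));

        events.flatMap(new RichFlatMapFunction<String, String>() {
            private transient HistoricalViewClient views;

            @Override
            public void open(Configuration parameters) {
                views = new HistoricalViewClient(); // connect once per task
            }

            @Override
            public void flatMap(String event, Collector<String> out) {
                // Watch the incoming stream for the certain value ...
                if (event.contains("TRIGGER")) { // placeholder condition
                    // ... then pull the historical view and analyze in Flink itself
                    out.collect(event + " | " + views.lookup(event));
                }
            }
        }).print();

        env.execute("Live Model");
    }
}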
For your Java objects, I guess you can use the Checkpointed interface (I
have not used it yet, though).
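For reference, a hedged sketch of what that could look like: an operator
holding one of your in-memory collections and implementing the Checkpointed
interface so Flink snapshots and restores it on failure. The state kept here
is illustrative, not your actual model:

import java.util.HashMap;

import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.streaming.api.checkpoint.Checkpointed;

public class ModelStateFunction
        extends RichMapFunction<String, String>
        implements Checkpointed<HashMap<String, Long>> {

    // One of the ~5-10 collections the live model keeps in memory
    private HashMap<String, Long> counts = new HashMap<>();

    @Override
    public String map(String event) {
        counts.merge(event, 1L, Long::sum); // update in-memory state per event
        return event;
    }

    @Override
    public HashMap<String, Long> snapshotState(long checkpointId,
                                               long checkpointTimestamp) {
        return counts; // Flink serializes this on each checkpoint
    }

    @Override
    public void restoreState(HashMap<String, Long> state) {
        counts = state; // handed back after a failure/restart
    }
}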

Thanks
Deepak


On Fri, May 6, 2016 at 4:22 PM, <pa...@sport.dk> wrote:

> Hi there.
>
> We are putting together some Big Data components for handling a large
> amount of incoming data from different log files and performing some
> analysis on the data.
>
> All data being fed into the system will go into HDFS. We plan on using
> Logstash, Kafka and Flink to bring data from the log files into HDFS.
> All the data located in HDFS we will designate as our historic data, and
> we will use MapReduce (probably Flink, but it could also be Hadoop) to
> create some aggregate views of the historic data. These views we will
> probably store in HBase or MongoDB.
>
> These views of the historic data (also called batch views in the Lambda
> Architecture, if any of you are familiar with that) we will use from the
> live model in the system. The live model is also being fed with the same
> data (through Kafka) and when the live model detects a certain value in the
> incoming data, it will perform some analysis using the views in
> HBase/MongoDB of the historic data.
>
> Now, could anyone share some knowledge regarding where it would be
> possible to implement such a live model, given the components we plan on
> using? Apart from the business logic that will perform the analysis, our
> live model will at all times also contain a Java object structure of maybe
> 5-10 Java collections (maps, lists) containing approx. 5 million objects.
>
> So, where is it possible to implement our live model? Can we do this in
> Flink? Can we do this with another component within the Hadoop Big Data
> ecosystem?
>
> Thanks.
>
> /Palle
>



-- 
Thanks
Deepak
www.bigdatabig.com
www.keosha.net
