Though DSE Cassandra comes with Hadoop integration, this is clearly a use case for Hadoop. Any reason why Cassandra is your first choice?
> On 23 Jul 2015, at 6:12 a.m., Pierre Devops <pierredev...@gmail.com> wrote:
>
> Cassandra is not very good at massive reads/bulk reads if you need to
> retrieve and compute over a large amount of data on multiple machines using
> something like Spark or Hadoop (you would have to hack your way in and
> process the SSTables directly, which is not natively supported).
>
> However, it's very good at storing and retrieving data once it has been
> processed and sorted. That's why I would opt for solution 2), or for another
> solution that processes the data before inserting it into Cassandra and
> doesn't use Cassandra as a temporary store.
>
> 2015-07-23 2:04 GMT+02:00 Renato Perini <renato.per...@gmail.com>:
>> Problem: Log analytics.
>>
>> Solutions:
>> 1) Aggregate logs using Flume and store the aggregations in Cassandra.
>>    Spark reads the data from Cassandra, performs some computations,
>>    and writes the results to distinct tables, still in Cassandra.
>> 2) Aggregate logs using Flume to a sink, streaming the data directly
>>    into Spark. Spark performs some computations and stores the results
>>    in Cassandra.
>> 3) *** your solution ***
>>
>> Which is the best workflow for this task?
>> I would like to set up something flexible enough to allow me to use batch
>> processing and realtime streaming without major fuss.
>>
>> Thank you in advance.
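For what it's worth, the aggregation step in option 2 boils down to a key-based reduction over log lines. A minimal standalone sketch in plain Python (no Spark, and a hypothetical `<timestamp> <level> <message>` log format assumed for illustration) of the kind of computation Spark would perform before the results land in Cassandra:

```python
from collections import Counter

# Hypothetical log lines in "<timestamp> <level> <message>" form.
log_lines = [
    "2015-07-23T06:12:01 ERROR timeout talking to backend",
    "2015-07-23T06:12:02 INFO request served",
    "2015-07-23T06:12:03 ERROR timeout talking to backend",
    "2015-07-23T06:12:04 WARN slow query",
]

def count_by_level(lines):
    """Count log lines per severity level -- the same shape as a
    map + reduceByKey over a stream of log events in Spark."""
    counts = Counter()
    for line in lines:
        parts = line.split(maxsplit=2)
        if len(parts) >= 2:          # skip malformed lines
            counts[parts[1]] += 1
    return dict(counts)

print(count_by_level(log_lines))
# -> {'ERROR': 2, 'INFO': 1, 'WARN': 1}
```

In the real pipeline this reduction would run inside Spark (e.g. `reduceByKey` on a DStream fed by the Flume sink), and the resulting per-window counts would be written to a Cassandra table via the spark-cassandra-connector rather than printed.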