Though DSE Cassandra comes with Hadoop integration, this is clearly a use case for Hadoop. Any reason why Cassandra is your first choice?
> On 23 Jul 2015, at 6:12 a.m., Pierre Devops <pierredev...@gmail.com> wrote:
>
> Cassandra is not very good at massive reads/bulk reads if you need to
> retrieve and compute over a large amount of data on multiple machines using
> something like Spark or Hadoop (you would have to hack your way in and
> process the SSTables directly, which is not natively supported).
>
> However, it's very good at storing and retrieving data once it has been
> processed and sorted. That's why I would opt for solution 2), or for another
> solution that processes the data before inserting it into Cassandra and
> doesn't use Cassandra as a temporary store.
>
> 2015-07-23 2:04 GMT+02:00 Renato Perini <renato.per...@gmail.com>:
>> Problem: Log analytics.
>>
>> Solutions:
>> 1) Aggregate logs using Flume and store the aggregations in Cassandra.
>>    Spark reads the data from Cassandra, performs some computations,
>>    and writes the results to distinct tables, still in Cassandra.
>> 2) Aggregate logs using Flume to a sink, streaming the data directly
>>    into Spark. Spark performs some computations and stores the results
>>    in Cassandra.
>> 3) *** your solution ***
>>
>> Which is the best workflow for this task?
>> I would like to set up something flexible enough to allow me to use batch
>> processing and realtime streaming without major fuss.
>>
>> Thank you in advance.
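For what it's worth, the aggregation step in option 2 boils down to a key-based reduction over log lines. A minimal standalone sketch in plain Python (no Spark, and a hypothetical `<timestamp> <level> <message>` log format assumed for illustration) of the kind of computation Spark would perform before the results land in Cassandra:

```python
from collections import Counter

# Hypothetical log lines in "<timestamp> <level> <message>" form.
log_lines = [
    "2015-07-23T06:12:01 ERROR timeout talking to backend",
    "2015-07-23T06:12:02 INFO request served",
    "2015-07-23T06:12:03 ERROR timeout talking to backend",
    "2015-07-23T06:12:04 WARN slow query",
]

def count_by_level(lines):
    """Count log lines per severity level -- the same shape as a
    map + reduceByKey over a stream of log events in Spark."""
    counts = Counter()
    for line in lines:
        parts = line.split(maxsplit=2)
        if len(parts) >= 2:          # skip malformed lines
            counts[parts[1]] += 1
    return dict(counts)

print(count_by_level(log_lines))
# -> {'ERROR': 2, 'INFO': 1, 'WARN': 1}
```

In the real pipeline this reduction would run inside Spark (e.g. `reduceByKey` on a DStream fed by the Flume sink), and the resulting per-window counts would be written to a Cassandra table via the spark-cassandra-connector rather than printed.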