Apache Cassandra currently performs poorly on batch analytics workloads that require a full table scan. I would look at FiloDB, which offers the benefits and familiarity of Cassandra with better streaming and analytics performance: https://github.com/filodb/FiloDB
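For reference, the token(partition key) range scan described in the quoted message below is usually parallelised by cutting the token ring into contiguous slices and issuing one query per slice. Here is a rough sketch of that splitting step, assuming the default Murmur3Partitioner (tokens from -2^63 to 2^63-1). The helper names `split_token_ring` and `range_query`, and the keyspace, table and column names in the usage note, are hypothetical and only for illustration. It builds the CQL text only and does not talk to a cluster.

```python
# Sketch only: assumes Murmur3Partitioner, whose tokens span [-2**63, 2**63 - 1].
MIN_TOKEN = -(2**63)
MAX_TOKEN = 2**63 - 1


def split_token_ring(num_splits):
    """Return num_splits contiguous (start, end] token ranges covering the ring."""
    if num_splits < 1:
        raise ValueError("num_splits must be at least 1")
    span = MAX_TOKEN - MIN_TOKEN
    # Integer arithmetic keeps the boundaries exact for 64-bit tokens.
    bounds = [MIN_TOKEN + (span * i) // num_splits for i in range(num_splits + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


def range_query(keyspace, table, partition_key, start, end):
    """Build the CQL text for one slice (hypothetical helper, illustration only)."""
    return (
        f"SELECT * FROM {keyspace}.{table} "
        f"WHERE token({partition_key}) > {start} "
        f"AND token({partition_key}) <= {end}"
    )


if __name__ == "__main__":
    # Hypothetical keyspace/table/column names.
    for start, end in split_token_ring(4):
        print(range_query("my_ks", "events", "id", start, end))
```

Each slice can then be handed to a separate worker or Spark task. Even split this way, every slice is still read through the normal read path, so this spreads the scan out but does not remove the per-row overhead.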
There are also some outstanding tickets around improving bulk reads in Cassandra (see https://issues.apache.org/jira/browse/CASSANDRA-9259 for the full gory details), but that work appears to have been abandoned by the initial set of contributors.

On Wed, 16 Aug 2017 at 09:51 Alex Kotelnikov <alex.kotelni...@diginetica.com> wrote:

> Hey,
>
> we are trying Cassandra as an alternative for storage huge stream of data
> coming from our customers.
>
> Storing works quite fine, and I started to validate how retrieval does. We
> have two types of that: fetching specific records and bulk retrieval for
> general analysis.
> Fetching single record works like charm. But it is not so with bulk fetch.
>
> With a moderately small table of ~2 million records, ~10Gb raw data I
> observed very slow operation (using token(partition key) ranges). It takes
> minutes to perform full retrieval. We tried a couple of configurations
> using virtual machines, real hardware and overall looks like it is not
> possible to all table data in a reasonable time (by reasonable I mean that
> since we have 1Gbit network 10Gb can be transferred in a couple of minutes
> from one server to another and when we have 10+ cassandra servers and 10+
> spark executors total time should be even smaller).
>
> I tried datastax spark connector. Also I wrote a simple test case using
> datastax java driver and see how fetch of 10k records takes ~10s so I
> assume that "sequential" scan will take 200x more time, equals ~30 minutes.
>
> May be we are totally wrong trying to use Cassandra this way?
>
> --
>
> Best Regards,
>
> *Alexander Kotelnikov*
> *Team Lead*
>
> DIGINETICA
> Retail Technology Company
>
> m: +7.921.915.06.28
>
> *www.diginetica.com <http://www.diginetica.com/>*
>

--
Ben Bromhead
CTO | Instaclustr <https://www.instaclustr.com/>
+1 650 284 9692
Managed Cassandra / Spark on AWS, Azure and Softlayer