Hey, we are trying Cassandra as an alternative for storing a huge stream of data coming from our customers.
Storing works quite well, and I have started to validate how retrieval performs. We need two kinds of retrieval: fetching specific records, and bulk retrieval for general analysis. Fetching a single record works like a charm, but bulk fetching does not. With a moderately small table of ~2 million records (~10 GB of raw data), I observed very slow operation when scanning by token(partition key) ranges: a full retrieval takes minutes.

We tried a couple of configurations, on both virtual machines and real hardware, and overall it looks like it is not possible to read all of the table data in a reasonable time. By reasonable I mean this: with a 1 Gbit network, 10 GB can be transferred from one server to another in a couple of minutes (10 GB × 8 = 80 Gbit, roughly 80 seconds at line rate), and with 10+ Cassandra servers and 10+ Spark executors the total time should be even smaller.

I tried the DataStax Spark connector. I also wrote a simple test case using the DataStax Java driver and saw that fetching 10k records takes ~10 s, so I assume a "sequential" scan of all ~2 million records will take 200x longer, i.e. ~2,000 s or roughly 30 minutes.

Maybe we are totally wrong trying to use Cassandra this way?

--
Best Regards,
*Alexander Kotelnikov*
*Team Lead*
DIGINETICA
Retail Technology Company
m: +7.921.915.06.28
*www.diginetica.com <http://www.diginetica.com/>*
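For reference, here is a minimal sketch of the token-range scan and the back-of-envelope numbers from the post. It assumes the default Murmur3Partitioner (tokens are signed 64-bit values) and splits the ring into N contiguous (start, end] ranges, one CQL query per range, so N clients or executors can scan in parallel instead of sequentially. The keyspace, table and column names (`ks.events`, `customer_id`) are hypothetical placeholders, and the code only builds query strings; it does not call the DataStax driver.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

public class TokenRangeSplit {

    /** A half-open token range (start, end], matching "token(pk) > start AND token(pk) <= end". */
    record Range(long start, long end) {}

    /**
     * Splits the Murmur3Partitioner token space [Long.MIN_VALUE, Long.MAX_VALUE]
     * into n contiguous, non-overlapping (start, end] ranges.
     */
    static List<Range> split(int n) {
        if (n <= 0) throw new IllegalArgumentException("n must be positive");
        BigInteger min = BigInteger.valueOf(Long.MIN_VALUE);
        BigInteger max = BigInteger.valueOf(Long.MAX_VALUE);
        BigInteger width = max.subtract(min); // 2^64 - 1, does not fit in a long
        List<Range> ranges = new ArrayList<>(n);
        BigInteger start = min;
        for (int i = 1; i <= n; i++) {
            // The last range ends exactly at MAX so integer rounding leaves no gap.
            BigInteger end = (i == n)
                ? max
                : min.add(width.multiply(BigInteger.valueOf(i)).divide(BigInteger.valueOf(n)));
            ranges.add(new Range(start.longValueExact(), end.longValueExact()));
            start = end;
        }
        return ranges;
    }

    /** One CQL statement per range; ks.events and customer_id are hypothetical names. */
    static String cqlFor(Range r) {
        return "SELECT * FROM ks.events WHERE token(customer_id) > " + r.start()
            + " AND token(customer_id) <= " + r.end();
    }

    /** Ideal scan time if the measured single-client rate scales linearly across workers. */
    static double estimatedSeconds(long totalRows, double rowsPerSecond, int workers) {
        return totalRows / (rowsPerSecond * workers);
    }

    public static void main(String[] args) {
        for (Range r : split(4)) {
            System.out.println(cqlFor(r));
        }
        // Figures from the post: 10k rows in ~10 s, ~2 million rows in total.
        double rate = 10_000 / 10.0;
        System.out.printf("1 client:   %.0f s%n", estimatedSeconds(2_000_000, rate, 1));
        System.out.printf("10 clients: %.0f s%n", estimatedSeconds(2_000_000, rate, 10));
        // Network floor: 10 GB * 8 bits/byte over a 1 Gbit/s link.
        System.out.printf("1 Gbit/s floor: %.0f s%n", 10e9 * 8 / 1e9);
    }
}
```

Under the linear-scaling assumption, the measured single-client rate gives ~2,000 s for one client and ~200 s for ten, compared with a ~80 s network floor. Real clusters will not scale perfectly, but the gap between the sequential estimate and the network floor suggests that parallelism across token ranges matters more than raw bandwidth here.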