Hi Alex,

You probably didn't get the parallelism right. A serial scan has
a parallelism of one. If the parallelism isn't large enough, performance
will be slow.
If the parallelism is too large, Cassandra and the disk will thrash and
incur too many context switches.

So you need to find your cluster's sweet spot. We documented the procedure
for finding it in this blog post:
http://www.scylladb.com/2017/02/13/efficient-full-table-scans-with-scylla-1-6/
and the results are here:
http://www.scylladb.com/2017/03/28/parallel-efficient-full-table-scan-scylla/
The algorithm should translate to Cassandra, but you'll have to use
different rules of thumb.
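
As a rough sketch of the idea (not the code from the blog posts): split the
token ring into many small ranges and scan them with a bounded number of
concurrent workers. This assumes the default Murmur3Partitioner token range;
fetch_range is a hypothetical callback you would implement with your driver,
and num_ranges and parallelism are placeholders to tune for your cluster.

```python
from concurrent.futures import ThreadPoolExecutor

# Token bounds, assuming the default Murmur3Partitioner.
MIN_TOKEN = -(2**63)
MAX_TOKEN = 2**63 - 1


def split_token_ring(num_ranges):
    """Split the full token ring into num_ranges contiguous (start, end) pairs."""
    total = MAX_TOKEN - MIN_TOKEN
    bounds = [MIN_TOKEN + total * i // num_ranges for i in range(num_ranges + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


def parallel_scan(fetch_range, num_ranges, parallelism):
    """Call fetch_range(start, end) for every sub-range, at most
    `parallelism` calls in flight at once. Results come back in range order.

    fetch_range is hypothetical: it would run something like
    SELECT ... WHERE token(pk) > start AND token(pk) <= end
    and return the rows (or a count) for that sub-range.
    """
    ranges = split_token_ring(num_ranges)
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        return list(pool.map(lambda r: fetch_range(*r), ranges))
```

Timing parallel_scan for a few values of parallelism (for example 1, 8, 32,
128) against the same table is one simple way to locate the sweet spot.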

Best,
Dor


On Wed, Aug 16, 2017 at 9:50 AM, Alex Kotelnikov <
alex.kotelni...@diginetica.com> wrote:

> Hey,
>
> we are trying Cassandra as an alternative for storing a huge stream of data
> coming from our customers.
>
> Storing works quite well, and I started to validate how retrieval performs.
> We have two types of retrieval: fetching specific records and bulk retrieval
> for general analysis.
> Fetching a single record works like a charm. But it is not so with bulk fetch.
>
> With a moderately small table of ~2 million records, ~10Gb raw data I
> observed very slow operation (using token(partition key) ranges). It takes
> minutes to perform full retrieval. We tried a couple of configurations
> using virtual machines and real hardware, and overall it looks like it is not
> possible to fetch all table data in a reasonable time (by reasonable I mean that
> since we have 1Gbit network 10Gb can be transferred in a couple of minutes
> from one server to another and when we have 10+ cassandra servers and 10+
> spark executors total time should be even smaller).
>
> I tried the DataStax Spark connector. I also wrote a simple test case using
> the DataStax Java driver and saw that a fetch of 10k records takes ~10s, so I
> assume that a "sequential" scan will take 200x more time, i.e. ~30 minutes.
>
> Maybe we are totally wrong trying to use Cassandra this way?
>
> --
>
> Best Regards,
>
>
> *Alexander Kotelnikov*
>
> *Team Lead*
>
> DIGINETICA
> Retail Technology Company
>
> m: +7.921.915.06.28 <+7%20921%20915-06-28>
>
> *www.diginetica.com <http://www.diginetica.com/>*
>
