Re: An extremely fast cassandra table full scan utility

DuyHai Doan Mon, 03 Oct 2016 11:11:38 -0700

Hello Siddharth

I just glanced over the architecture diagram. The idea of using
multiple threads, one for each token range, is great. It helps max out
parallelism.
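
To make the token-range idea concrete, here is a minimal Python sketch. It is
not the cfs code itself, only an illustration under assumptions: the ring is
the default Murmur3Partitioner token space, and `split_token_ring`,
`scan_range`, `full_scan`, and `make_in_memory_fetch` are hypothetical names.
The `fetch` callable stands in for a real driver query, so the example runs
without a cluster.

```python
from concurrent.futures import ThreadPoolExecutor

# Token bounds of Murmur3Partitioner (assumed here; it is Cassandra's default).
MIN_TOKEN = -(2**63)
MAX_TOKEN = 2**63 - 1


def split_token_ring(num_ranges):
    """Split the full token ring into contiguous (start, end] ranges."""
    total = MAX_TOKEN - MIN_TOKEN
    bounds = [MIN_TOKEN + (total * i) // num_ranges for i in range(num_ranges)]
    bounds.append(MAX_TOKEN)
    return list(zip(bounds[:-1], bounds[1:]))


def scan_range(fetch, token_range):
    """Read every row whose token falls in one (start, end] range."""
    start, end = token_range
    # With a real driver this would be a paged query along the lines of:
    #   SELECT ... FROM ks.tbl WHERE token(pk) > ? AND token(pk) <= ?
    return list(fetch(start, end))


def full_scan(fetch, num_ranges=8, max_workers=8):
    """Scan all token ranges in parallel, one task per range."""
    ranges = split_token_ring(num_ranges)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        chunks = pool.map(lambda r: scan_range(fetch, r), ranges)
        return [row for chunk in chunks for row in chunk]


def make_in_memory_fetch(rows_by_token):
    """Hypothetical stand-in for a Cassandra session: filters a dict by token."""
    def fetch(start, end):
        return [row for tok, row in rows_by_token.items() if start < tok <= end]
    return fetch
```

Because the ranges are contiguous and non-overlapping, each row is read exactly
once, and the number of ranges and worker threads can be tuned independently.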


With https://issues.apache.org/jira/browse/CASSANDRA-11521 it would be even
faster.

On Mon, Oct 3, 2016 at 7:51 PM, siddharth verma <sidd.verma29.l...@gmail.com> wrote:

> Hi,
> I was working on a utility for full Cassandra table scans at very high
> speed: cassandra fast full table scan.
> How fast?
> The script dumped ~229 million rows in 116 seconds on a 6-node cluster.
> Data transfer rates of up to 25 MBps were observed on the Cassandra nodes.
>
> For one use case a Spark cluster was required, but for some reason we
> couldn't create one. In such a situation, one may use this utility to
> iterate through the entire table at very high speed.
>
> But now for any full scan, I use it freely in my ad hoc Java programs to
> manipulate or aggregate Cassandra data.
>
> You can customize the options (fetch size, consistency level, and degree
> of parallelism, i.e. the number of threads) according to your needs.
>
> You can visit https://github.com/siddv29/cfs to go through the code, see
> the logic behind it, or try it in your program.
> A sample program is also provided.
>
> I coded this utility in java.
>
> Bhuvan Rawal (bhu1ra...@gmail.com) and I worked on this concept.
> For Python, you may visit his blog
> (http://casualreflections.io/tech/cassandra/python/Multiprocess-Producer-Cassandra-Python)
> and github
> (https://gist.github.com/bhuvanrawal/93c5ae6cdd020de47e0981d36d2c0785).
>
> Looking forward to your suggestions and comments.
>
> P.S. Give it a try. Trust me, the iteration speed is awesome!!
> It is a bare-bones application, built quickly. If you would like to
> contribute to the Java utility, or add to or build on it, do reach out at
> sidd.verma29.li...@gmail.com
>
> Thanks and Regards,
> Siddharth Verma
> (previous email id on this mailing list : verma.siddha...@snapdeal.com)
>
