Hello Siddharth,

I just had a look at the architecture diagram. The idea of using multiple threads, one for each token range, is great. It helps max out parallelism.
With https://issues.apache.org/jira/browse/CASSANDRA-11521 it would be even faster.

On Mon, Oct 3, 2016 at 7:51 PM, siddharth verma <sidd.verma29.l...@gmail.com> wrote:

> Hi,
> I was working on a utility which can be used for cassandra full table
> scan, at a tremendously high velocity, cassandra fast full table scan.
> How fast?
> The script dumped ~ 229 million rows in 116 seconds, with a cluster of
> size 6 nodes.
> Data transfer rates were upto 25MBps was observed on cassandra nodes.
>
> For some use case, a spark cluster was required, but for some reason we
> couldn't create spark cluster. Hence, one may use this utility to iterate
> through the entire table at very high speed.
>
> But now for any full scan, I use it freely for my adhoc java programs to
> manipulate or aggregate cassandra data.
>
> You can customize the options, setting fetch size, consistency level,
> degree of parallelism(number of threads) according to your need.
>
> You can visit https://github.com/siddv29/cfs to go through the code, see
> the logic behind it, or try it in your program.
> A sample program is also provided.
>
> I coded this utility in java.
>
> Bhuvan Rawal(bhu1ra...@gmail.com) and I worked on this concept.
> For python you may visit his blog(http://casualreflections.io/tech/cassandra/python/Multiprocess-Producer-Cassandra-Python) and
> github(https://gist.github.com/bhuvanrawal/93c5ae6cdd020de47e0981d36d2c0785)
>
> Looking forward to your suggestions and comments.
>
> P.S. Give it a try. Trust me, the iteration speed is awesome!!
> It is a bare application, built asap. If you would like to contribute to
> the java utility, add or build up on it, do reach out
> sidd.verma29.li...@gmail.com
>
> Thanks and Regards,
> Siddharth Verma
> (previous email id on this mailing list : verma.siddha...@snapdeal.com)
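
For anyone following the thread, here is a rough sketch of the one-thread-per-token-range idea. It is not taken from the cfs code, only a minimal illustration. Assumptions: the cluster uses Murmur3Partitioner (tokens between -2^63 and 2^63 - 1), and the names split_token_ring, range_query, parallel_scan and fetch_rows are all made up for this example. fetch_rows stands in for whatever function runs a query through your driver and returns its rows.

```python
from concurrent.futures import ThreadPoolExecutor

# Token bounds, assuming Murmur3Partitioner (Cassandra's default partitioner).
MIN_TOKEN = -(2**63)
MAX_TOKEN = 2**63 - 1


def split_token_ring(num_ranges):
    """Split the full token ring into contiguous (start, end] ranges."""
    total = MAX_TOKEN - MIN_TOKEN
    bounds = [MIN_TOKEN + (total * i) // num_ranges for i in range(num_ranges + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


def range_query(keyspace, table, partition_key, start, end):
    """Build the CQL for one token range (exclusive start, inclusive end).

    Assumption: no row carries the minimum token itself, so the exclusive
    lower bound on the first range does not skip any data.
    """
    return (
        f"SELECT * FROM {keyspace}.{table} "
        f"WHERE token({partition_key}) > {start} AND token({partition_key}) <= {end}"
    )


def parallel_scan(fetch_rows, queries, num_threads):
    """Run fetch_rows(query) for each query on a thread pool and yield every row.

    fetch_rows is a hypothetical callable supplied by the caller; results are
    yielded in the order of the queries, not in completion order.
    """
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        for rows in pool.map(fetch_rows, queries):
            yield from rows


if __name__ == "__main__":
    # Hypothetical keyspace, table and partition key names.
    queries = [range_query("my_ks", "my_table", "id", s, e)
               for s, e in split_token_ring(4)]
    for q in queries:
        print(q)
```

In a real program fetch_rows would wrap the driver's execute call, which is where the fetch size and consistency level mentioned in the original mail would be set, and num_threads corresponds to the degree of parallelism. Splitting the ring into more ranges than threads should help keep all threads busy when some ranges hold more data than others.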