Hi Frank,

You could try this: https://github.com/siddv29/cfs
I have processed 1.2 billion rows in 480 seconds with just 20 threads on the client side.

C* 3.0.9
Nodes = 6
RF = 3

Have a go at it. You might be surprised.

Regards,

On Thu, Jan 19, 2017 at 5:35 PM, Frank Hughes <frankhughes...@gmail.com> wrote:

> Hello there,
>
> I'm running a 4 node cluster of Cassandra 3.9 with a replication factor of
> 4.
>
> I want to be able to run a Java process on each node, selecting only 25%
> of the data on each node,
> so I can process all of the data in parallel on each node.
>
> What is the best way to do this with the Java driver?
>
> I was assuming I could retrieve the token ranges for each node and page
> through the data using these ranges, but this includes the replicated data.
> I was hoping there was a way of only selecting the data that a node is
> responsible for and avoiding the replicated data.
>
> Many thanks for any help and guidance,
>
> Frank Hughes
>

--
Siddharth Verma
(Visit https://github.com/siddv29/cfs for a high speed cassandra full table scan)
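
P.S. For anyone who would rather roll their own, here is a minimal sketch of the token-range idea from the question. It is not taken from cfs and makes several assumptions: the cluster uses the default Murmur3Partitioner (tokens cover the full signed 64-bit range), and the class name `TokenRangeSplitter`, the table `my_ks.my_table`, and the partition key column `id` are all hypothetical placeholders. Rather than asking each node which ranges it owns, it cuts the whole ring into N non-overlapping slices, so each row is read by exactly one worker regardless of the replication factor.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class TokenRangeSplitter {
    // Assumption: Murmur3Partitioner, whose tokens span the signed 64-bit range.
    static final BigInteger MIN = BigInteger.valueOf(Long.MIN_VALUE);
    static final BigInteger MAX = BigInteger.valueOf(Long.MAX_VALUE);

    /** Splits the ring into `parts` contiguous slices, each {startInclusive, endInclusive}. */
    public static List<long[]> split(int parts) {
        if (parts < 1) {
            throw new IllegalArgumentException("parts must be >= 1");
        }
        // Number of distinct tokens (2^64); BigInteger avoids long overflow.
        BigInteger total = MAX.subtract(MIN).add(BigInteger.ONE);
        BigInteger n = BigInteger.valueOf(parts);
        List<long[]> slices = new ArrayList<>();
        for (int i = 0; i < parts; i++) {
            BigInteger start = MIN.add(total.multiply(BigInteger.valueOf(i)).divide(n));
            BigInteger end = MIN.add(total.multiply(BigInteger.valueOf(i + 1)).divide(n))
                    .subtract(BigInteger.ONE);
            slices.add(new long[] { start.longValueExact(), end.longValueExact() });
        }
        return slices;
    }

    /** Builds the CQL for one slice; table and key names are supplied by the caller. */
    public static String cql(String table, String partitionKey, long[] slice) {
        return String.format(Locale.ROOT,
                "SELECT * FROM %s WHERE token(%s) >= %d AND token(%s) <= %d",
                table, partitionKey, slice[0], partitionKey, slice[1]);
    }

    public static void main(String[] args) {
        // One query per worker process; 4 matches the 4-node cluster in the question.
        for (long[] slice : split(4)) {
            System.out.println(cql("my_ks.my_table", "id", slice));
        }
    }
}
```

Each process would run one of the generated statements through the driver and page through the results as usual. Equal-width slices give roughly even work only if the data is evenly hashed, and a slice is not guaranteed to be served by the local node, so in practice you may want many more slices than nodes.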