Hi,

I'm trying to read all the data in the cluster as fast as possible. I'm aware 
that Spark can do this kind of thing out of the box, but I wanted to do it at a 
low level to see how fast it could be. So:
1. I retrieved the partition keys on each node by taking the token ranges from 
nodetool ring and selecting the distinct partitions within each range.
2. I ran the query for each partition on its primary replica node using Python 
(a parallel job on all nodes of the cluster). I used a load-balancing policy 
with only the local IP as the contact point, but I will also try the whitelist 
policy (with WhiteListRoundRobinPolicy, reads are restricted to a single/local 
coordinator, with the Python script running on the same host as that 
coordinator); see the sketch below this list.
This mechanism turned out to be fast, but not as fast as a sequential read of 
the disk could be (theoretically the query could be about 100 times faster).

I'm using RF=3 in a single-DC cluster with the default consistency level, which 
is LOCAL_ONE. I suspect that the coordinator may also be contacting the other 
replicas, but how can I debug that?
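
So far the best I could come up with is enabling query tracing from the driver, 
something like the following (a sketch that reuses the session and the range 
variables from the snippet above); events sourced from IPs other than the 
coordinator would mean another replica was involved in the read:

# trace a sample of the range queries and inspect which hosts took part
result = session.execute(
    "SELECT * FROM my_ks.my_table WHERE token(pk) > %s AND token(pk) <= %s",
    (start, end),
    trace=True,
)
trace = result.get_query_trace()
print("coordinator:", trace.coordinator)
for event in trace.events:
    # event.source is the node that produced this trace event
    print(event.source, event.source_elapsed, event.description)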

Is there any workaround to force the coordinator to only read data from itself, 
so that:

1. if there are other replicas (besides the coordinator) for the partition key, 
only the coordinator's data is read and returned, and it does not even check 
the other replicas for the data;

2. if the coordinator is not a replica for the partition key, it simply throws 
an exception or returns an empty result?
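
On the client side I can approximate the second condition with the driver's 
metadata, something like this (a sketch reusing the cluster and LOCAL_IP from 
the first snippet; KEYSPACE is a placeholder and the partition key has to be 
passed in its serialized form), but I would rather have the server enforce it:

KEYSPACE = "my_ks"  # placeholder

local_host = next(h for h in cluster.metadata.all_hosts()
                  if h.address == LOCAL_IP)

def is_local_replica(serialized_pk):
    # True if the local node holds a replica of this partition key
    replicas = cluster.metadata.get_replicas(KEYSPACE, serialized_pk)
    return local_host in replicas

# queries could then be skipped (or failed) for partitions where the local
# coordinator is not a replica, instead of letting it forward the read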


Is there any mechanism to accomplish this kind of local read?

Best Regards
