Hi,

I'm trying to read all the data in the cluster as fast as possible. I'm aware that Spark can do this out of the box, but I wanted to do it at a low level to see how fast it could be. So far:

1. Retrieved the token ranges of each node using nodetool ring, and collected the distinct partition keys within each range.
2. Ran the query for each partition on its primary replica node using Python (a parallel job on all nodes of the cluster). I used the default load-balancing policy with only the local IP as contact point, but I will also try the whitelist policy (with the whitelist load-balancing policy, reads are restricted to a single/local coordinator, and the Python script runs on the same host as that coordinator). See the sketch below this list.

This mechanism turned out to be fast, but not as fast as a sequential read of the disk could be (theoretically the query could be about 100 times faster).
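Roughly, each per-node worker does the following (a simplified sketch with the DataStax Python driver; ks.tbl, the pk column, the local IP and the token values are just placeholders):

    from cassandra import ConsistencyLevel
    from cassandra.cluster import Cluster, ExecutionProfile, EXEC_PROFILE_DEFAULT
    from cassandra.policies import WhiteListRoundRobinPolicy

    local_ip = '10.0.0.1'  # the node this worker runs on

    # Whitelist only the local node so it is always the coordinator, and read at LOCAL_ONE.
    profile = ExecutionProfile(
        load_balancing_policy=WhiteListRoundRobinPolicy([local_ip]),
        consistency_level=ConsistencyLevel.LOCAL_ONE)
    cluster = Cluster([local_ip], execution_profiles={EXEC_PROFILE_DEFAULT: profile})
    session = cluster.connect()

    # One range query per token range owned by this node (ranges taken from nodetool ring).
    rows = session.execute(
        "SELECT * FROM ks.tbl WHERE token(pk) > %s AND token(pk) <= %s",
        (-9223372036854775808, -3074457345618258603))
    for row in rows:
        pass  # consume the row

    cluster.shutdown()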
I'm using RF=3 in a single-DC cluster with the default consistency level, which is LOCAL_ONE. I suspect the coordinator may also be contacting other replicas, but how can I debug that?

Is there any workaround to force the coordinator to read data only from itself? That is: if there are other replicas (besides the coordinator) for a partition key, only the coordinator's own data would be read and returned, and the other replicas would not even be checked; and if the coordinator is not a replica for the partition key, it would simply throw an exception or return an empty result. Is there any mechanism to accomplish this kind of local read?

Best Regards
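P.S. The only debugging idea I have so far is to enable query tracing on each request and look at which sources the coordinator contacted, something like the sketch below (same placeholder table and token range as above). Maybe there is a better way?

    # Enable tracing for this request and inspect which nodes the coordinator touched.
    result = session.execute(
        "SELECT * FROM ks.tbl WHERE token(pk) > %s AND token(pk) <= %s",
        (-9223372036854775808, -3074457345618258603),
        trace=True)
    trace = result.get_query_trace()
    print("coordinator:", trace.coordinator)
    for event in trace.events:
        print(event.source, event.description)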