I suspect that the rows whose keys fall into the token range of the
high-latency node may have more columns than the rows on the other nodes. I
have three nodes with RF = 3, so every node holds all of the data. And CL =
QUORUM, meaning each request is sent to all three nodes and the response is
returned to the client once two of them respond. What exactly does "Read
Count" from "nodetool cfstats" mean, then? Should it be the same across all
the nodes? I checked Hector, and it uses a round-robin load-balancing
strategy. I also tested writes, and they are distributed evenly across the
cluster. Below is the output from nodetool. Does anyone have a clue what
might have happened?

Node 1:
Read Count: 318679
Read Latency: 72.47641436367003 ms.
Write Count: 158680
Write Latency: 0.07918750315099571 ms.

Node 2:
Read Count: 251079
Read Latency: 86.91948475579399 ms.
Write Count: 158450
Write Latency: 0.1744694540864626 ms.

Node 3:
Read Count: 149876
Read Latency: 168.14125553123915 ms.
Write Count: 157896
Write Latency: 0.06468631250949992 ms.
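
To put numbers on the imbalance, here is a small sketch that normalizes the
three sets of read counters above. The values are copied from the cfstats
output; the names node1..node3 and the dictionary layout are only for
illustration, and the unlabeled first block is assumed to be Node 1.

```python
# Read counts and latencies copied from the nodetool cfstats output above.
# Assumption: the first, unlabeled block of counters belongs to Node 1.
stats = {
    "node1": {"reads": 318679, "read_latency_ms": 72.47641436367003},
    "node2": {"reads": 251079, "read_latency_ms": 86.91948475579399},
    "node3": {"reads": 149876, "read_latency_ms": 168.14125553123915},
}

total_reads = sum(s["reads"] for s in stats.values())
min_reads = min(s["reads"] for s in stats.values())
min_latency = min(s["read_latency_ms"] for s in stats.values())

# For each node: share of all counted reads, reads relative to the least
# loaded node, and read latency relative to the fastest node.
summary = {
    name: {
        "share": s["reads"] / total_reads,
        "reads_vs_min": s["reads"] / min_reads,
        "latency_vs_min": s["read_latency_ms"] / min_latency,
    }
    for name, s in stats.items()
}

for name, row in summary.items():
    print(
        f"{name}: share={row['share']:.1%} "
        f"reads_vs_min={row['reads_vs_min']:.2f} "
        f"latency_vs_min={row['latency_vs_min']:.2f}"
    )
```

If every read were counted on every replica, the three shares would each be
close to one third, which is clearly not what these counters show.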

 nodetool ring
Address         DC          Rack        Status State   Load            Effective-Ownership Token
           113427455640312821154458202477256070485      datacenter1 rack1       Up     Normal  35.85 GB        100.00%  
           0                                        datacenter1 rack1       Up     Normal  35.86 GB        100.00%  
           56713727820156410577229101238628035242      datacenter1 rack1       Up     Normal  35.85 GB        100.00%  

Keyspace: benchmark:
  Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
  Durable Writes: true
    Options: [replication_factor:3]
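
For reference, the quorum size that goes with this replication factor can be
worked out as below. This is only the arithmetic, written as a hypothetical
helper named quorum; it says nothing about how many replicas the coordinator
actually contacts per read, which is the open question here.

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must respond for a QUORUM read or write."""
    return replication_factor // 2 + 1


rf = 3                     # replication_factor from the keyspace options
needed = quorum(rf)        # replicas that must answer
tolerated = rf - needed    # replicas that may be slow or down

print(f"RF={rf}: quorum={needed}, tolerated={tolerated}")
```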

I am really confused by the Read Count number from nodetool cfstats.

I would really appreciate any hints.

 From: Wei Zhu <wz1...@yahoo.com>
To: Cassandr usergroup <user@cassandra.apache.org> 
Sent: Thursday, November 8, 2012 9:37 PM
Subject: read request distribution

Hi All,
I am benchmarking Cassandra. I have a three-node cluster with RF=3. I
generated 6M rows with sequential keys from 1 to 6M, so the rows should be
evenly distributed among the three nodes, disregarding replicas.
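
As a sanity check on the "evenly distributed" expectation, here is a rough
sketch that hashes a sample of sequential keys and buckets them by primary
token range. It rests on assumptions not stated in this thread: that the
cluster uses RandomPartitioner (an MD5-based token in the range 0 to 2^127),
that keys are stored as decimal strings, and that the three tokens are the
ones shown in the nodetool ring output. The function names are made up for
the example.

```python
import hashlib
from bisect import bisect_left

# Tokens as they appear in the nodetool ring output in this thread (sorted).
TOKENS = [
    0,
    56713727820156410577229101238628035242,
    113427455640312821154458202477256070485,
]


def md5_token(key: str) -> int:
    """Assumed RandomPartitioner-style token: abs() of the signed MD5 value."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return abs(int.from_bytes(digest, "big", signed=True))


def primary_index(token: int) -> int:
    """Index of the node whose range (previous token, own token] holds token."""
    i = bisect_left(TOKENS, token)   # first node token >= this token
    return i % len(TOKENS)           # past the last token wraps to the first


SAMPLE = 60000  # a sample of the 6M keys keeps the run short
counts = [0] * len(TOKENS)
for n in range(1, SAMPLE + 1):
    counts[primary_index(md5_token(str(n)))] += 1

for token, count in zip(TOKENS, counts):
    print(f"token {token}: {count} keys ({count / SAMPLE:.1%})")
```

Under those assumptions each range should end up with about a third of the
keys, so key placement alone would not be expected to produce a 2:3:4 split.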

The benchmark issues read-only requests for randomly generated keys from 1
to 6M. Oddly, nodetool cfstats reports that one node has only half as many
requests as another, and the third node sits in the middle, so the ratio is
roughly 2:3:4. The node with the most read requests actually has the lowest
latency, and the one with the fewest read requests reports the highest
latency. The difference is pretty big: the slowest node's latency is almost
double the fastest's.

All three nodes have exactly the same hardware, and the data size on each
node is the same, since the RF is three and each node has the complete data
set. I am using Hector as the client, and the random read requests number in
the millions. I can't think of a reasonable explanation. Can someone please
shed some light?

