I have been seeing some strange trends in read latency that I wanted to
throw out there to find some explanations. We are running 0.6.5 in a 10-node
cluster with RF=3. We find that the read latency reported by cfstats is
always about 1/4 of the actual time it takes to get the data back to the
python client. We are not using any higher level clients, and we usually are
doing Quorum reads (rf=3). If we ask for 1 copy (vs. 2 for Quorum) it is
around 2x the time reported in cfstats. This is true whether we have a 0.8 ms
read or a 5000 ms read. It is always around 4x the time for a Quorum read
and 2x the time for a single value read. This tells me that much of the time
waiting for a read has nothing to do with disk random read latency. This is
contrary to what is expected.
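For reference, here is roughly how we take the client-side numbers we compare
against cfstats. This is a minimal sketch: read_one() is a hypothetical stand-in
for the actual Thrift get call (simulated here with a fixed 1 ms delay), and the
sample count is arbitrary.

```python
# Sketch: measure client-observed read latency to compare against the
# per-read latency cfstats reports on the server side.
import time

def read_one():
    # Hypothetical placeholder for the real Thrift client.get(...) call;
    # simulated here as a fixed ~1 ms server-side read.
    time.sleep(0.001)

samples = []
for _ in range(20):
    start = time.perf_counter()
    read_one()
    samples.append((time.perf_counter() - start) * 1000.0)  # elapsed ms

samples.sort()
median_ms = samples[len(samples) // 2]
print("median client-observed latency: %.2f ms" % median_ms)
```

The gap between this median and the cfstats figure is the unexplained overhead
described above.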

What is that extra time being used for? If the client waits 2 ms for a value
that was retrieved in 1 ms, that leaves 1 ms unaccounted for.
Is the node the client connects to doing some "work" that equals the
time spent by the node actually serving up the data? Is this the thrift
server packaging up the response to the client?

Are reads really more CPU bound? We have lower end CPUs in our nodes, is
that part of the cause?

What is cfstats actually reporting? Is it not really reporting on ALL of
the time required to service a read? I assume it is not reporting the time to
send the result to the requesting node.

How much of this time is network time? Would Infiniband or a lower latency
network architecture reduce any of these times? If we want to reduce a 2 ms
read to a 1ms read what will help us get there? We have cached keys which
then gives us a cfstats read latency < 1 ms (~0.85 ms) but it still takes 2 ms to
get to the client (single read).

Why does a quorum read double everything? It seems quorum reads are
serialized and not parallel. Is that true and if so why? Obviously it takes
more time to get two values and compare them than to get one value, but if that is
always 2x+ then the adjustable consistency of Cassandra comes at a very high
price.
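The arithmetic behind that suspicion, as a sketch with assumed numbers (1 ms
per replica read, coordination overhead ignored): if the coordinator queried
both replicas concurrently, a quorum read should cost about the max of the
replica times, not their sum.

```python
# Sketch: expected quorum-read cost if replica reads are parallel vs serial.
# Assumes two replica reads of ~1 ms each; overhead is ignored.
replica_ms = [1.0, 1.0]

parallel_quorum = max(replica_ms)  # replicas queried concurrently
serial_quorum = sum(replica_ms)    # replicas queried one after another

print(parallel_quorum, serial_quorum)
```

The consistent ~2x we observe looks like the serial case, which is what prompts
the question.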

Any other suggestions for decreasing read latency? Faster disks don't seem
as useful as faster CPUs. We have worked hard to reduce the cfstats reported
read latency and have been successful. How can we reduce the time from there
to the requesting client? What is the anatomy of a read from client request
to result? Where does the time usually go and what can help speed each step
up? Caching is the obvious answer but assume we are already caching what can
be cached (keys).

Thanks in advance for any advice or explanations anyone might have.
