I am evaluating Cassandra, and read latency is my biggest performance concern.
As I test various scenarios and configurations I am getting surprising
results. I have a 2-node cluster, with both nodes connected to direct-attached
storage. Read latency pulling data off the RAID 10 storage is worse than off
the internal drive, even though the drives are all the same SATA 7,200 RPM
speed, which does not make sense to me. This is for single, isolated requests;
at scale the RAID should presumably perform better, but I have not started
testing concurrent reads at scale because single reads are already too slow.
I am seeing 20-30 ms response times from the internal drives and 50-70 ms
through the RAID volumes (as reported by cfstats). The system is completely
idle and all data has been cleanly compacted. Both numbers seem very high.
All caching has been turned off for testing, since we expect our cache hit
ratio to be poor. More spindles usually speed things up, but I am seeing the
opposite. I am using the default configuration. My write latency is very good
and in line with posted benchmarks.
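For reference, the latency numbers above come from nodetool cfstats. A small script like the one below can pull the per-CF read latency out of that output for comparing runs; this is only a sketch, and the "Column Family:" / "Read Latency:" line format is assumed from the output I'm seeing, so it may differ in other versions.

```python
import re

def parse_read_latencies(cfstats_output):
    """Map each column family name to its read latency in ms, as
    printed by `nodetool cfstats` (line format assumed, may vary)."""
    latencies = {}
    current_cf = None
    for line in cfstats_output.splitlines():
        line = line.strip()
        m = re.match(r"Column Family: (\S+)", line)
        if m:
            current_cf = m.group(1)
            continue
        m = re.match(r"Read Latency: ([\d.]+) ms", line)
        if m and current_cf is not None:
            latencies[current_cf] = float(m.group(1))
    return latencies

sample = """\
Keyspace: Keyspace1
        Column Family: Standard1
                Read Count: 100
                Read Latency: 52.417 ms.
        Column Family: Super1
                Read Count: 100
                Read Latency: 27.803 ms.
"""
print(parse_read_latencies(sample))
```

Running the same parse after each test pass makes it easy to compare internal-drive vs. RAID numbers side by side.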

What are the recommended ways to reduce read latency in terms of CF
definition, Cassandra configuration, hardware, etc.?
Do more keyspaces and column families increase latency? (I originally saw 3-5
ms read latency with a small amount of data and a single keyspace/CF.)
Shouldn't RAID 10 help both latency and throughput (more, faster disks being
better)?
What is a "normal" expected read latency with no cache?
I am using super columns; would read latency and overall performance be
better with compound column names instead?
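On the super column point, the alternative I have in mind is flattening the two-level structure into regular columns with compound names, packing the super column and subcolumn into one name at the application level. A minimal sketch of the idea (the ":" delimiter and the helper names are my own, not a Cassandra API):

```python
DELIM = ":"  # assumes neither name component contains this character

def pack(supercol, subcol):
    """Combine a (super column, subcolumn) pair into one compound name."""
    return supercol + DELIM + subcol

def unpack(name):
    """Split a compound column name back into its two components."""
    supercol, subcol = name.split(DELIM, 1)
    return supercol, subcol

def columns_under(row, supercol):
    """Emulate reading a whole super column by selecting compound
    columns with a matching prefix. Against a real cluster this would
    be a column slice over the prefix range rather than a client-side
    filter."""
    prefix = supercol + DELIM
    return {n: v for n, v in row.items() if n.startswith(prefix)}

row = {pack("user1", "name"): "alice",
       pack("user1", "email"): "a@example.com",
       pack("user2", "name"): "bob"}
print(columns_under(row, "user1"))
```

The packed layout reads and writes like any standard CF, which is the appeal; whether it actually beats super columns on latency is exactly what I'm asking.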
I have many different CFs to isolate different data (some sharing the same
keys). Would I be better served combining CFs, reducing the number of CFs and
possibly increasing key cache hits (at the cost of bigger rows)? I am testing
with 10 keyspaces of 6 CFs each.
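To make the CF-combining question concrete, one way to merge several column families while keeping their data distinguishable is to fold the old CF name into the row key. A toy sketch with made-up names (the "/" separator is my own convention):

```python
def merged_key(cf_name, row_key):
    """Row key for a combined CF, with the old CF name folded in.
    This keeps rows small but does not share key-cache entries across
    the old CFs; folding the CF name into column names instead (same
    row key) would merge rows: bigger rows, shared cache entries."""
    return "%s/%s" % (cf_name, row_key)

# Two old CFs that share the key "k1", collapsed into one combined CF:
old_data = {
    "Profiles": {"k1": {"name": "alice"}},
    "Settings": {"k1": {"theme": "dark"}},
}
combined = {}
for cf, rows in old_data.items():
    for key, cols in rows.items():
        combined[merged_key(cf, key)] = cols
print(sorted(combined))
```

The trade-off in the comment is the one I'm weighing: fewer CFs either way, but only the merged-row variant would improve key cache hits for shared keys.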

Any recommendations would be appreciated.

Thanks.
