We are running 0.6.8 and are approaching 1 TB/node in a 10-node cluster with
RF=3. Our read times seem to be getting worse as we load data into the
cluster, and I am worried there is a scaling problem with large column
families. All benchmarks/times come from cfstats reporting, so no client code
or client-side timings are involved. Our initial tests always hovered around
<= 5 ms read latency. We then went through a lot of work to load large amounts
of data into the system, and now our read latency is ~20-25 ms. We can find no
reason for this; we have checked under load, with no load, with manually
compacted CFs, etc., and the numbers are all consistent with what we saw
before, just 3-4x slower.

We then compared against another, larger CF that we have loaded and are seeing
~50-60 ms reads from it. What scares us is that the data file for the slower
CF is 3x the size of the smaller one, with 3x the read latency. We were
expecting < 5 ms read latency (with key caching) when we had far less data in
the cluster, but are worried it will only get worse as the table grows. The
20 ms table's data file is ~30 GB and the bigger one is ~100 GB. I keep
thinking we are missing something obvious and this is just a coincidence, as
it does not make sense. We also upgraded from 0.6.6 between tests, so we will
check whether 0.6.6 is faster, but that is the only real change that has
occurred in our cluster.
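For reference, we are pulling the per-CF numbers with nodetool cfstats,
roughly like this (the host is a placeholder, and we are still on the 0.6
default JMX port of 8080, so adjust -port if yours differs):

  nodetool -host 10.0.0.1 -port 8080 cfstats

and then reading off the "Read Latency" and key cache lines for the CF in
question.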

I have read that read latency goes up with total data size, but to what
degree should we expect performance to degrade? What is the "normal" read
latency range, if there is such a thing, for a small slice of scols/cols?
Can we really put 2 TB of data on a node and still get good read latency
querying a handful of CFs? Any experience or explanations would be greatly
appreciated.
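For what it's worth, key caching on these CFs is enabled in storage-conf.xml
along these lines (the CF name and cache size below are illustrative, not our
exact schema):

  <!-- name and KeysCached value are placeholders -->
  <ColumnFamily Name="MyBigCF"
                CompareWith="BytesType"
                KeysCached="100%"/>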

Thanks in advance for any help!
