On 31/12/12 18:45, Tyler Hobbs wrote:
On Mon, Dec 31, 2012 at 11:24 AM, James Masson <james.mas...@opigram.com
<mailto:james.mas...@opigram.com>> wrote:
Well, it turns out the Read Request Latency graph in OpsCenter is
highly misleading.
Using jconsole, the read-latency for the column family in question
is actually normally around 800 microseconds, punctuated by
occasional big spikes that drive up the averages.
Towards the end of the batch process, the OpsCenter-reported average
latency is up above 4000 microseconds, and forced compactions no longer
help drive the latency down again.
I'm going to stop relying on OpsCenter data for performance
analysis; it just doesn't have the resolution.
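The effect described above, where a few occasional spikes drive an averaged dashboard number far above what most reads actually experience, can be sketched with illustrative figures (the latency values below are hypothetical, not taken from the thread):

```python
# Sketch with made-up numbers: a handful of slow reads can pull the
# average latency well above what the typical request experiences.
import statistics

# 98 typical reads around 800 us, plus 2 spikes (illustrative values)
latencies_us = [800] * 98 + [80_000, 160_000]

mean_us = statistics.mean(latencies_us)      # what an averaged graph reports
median_us = statistics.median(latencies_us)  # what a typical read sees

print(f"median: {median_us} us")  # 800 us
print(f"mean:   {mean_us} us")    # 3184 us
```

This is why percentile or raw-sample views tend to be more useful than averages when chasing latency spikes.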
James, it's worth pointing out that Read Request Latency in OpsCenter is
measuring at the coordinator level, so it includes the time spent
sending requests to replicas and waiting for a response. There's
another latency metric that is per-column family named Local Read
Latency; it sounds like this is the equivalent number that you were
looking at in jconsole. This metric basically just includes the time to
read local caches/memtables/sstables.
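The gap between the two metrics can be sketched with a simple model (an assumed model for illustration, not Cassandra's actual internals): the coordinator has to wait until enough replicas respond, so its latency is roughly the k-th fastest of (network round trip + local read) across the replicas involved.

```python
# Minimal sketch (assumed model, not Cassandra internals): coordinator
# latency is the k-th fastest replica response, where each response is
# network round trip plus that replica's local read time.
def coordinator_latency_us(replica_local_us, network_rtt_us, replicas_needed):
    # per-replica response time = round trip + local read
    responses = sorted(l + network_rtt_us for l in replica_local_us)
    # the coordinator can answer once `replicas_needed` responses arrive
    return responses[replicas_needed - 1]

# Three replicas, one of them slow; a quorum read needs 2 responses.
# Even though the local read is ~800 us, the coordinator-level number
# also absorbs network time and waiting on other replicas.
result = coordinator_latency_us([800, 900, 15_000],
                                network_rtt_us=500,
                                replicas_needed=2)
print(result)  # 1400
```

Under this model, a single slow replica only shows up in the coordinator metric when the consistency level forces the coordinator to wait for it.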
We are looking to rename one or both of the metrics for clarity; any
input here would be helpful. For example, we're considering "Coordinated
Read Request Latency" or "Client Read Request Latency" in place of just
"Read Request Latency".
--
Tyler Hobbs
DataStax <http://datastax.com/>
Hi Tyler,
thanks for clarifying this. So you're saying the difference between the
global Read Request Latency in OpsCenter and the column-family-specific
one is the effort of coordinating a validated read across multiple
replicas? Is this not part of what Hector does for itself?
Essentially, I'm looking to see whether I can use this to derive where
any extra latency from a client request comes from.
As for names, I'd suggest "cluster coordinated read request latency",
a bit of a mouthful, I know.
Is there anywhere I can find concrete definitions of what the stats in
OpsCenter, and in raw Cassandra via JMX, actually mean? The docs I've
found seem quite ambiguous.
I still think that the data resolution OpsCenter gives makes it more
suitable for trending/alerting than for chasing down tricky performance
issues. This sort of investigation work is what I do for a living; I
typically use intervals of 10 seconds or lower, and don't average my
data. Although storing your data inside the database you're measuring
does restrict your options a little :-)
regards
James M