On 02/01/13 16:18, Tyler Hobbs wrote:
On Wed, Jan 2, 2013 at 5:28 AM, James Masson <james.mas...@opigram.com
<mailto:james.mas...@opigram.com>> wrote:
1) Hector sends a request to some node in the cluster, which will act as
the coordinator.
2) The coordinator then sends the actual read requests out to each of
the (RF) replicas.
3a) The coordinator waits for responses from the replicas; how many it
waits for depends on the consistency level.
3b) The replicas perform the actual cache/memtable/sstable reads and
respond to the coordinator when complete.
4) Once the required number of replicas have responded, the coordinator
replies to the client (Hector).

The Read Request Latency metric measures the time taken in steps 2
through 4.  The CF Local Read Latency metric captures only the time
taken in step 3b.
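
To make the difference between the two metrics concrete, here is a rough Python sketch of the steps above. It is a toy model, not Cassandra code: the Replica class, the function names and all the timing values are made up for illustration, and it ignores things like digest reads, read repair and the coordinator itself being a replica. It only shows that the coordinator-side latency (steps 2 through 4) is determined by the slowest of the responses it has to wait for, while each replica's local read latency (step 3b) is just one component of that.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    local_read_ms: float   # step 3b: cache/memtable/sstable read on the replica
    network_rtt_ms: float  # coordinator <-> replica round trip (assumed value)


def required_responses(consistency_level: str, rf: int) -> int:
    """Number of replica responses the coordinator waits for (step 3a)."""
    levels = {"ONE": 1, "TWO": 2, "THREE": 3, "QUORUM": rf // 2 + 1, "ALL": rf}
    return levels[consistency_level]


def read_request_latency_ms(replicas, consistency_level: str) -> float:
    """Simulated time for steps 2 through 4, as seen by the coordinator."""
    needed = required_responses(consistency_level, len(replicas))
    # A response arrives after the network round trip plus the local read.
    arrivals = sorted(r.network_rtt_ms + r.local_read_ms for r in replicas)
    # The coordinator can reply once the `needed`-th response has arrived.
    return arrivals[needed - 1]


# RF = 3, with one slow replica (all numbers are invented).
replicas = [
    Replica("a", local_read_ms=0.5, network_rtt_ms=1.0),
    Replica("b", local_read_ms=2.0, network_rtt_ms=1.0),
    Replica("c", local_read_ms=40.0, network_rtt_ms=1.0),
]

for cl in ("ONE", "QUORUM", "ALL"):
    print(cl, read_request_latency_ms(replicas, cl))
```

In this model a single slow replica does not affect the coordinator-side number at ONE or QUORUM, but dominates it at ALL, even though the local read times on the other two replicas are unchanged.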



Great, that's exactly the level of detail I'm looking for.



    Is there anywhere I can find concrete definitions of what the stats
    in OpsCenter, and in raw Cassandra via JMX, mean? The docs I've
    found seem quite ambiguous.


This has pretty good writeups of each:
http://www.datastax.com/docs/opscenter/online_help/performance/index#opscenter-performance-metrics

Your description above was much better :-) I'm more interested in docs for the raw metrics provided in JMX.



    I still think that the data resolution that OpsCenter gives makes it
    more suitable for trending/alerting than for chasing down tricky
    performance issues. This sort of investigation work is what I do for
    a living; I typically use intervals of 10 seconds or lower, and
    don't average my data. Although, storing your data inside the
    database you're measuring does restrict your options a little :-)


True, there's a limit to what you can detect with 60 second resolution.
We've considered being able to report metrics at a finer resolution
without durably storing them anywhere, which would be useful for when
you're actively watching the cluster.

That would be a great feature, but it's quite difficult to capture high-resolution data without disturbing the system you're trying to measure.

Perhaps worth taking the data-capture points off-list?

James M
