>
> Sorry for the slow reply, it's been crunch time on the 1.1 freeze...
>

Not a problem--thanks for the response!

> What's a good starting point to get a feel for what you've added?  Is
> it PBSTracker?
>

PBSTracker is indeed a good place to start. The class stores and processes
the latencies we care about for PBS. nodetool simply calls into the
get*latencies() methods, while the ResponseHandlers call the startOperation
and log{Read/Write}Response methods. There's nothing too magical.
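
If it helps, the rough shape of the call flow is something like this
(heavily simplified; the getter and argument names below are illustrative
rather than the exact ones in the patch):

import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class PBSTracker
{
    private final Map<String, Long> startTimes = new ConcurrentHashMap<>();
    private final List<Long> writeLatencies = new ArrayList<>();

    // ResponseHandlers: the coordinator is about to dispatch an operation
    public void startOperation(String operationId)
    {
        startTimes.put(operationId, System.nanoTime());
    }

    // ResponseHandlers: a replica's write acknowledgement just arrived
    public synchronized void logWriteResponse(String operationId)
    {
        Long start = startTimes.get(operationId);
        if (start != null)
            writeLatencies.add(System.nanoTime() - start);
    }

    // nodetool: pull the recorded latencies back out
    public synchronized List<Long> getWriteLatencies()
    {
        return new ArrayList<>(writeLatencies);
    }
}

The real class also records reads and splits each response using the
timestamp described below, but that's the gist of who calls what.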

The PBS analysis code is in pbs/analyze_pbs.py and pbs/pbs_utils.py, which
we kept separate for patch readability but could easily rewrite in Java as
part of nodetool or similar.

> Is this different conceptually from something like
> https://issues.apache.org/jira/browse/CASSANDRA-1123, other than that
> obviously you're specifically concerned with PBS-related metrics?
>

It doesn't appear that the Cassandra-specific tweaks we've made are
conceptually different from the patch you link to. Our patch performs
coarser-grained measurements than the CASSANDRA-1123 patch, splitting
each per-replica operation time into (time spent sending the message +
processing it at the replica) and (time spent waiting for a response).

An important difference between the two patches is that we determine the
latter latency at the coordinator by having the replica store the
acknowledgement creation time in the acknowledgement itself; it looks like
the patch you linked logs this creation time locally, requiring some
distributed log parsing to reconstruct the latencies. This reconstruction
is definitely doable; the trade-off is between the space required in each
message for the timestamp and the complexity of the log reconstruction.
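
To make the arithmetic concrete, the coordinator-side split looks roughly
like this (variable names are illustrative, not the ones in the patch):

// Timestamps the coordinator has for one replica's write (all in nanos):
//   sendNanos       - when the coordinator dispatched the mutation
//   ackCreatedNanos - creation time the replica embedded in its acknowledgement
//   receivedNanos   - when the coordinator received that acknowledgement
long sendAndProcess = ackCreatedNanos - sendNanos;      // send + processing at the replica
long responseWait   = receivedNanos - ackCreatedNanos;  // waiting for the response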

Thanks!
Peter
