Sorry for the slow reply, it's been crunch time on the 1.1 freeze... What's a good starting point to get a feel for what you've added? Is it PBSTracker?
Is this different conceptually from something like
https://issues.apache.org/jira/browse/CASSANDRA-1123, other than that
obviously you're specifically concerned with PBS-related metrics?

On Thu, Jan 19, 2012 at 11:59 AM, Peter Bailis <pbai...@cs.berkeley.edu> wrote:
> We recently completed research at UC Berkeley that's highly relevant to
> Cassandra and are interested in feedback from the Cassandra developer
> community. In brief, eventually consistent replication (which is often
> faster than strongly consistent replication) provides no *guarantees* about
> the recency of data returned. However, we can accurately provide
> *expectations* of data recency. Our work, which we call Probabilistically
> Bounded Staleness (PBS), helps make these predictions. Using PBS, we can
> optimize the trade-off between latency and consistency provided by partial
> quorums (R+W <= N) by predicting both with high accuracy.
>
> Currently, in Cassandra, there's no good way to predict the performance
> benefits of using partial quorums or the consistency they provide. However,
> as you're probably well-aware, Cassandra uses partial quorums (N=3, R=W=1)
> by *default*, so this work is particularly relevant to many deployments. By
> measuring the latency of messaging and using modeling techniques we've
> developed, Cassandra can do better by describing the probability of
> consistency according to both time and versions (see an interactive demo in
> your browser at http://cs.berkeley.edu/~pbailis/projects/pbs/#demo and a
> good write-up by Datastax's Paul Cannon on their blog last week:
> http://www.datastax.com/dev/blog/your-ideal-performance-consistency-tradeoff).
> Moreover, these techniques are broadly applicable: for example, in our
> Technical Report (http://cs.berkeley.edu/Pubs/TechRpts/2012/EECS-2012-4.pdf),
> we analyze Cassandra as well as production deployments of Voldemort and
> Riak at LinkedIn and Yammer.
>
> We've developed a patch for Cassandra that performs this profiling and
> analysis and are potentially interested in working to integrate this as a
> feature in Cassandra (see code and documentation at:
> https://github.com/pbailis/cassandra-pbs).
>
> We welcome any feedback or questions you might have.
>
> Thanks!
> Peter Bailis
> UC Berkeley
>
> More info:
> You can read an overview of PBS on our project page:
> http://cs.berkeley.edu/~pbailis/projects/pbs/
> You can also read our technical report on PBS that has more technical
> detail: http://cs.berkeley.edu/Pubs/TechRpts/2012/EECS-2012-4.pdf
>
> Daniel Abadi recently blogged about the latency-consistency trade-off:
> http://dbmsmusings.blogspot.com/2011/12/replication-and-latency-consistency.html
> Henry Robinson (Cloudera) also blogged about PBS:
> http://the-paper-trail.org/blog/?p=334

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
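
For readers following the thread, the partial-quorum condition quoted above (R+W <= N) can be illustrated with a small sketch. This is only a simplified combinatorial baseline, not the latency-based model the PBS patch implements: it assumes the R read replicas are chosen uniformly at random and that the write has reached exactly W replicas and has not propagated further. The function names are hypothetical and are not taken from the cassandra-pbs code.

```python
from math import comb


def is_partial_quorum(n: int, r: int, w: int) -> bool:
    """True when read and write quorums are not guaranteed to overlap (R + W <= N)."""
    return r + w <= n


def miss_probability(n: int, r: int, w: int) -> float:
    """Probability that a read of R replicas misses all W replicas holding the
    latest write.

    Assumptions (illustrative only): replicas are picked uniformly at random,
    and the write is present on exactly W of the N replicas at read time.
    """
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("need 1 <= R <= N and 1 <= W <= N")
    # Ways to pick all R read replicas from the N - W stale ones,
    # divided by all ways to pick R replicas out of N.
    return comb(n - w, r) / comb(n, r)


if __name__ == "__main__":
    for n, r, w in [(3, 1, 1), (3, 1, 2), (3, 2, 2)]:
        print(n, r, w, is_partial_quorum(n, r, w), miss_probability(n, r, w))
```

For N=3, R=W=1 this baseline gives a miss probability of 2/3, and for N=3, R=W=2 (a strict quorum) it gives 0. Real deployments should see fewer stale reads than the baseline suggests, since writes keep propagating to the remaining replicas after the coordinator returns; capturing that effect over time is what the latency measurements described in the quoted message are for.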