Research on scalability bug finder for Cassandra

Tanakorn Leesatapornwongsa Fri, 08 Apr 2016 09:40:19 -0700

Dear Cassandra development team,

We are computer science researchers at the University of Chicago.  Our research 
is about the reliability of cloud-scale distributed systems. Samples of our 
work can be found here: http://ucare.cs.uchicago.edu 


We are reaching out to you because we are interested in reproducing any 
unsolved scalability bugs in Cassandra.

We define scalability bugs as latent bugs that are scale-dependent.  They do not 
arise in small-scale deployments but do arise in large-scale production runs.  
For example, everything is fine in a 100-node deployment, but the bug appears 
in a 500-node deployment.

We have created a scale-check methodology (SLCK) that can unearth scalability 
bugs on a single machine.  With SLCK, we can run hundreds of nodes on a single 
machine and reproduce some old scalability bugs. For example, we have 
reproduced the following bugs on one machine:

- https://issues.apache.org/jira/browse/CASSANDRA-6127 (a customer observed 
  node flapping when bootstrapping 1000 nodes)

- https://issues.apache.org/jira/browse/CASSANDRA-3831 

We are submitting SLCK for publication soon, and we can send you a draft a 
month from now if you are interested.

To make a stronger publication submission, beyond reproducing old bugs, we 
thought it would be great if SLCK could reproduce new scalability bugs (if any) 
that you are still trying to resolve.

We hope you find our work interesting, and we would really appreciate it if you 
could point us to any new scalability bugs that we could help you reproduce.

Thank you very much for your attention!

Best,
Tanakorn L.
