Hi Haryadi,

Personally, I'd love to see your approach extended to test up to 10K nodes or so.

There are not many known instances of scaling past 1,000 nodes. As the need for scale grows, and as scale-out hardware becomes more commonplace (high density with lots of small servers, e.g. HP Moonshot, blade servers, etc.), 10K nodes is the next frontier. It would be great to demonstrate that your tool can find *new* bugs and limitations (which it certainly would at that scale), as opposed to just reproducing existing ones.

One other thought is to test with both non-vnodes and vnodes (and perhaps several different vnode counts per node) at extreme scales like that, to get a sense of how much overhead vnodes add to the current gossip implementation at scale.

Regarding existing bugs that you might usefully reproduce, I'll leave that to others.

Thanks.
-Tupshin

On Fri, Apr 8, 2016, at 09:57 PM, Haryadi Gunawi wrote:
> Hi Jonathan,
>
> Thanks for the reply!
>
> We don't need a patched version of Cassandra. Specifically, this is what
> we'd like your help with, if possible:
>
> Cassandra devs: "Here are recent JIRA entries that discuss scale-dependent
> bugs: CASSANDRA-X, -Y, -Z (where X, Y, Z are JIRA issue numbers)"
>
> Our side: We will study the bug discussions, download the affected
> Cassandra version (as mentioned in the JIRA), integrate that specific
> version with our framework, and reproduce the bug on one machine.
>
> Basically, we'd like to know whether there are still unresolved or
> newly resolved bugs (2015-2016) in the Cassandra JIRA that we could use to
> test our approach. (The bugs in our previous email are relatively old.)
>
> We're targeting a publication deadline one month from now. It would be
> lovely if we could get more sample bugs. After the deadline, we'd be happy
> to send you a draft of the paper.
>
> Please do let us know if you have any other questions.
> Thanks!
> -- Har
>
> On Fri, Apr 8, 2016 at 8:03 PM, Jonathan Ellis <jbel...@gmail.com> wrote:
>
> > Sounds very interesting! We'd love to hear more about your approach.
> > In particular, does it require a patched version of Cassandra?
> >
> > On Thu, Apr 7, 2016 at 6:18 PM, Tanakorn Leesatapornwongsa <tanak...@cs.uchicago.edu> wrote:
> >
> >> Dear Cassandra development team,
> >>
> >> We are computer science researchers at the University of Chicago. Our
> >> research is about the reliability of cloud-scale distributed systems.
> >> Samples of our work can be found here: http://ucare.cs.uchicago.edu/
> >>
> >> We are reaching out to you because we are interested in reproducing any
> >> unsolved scalability bugs in Cassandra.
> >>
> >> We define scalability bugs as latent bugs that are scale-dependent. They
> >> don't arise in small-scale deployments but do arise in large-scale
> >> production runs. For example, everything is fine in a 100-node deployment,
> >> but in a 500-node deployment the bug appears.
> >>
> >> We have created a scale-check methodology (SLCK) that can unearth
> >> scalability bugs on a single machine. With SLCK, we can run hundreds of
> >> nodes on a single machine and reproduce some old scalability bugs. For
> >> example, we have reproduced the following bugs on one machine:
> >>
> >> - https://issues.apache.org/jira/browse/CASSANDRA-6127 (a customer
> >>   observed node flapping when bootstrapping 1,000 nodes)
> >>
> >> - https://issues.apache.org/jira/browse/CASSANDRA-3831
> >>
> >> We are submitting SLCK for publication soon, and we can send you a draft
> >> a month from now if you are interested.
> >>
> >> To make our submission stronger, beyond reproducing old bugs, we thought
> >> it would be great if SLCK could reproduce new scalability bugs (if any)
> >> that you are still trying to resolve.
> >>
> >> We hope you find our work interesting, and we would really appreciate it
> >> if you could point us to any new scalability bugs that we can hopefully
> >> help you reproduce.
> >>
> >> Thank you very much for your attention!
> >>
> >> Best,
> >> Tanakorn L.
> >
> > --
> > Jonathan Ellis
> > Project Chair, Apache Cassandra
> > co-founder, http://www.datastax.com
> > @spyced