Happy holidays all,

I imagine most people are about to disappear to celebrate the holidays, so I wanted to try to summarize the state of Cassandra dev for 2017, as I see it. Standard disclaimers apply (this is my personal opinion, not that of my employer, and not officially endorsed by the Apache Cassandra PMC or the ASF).
Some quick stats about Cassandra development efforts in 2017 (using imperfect git log | awk/sed counting, only looking at trunk, buyer beware, it's probably off by a few):

- The first commit of 2017 was: Ben Manes, transforming the on-heap cache to Caffeine ( https://github.com/apache/cassandra/commit/c607d76413be81a0e125c5780e068d7ab7594612 )
- Alex Petrov removed the most code (~7500 lines, according to GitHub)
- Benjamin Lerer added the most code (~8000 lines, according to GitHub)
- We put to bed the tick/tock release cycle, but still cut 14 different releases across 5 different branches.
- We had a total of 136 different contributors, with 48 of those contributors contributing more than one patch during the year.
- We had a total of 47 different reviewers
- There were 661 non-merge commits to trunk
- There were 56 non-merge commits to docs/
- We end the year with roughly 173 pending changes for 4.0
- We resolved (either fixed or disqualified) 781 issues in JIRA
- I count something like 273 email threads to dev@, and 903 email threads to user@
- The project added Stefan Podkowinski, Joel Knighton, Ariel Weisberg, Alex Petrov, Blake Eggleston, and Philip Thompson as committers.
- The project added Josh McKenzie, Marcus Eriksson and Jon Haddad to the Apache Cassandra PMC

At NGCC (which Eric and Gary managed to organize with the help of Instaclustr sponsoring, an achievement in itself), we had people talk about:

- Two different talks (from Apple and FB/Instagram). I'm struggling to describe these in simple terms; they both sorta involve using hints and changing some of the consistency concepts to help deal with latency / durability / availability, especially in cross-DC workloads. Grouping these together isn't really fair, but no one-email summary is going to be fair to either of these talks. If you missed NGCC, I guess you get to wait for the JIRAs / patches.
- A new storage engine (FB/Instagram) using RocksDB
- Some notes on using CDC at scale (and some proposed changes to make it easier) from Uber ( https://github.com/ngcc/ngcc2017/blob/master/CassandraDataIngestion.pdf )
- Michael Shuler (DataStax / Cassandra PMC / release master / etc) spent some time talking about testing and CI.

Some other big'ish development efforts worth mentioning (from personal memory, perhaps the worst possible way to create such a list):

- We spent a fair amount of time talking about testing. Francois @ Instagram led the way in codifying a new set of principles around testing and quality ( https://lists.apache.org/thread.html/0854341ae3ab41ceed2ae8a03f2486cf2325e4fca6fd800bf4297dd4@%3Cdev.cassandra.apache.org%3E / https://issues.apache.org/jira/browse/CASSANDRA-13497 ).
- We've also spent some time making tests work in CircleCI, which should make life much easier for occasional contributors - no need to figure out how to run tests in ASF Jenkins.
- The internode messaging rewrite to use async/netty is probably the single largest change that comes to mind. It went in earlier this year, and should make it easier to have HUGE clusters. All of you running thousand instance clusters will probably benefit from this patch (I know you're out there, I've talked to you in IRC) - will be in 4.0 ( https://issues.apache.org/jira/browse/CASSANDRA-8457 )
- We have a company working on making Cassandra happy with proprietary flash storage and PPC64LE (IBM's recent patches, https://developer.ibm.com/linuxonpower/2017/03/31/using-capi-improve-performance-apache-cassandra-work-progress-update/ )
- We have a new commitlog mode added for the first time in quite some time - the GroupCommitLog will be in 4.0 ( https://issues.apache.org/jira/browse/CASSANDRA-13530 )
- Michael Kjellman spent some time porting dtests from nose to pytest, and from Python 2.7 to Python 3, removing dependencies on dead projects like pycassa and the old thrift-cql library.
  Still needs to be reviewed ( https://issues.apache.org/jira/browse/CASSANDRA-14134 )
- Robert Stupp spent some time porting to Java 9 - again, still needs to be reviewed ( https://issues.apache.org/jira/browse/CASSANDRA-9608 )

Overall, the state of the project appears to be strong. We're seeing active contributions driven primarily by users (like you), the 8099/3.0 engine is looking pretty good here in December, and the code base is stabilizing towards a product all of us should be happy to run in production. Despite some irrationally skeptical sky-is-falling threads near the end of 2016, I feel confident in saying it was a pretty good year for Cassandra, and as the project continues to move forward, I'm looking forward to seeing 4.0 launch in 2018 (hopefully with a real user conference!)

- Jeff