Happy holidays all,

I imagine most people are about to disappear to celebrate holidays, so I
wanted to try to summarize the state of Cassandra dev for 2017, as I see
it. Standard disclaimers apply (this is my personal opinion, not that of my
employer, not officially endorsed by the Apache Cassandra PMC, or the ASF).

Some quick stats about Cassandra development efforts in 2017 (using
imperfect git log | awk/sed counting, only looking at trunk, buyer beware,
it's probably off by a few):

- The first commit of 2017 was: Ben Manes, transforming the on-heap cache to
Caffeine (
https://github.com/apache/cassandra/commit/c607d76413be81a0e125c5780e068d7ab7594612
)
- Alex Petrov removed the most code (~7500 lines, according to github)
- Benjamin Lerer added the most code (~8000 lines, according to github)
- We put to bed the tick/tock release cycle, but still cut 14 different
releases across 5 different branches.
- We had a total of 136 different contributors, with 48 of those contributors
contributing more than one patch during the year.
- We had a total of 47 different reviewers
- There were 661 non-merge commits to trunk
- There were 56 non-merge commits to docs/
- We end the year with roughly 173 pending changes for 4.0
- We resolved (either fixed or disqualified) 781 issues in JIRA
- I count something like 273 email threads to dev@, and 903 email threads to
user@
- The project added Stefan Podkowinski, Joel Knighton, Ariel Weisberg, Alex
Petrov, Blake Eggleston, and Philip Thompson as committers.
- The project added Josh McKenzie, Marcus Eriksson and Jon Haddad to the
Apache Cassandra PMC
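
For anyone who wants to try a similar rough count, here is a minimal sketch
in Python rather than awk/sed. It is not the script behind the numbers
above: the function names are made up for illustration, and it only looks
at the git author field. Cassandra commits often credit the contributor in
the commit message rather than in the author field, so expect the result to
differ from the figures in this email.

```python
import subprocess
from collections import Counter


def git_author_log(repo, since, until, branch="trunk"):
    """Return one author name per line for non-merge commits in a date range.

    Hypothetical helper: repo is a path to a local clone, and branch is
    assumed to exist in that clone.
    """
    result = subprocess.run(
        ["git", "-C", repo, "log", "--no-merges",
         f"--since={since}", f"--until={until}",
         "--format=%an", branch],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def count_authors(log_output):
    """Count commits per author, ignoring blank lines."""
    return Counter(
        line.strip() for line in log_output.splitlines() if line.strip()
    )


def summarize(counts):
    """Return (total commits, distinct authors, authors with >1 commit)."""
    total = sum(counts.values())
    authors = len(counts)
    repeat = sum(1 for n in counts.values() if n > 1)
    return total, authors, repeat
```

Usage would look something like
summarize(count_authors(git_author_log("/path/to/cassandra", "2017-01-01",
"2018-01-01"))), where the path is a placeholder for a local clone.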

At NGCC (which Eric and Gary managed to organize with the help of
Instaclustr sponsoring, an achievement in itself), we had people talk about:
- Two different talks (from Apple and FB/Instagram). I'm struggling to
describe these in simple terms, they both sorta involve using hints and
changing some of the consistency concepts to help deal with latency /
durability / availability, especially in cross-DC workloads. Grouping these
together isn't really fair, but no one-email summary is going to be fair to
either of these talks. If you missed NGCC, I guess you get to wait for the
JIRAs / patches.
- A new storage engine (FB/Instagram) using RocksDB
- Some notes on using CDC at scale (and some proposed changes to make it
easier) from Uber (
https://github.com/ngcc/ngcc2017/blob/master/CassandraDataIngestion.pdf )
- Michael Shuler (DataStax / Cassandra PMC / release master / etc) spent
some time talking about testing and CI.

Some other big'ish development efforts worth mentioning (from personal
memory, perhaps the worst possible way to create such a list):
- We spent a fair amount of time talking about testing. Francois @
Instagram led the way in codifying a new set of principles around testing
and quality (
https://lists.apache.org/thread.html/0854341ae3ab41ceed2ae8a03f2486cf2325e4fca6fd800bf4297dd4@%3Cdev.cassandra.apache.org%3E
/ https://issues.apache.org/jira/browse/CASSANDRA-13497 ).
- We've also spent some time making tests work in CircleCI, which should
make life much easier for occasional contributors - no need to figure out
how to run tests in ASF Jenkins.
- The internode messaging rewrite to use async/netty is probably the single
largest change that comes to mind. It went in earlier this year, and should
make it easier to have HUGE clusters. All of you running thousand-instance
clusters will probably benefit from this patch (I know you're out there,
I've talked to you in IRC) - will be in 4.0 (
https://issues.apache.org/jira/browse/CASSANDRA-8457 )
- We have a company working on making Cassandra happy with proprietary
flash storage and PPC64LE (IBM's recent patches,
https://developer.ibm.com/linuxonpower/2017/03/31/using-capi-improve-performance-apache-cassandra-work-progress-update/
)
- We have a new commitlog mode added for the first time in quite some time:
the GroupCommitLog will be in 4.0 (
https://issues.apache.org/jira/browse/CASSANDRA-13530 )
- Michael Kjellman spent some time porting dtests from nose to pytest, and
from python 2.7 to python 3, removing dependencies on dead projects like
pycassa and the old thrift-cql library. Still needs to be reviewed (
https://issues.apache.org/jira/browse/CASSANDRA-14134 )
- Robert Stupp spent some time porting to Java 9 - again, still needs to be
reviewed ( https://issues.apache.org/jira/browse/CASSANDRA-9608 )

Overall, the state of the project appears to be strong. We're seeing active
contributions driven primarily by users (like you), the 8099/3.0 engine is
looking pretty good here in December, and the code base is stabilizing
towards a product all of us should be happy to run in production. Despite
some irrationally skeptical sky-is-falling threads near the end of 2016, I
feel confident in saying it was a pretty good year for Cassandra, and as
the project continues to move forward, I'm looking forward to seeing 4.0
launch in 2018 (hopefully with a real user conference!)

- Jeff
