I've been following the discussion about Kubernetes operators with a great deal of interest. At New Relic, we're about to move our Cassandra Clusters from bare-metal hosts in our datacenters to Kubernetes clusters in AWS, so we've been looking closely at the current operators.
Our goals:

* Don't write our own operator.
* Choose the community standard, if possible. If not possible, choose an operator with active development, usage, and community.
* Choose an operator that can work with our existing way of managing clusters. Most significantly, at New Relic we do not use virtual nodes in our Cassandra clusters. Instead, we continue to assign an initial_token to each individual node. While we certainly don't expect an operator to support this use case by default, we do hope that an operator will make it possible.
* Don't run a forked version of the operator.

Both [cass-operator][1] and [CassKop][2] worked very well and we were really impressed with both of them.

Heading into the evaluation, we expected to choose Datastax's cass-operator. Given Datastax's position in the Cassandra community, and given that they wrote the most widely-used Cassandra clients, they seemed like they would be in the best position to provide the community standard.

We ended up choosing CassKop. However, I don't want this email to be viewed as lobbying for choosing one operator over another. I'm excited about the possibility that's currently being discussed of merging development efforts and incorporating CassKop features into cass-operator.

I do want to highlight some of the advantages that CassKop currently offers for our use case, in the hope that we can preserve those advantages going forward. (Or even improve them!)

1. CassKop offers a huge amount of flexibility for modifying Cassandra configuration files.

If needed, you can swap in your own [bootstrap][3] docker image to manipulate the Cassandra configuration files, but oftentimes you don't even need to do that. Since CassKop offers the ability to define a pre_run.sh script that will run in the bootstrap container, you can get pretty far with some shell scripting. In our pre_run.sh, we do per-pod configuration to assign initial token values.
We didn't see an easy way to perform per-pod configuration with cass-operator. There is no equivalent pre_run.sh hook in [cass-config-builder][4], which is the init container in cass-operator that's comparable to CassKop's bootstrap container.

2. CassKop is less opinionated about which Cassandra version you want to run.

My understanding is that cass-config-builder adds a layer of abstraction so that it will produce configuration that is tailored to certain versions of open-source and DSE Cassandra. That works great, unless you want to run a version of Cassandra that isn't supported. We were surprised to see that cass-operator only works with a [handful of Cassandra versions][5]. There didn't seem to be an easy way to use cass-operator with an earlier version of Cassandra than those that are officially supported.

3. CassKop requires adoption of fewer, less-complex components.

CassKop's bootstrap container was easier for us to wrap our heads around than cass-config-builder. In addition, using cass-operator also required using the [management-api][6] sidecar. This means that adopting a new operator also required adopting a new sidecar. Perhaps this is overstated, but it felt like choosing cass-operator required embracing a whole ecosystem, rather than simply an operator.

Now, if the management-api sidecar were widely used throughout the community, then I wouldn't feel the same reluctance to use it. Knowing that it was going to be the community standard moving forward would be a big help. But until it achieves that role as the standard, choosing cass-operator means choosing both an operator and a sidecar, when there's no guarantee that either of them will become the standard. It's a bigger commitment.

I realize that the concerns we have when choosing an operator may not be shared by all. I raise these points with the hope that we can keep them in mind.
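
To make the per-pod token assignment from point 1 more concrete, here is a minimal sketch of the calculation such a hook might perform. It is not the actual pre_run.sh; it assumes Murmur3Partitioner (token range -2^63 to 2^63-1), a single ring where every node owns exactly one evenly spaced token, and pod names that end in a StatefulSet-style ordinal such as "-2". The function names and the example pod names are hypothetical.

```python
import re

MIN_TOKEN = -(2**63)  # lowest token in the Murmur3Partitioner range
RING_SIZE = 2**64     # total number of tokens in the ring


def pod_ordinal(pod_name: str) -> int:
    """Extract the trailing ordinal from a pod name, e.g. 'cassandra-2' -> 2.

    Assumes StatefulSet-style naming; other naming schemes need other parsing.
    """
    match = re.search(r"-(\d+)$", pod_name)
    if match is None:
        raise ValueError(f"no ordinal suffix in pod name: {pod_name!r}")
    return int(match.group(1))


def initial_token(ordinal: int, node_count: int) -> int:
    """Return the evenly spaced token for node `ordinal` out of `node_count`."""
    if not 0 <= ordinal < node_count:
        raise ValueError("ordinal must be in [0, node_count)")
    return MIN_TOKEN + ordinal * (RING_SIZE // node_count)


def config_line(pod_name: str, node_count: int) -> str:
    """Render the cassandra.yaml line that a hook could write for this pod."""
    return f"initial_token: {initial_token(pod_ordinal(pod_name), node_count)}"


if __name__ == "__main__":
    # Hypothetical four-node ring.
    for name in ("cassandra-0", "cassandra-1", "cassandra-2", "cassandra-3"):
        print(name, config_line(name, 4))
```

In a real deployment with several racks or datacenters, ordinals restart at 0 in each StatefulSet, so the mapping from pod name to ring position would need to account for that. The point is only that the value differs per pod, which is why a per-pod hook matters.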
It's possible to build flexibility into a Cassandra operator, so that it can be used in ways that deviate from the default, or even used in ways that the original authors didn't anticipate.

I do want to thank both Orange and Datastax for all of the work they've put into their operators, as well as everyone here discussing the best way to move forward. We are super appreciative and I'm optimistic that some of us at New Relic will be in a position soon to be able to contribute to these efforts.

Thanks,
Tom

[1]: https://github.com/datastax/cass-operator
[2]: https://github.com/Orange-OpenSource/casskop
[3]: https://github.com/Orange-OpenSource/casskop/tree/master/docker/bootstrap
[4]: https://github.com/datastax/cass-config-builder
[5]: https://github.com/datastax/cass-operator/blob/master/operator/deploy/crds/cassandra.datastax.com_cassandradatacenters_crd.yaml#L6029-L6040
[6]: https://github.com/datastax/management-api-for-apache-cassandra

--
Tom Offermann
Lead Software Engineer
http://newrelic.com