Kubernetes Operator: Can We Preserve CassKop's Flexibility?

Tom Offermann Wed, 07 Oct 2020 11:24:15 -0700

I've been following the discussion about Kubernetes operators with a great
deal of interest. At New Relic, we're about to move our Cassandra clusters
from bare-metal hosts in our datacenters to Kubernetes clusters in AWS, so
we've been looking closely at the current operators.

Our goals:

* Don't write our own operator.

* Choose the community standard, if possible. If not possible, choose an
operator with active development, usage, and community.

* Choose an operator that can work with our existing way of managing
clusters. Most significantly, at New Relic we do not use virtual nodes in
our Cassandra clusters. Instead, we continue to assign initial_tokens to
individual nodes. While we certainly don't expect an operator to support
this use case by default, we do hope that an operator will make it
possible.

* Don't run a forked version of the operator.

Both [cass-operator][1] and [CassKop][2] worked very well and we were
really impressed with both of them. Heading into the evaluation, we
expected to choose Datastax's cass-operator. Given Datastax's position in
the Cassandra community, and given that they wrote the most widely-used
Cassandra clients, they seemed like they would be in the best position to
provide the community standard.

We ended up choosing CassKop.

However, I don't want this email to be viewed as lobbying for choosing
one operator over another. I'm excited about the possibility that's
currently being discussed of merging development efforts and incorporating
CassKop features into cass-operator.

I do want to highlight some of the advantages that CassKop currently offers
for our use case, in the hope that we can preserve those advantages going
forward. (Or, even improve them!)

1. CassKop offers a huge amount of flexibility for modifying Cassandra
configuration files. If needed, you can swap in your own [bootstrap][3]
docker image to manipulate the Cassandra configuration files, but
oftentimes you don't even need to do that. Since CassKop offers the ability
to define a pre_run.sh script that will run in the bootstrap container, you
can get pretty far with some shell scripting. In our pre_run.sh, we do
per-pod configuration to assign initial token values.

We didn't see an easy way to perform per-pod configuration with
cass-operator. There is no equivalent pre_run.sh hook in
[cass-config-builder][4], which is the init container in cass-operator
that's comparable to CassKop's bootstrap container.
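
To make the per-pod token assignment in point 1 concrete, here is a rough
sketch of the calculation a hook like pre_run.sh could delegate to. It is
not our actual script. It assumes Murmur3Partitioner, a single token per
node, evenly spaced tokens, and a pod hostname that ends in a
StatefulSet-style ordinal; the function names and the example hostname are
hypothetical.

```python
import re

# Murmur3Partitioner tokens span [-2**63, 2**63 - 1] (assumed partitioner).
MIN_TOKEN = -(2**63)
RING_SIZE = 2**64


def evenly_spaced_tokens(node_count):
    """Return one token per node, evenly spaced around the ring."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    return [MIN_TOKEN + (i * RING_SIZE) // node_count for i in range(node_count)]


def pod_ordinal(hostname):
    """Extract the trailing ordinal from a pod name such as 'foo-3'.

    Assumption: the pod name ends in '-<ordinal>'.
    """
    match = re.search(r"-(\d+)$", hostname)
    if match is None:
        raise ValueError(f"no trailing ordinal in hostname: {hostname!r}")
    return int(match.group(1))


def initial_token_for(hostname, node_count):
    """Pick the token for this pod based on its ordinal."""
    ordinal = pod_ordinal(hostname)
    if ordinal >= node_count:
        raise ValueError(f"ordinal {ordinal} out of range for {node_count} nodes")
    return evenly_spaced_tokens(node_count)[ordinal]
```

A hook script could then write the returned value into the initial_token
setting of cassandra.yaml before Cassandra starts. A real deployment with
multiple racks or datacenters would need a mapping that accounts for them,
which this sketch does not attempt.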

2. CassKop is less opinionated about which Cassandra version you want to
run. My understanding is that cass-config-builder adds a layer of
abstraction so that it will produce configuration that is tailored to
certain versions of open-source and DSE Cassandra. That works great,
unless you want to run a version of Cassandra that isn't supported. We were
surprised to see that cass-operator only works with a [handful of Cassandra
versions][5].

There didn't seem to be an easy way to use cass-operator with an earlier
version of Cassandra than those that are officially supported.

3. CassKop requires adoption of fewer, less-complex components. CassKop's
bootstrap container was easier for us to wrap our heads around than
cass-config-builder. In addition, using cass-operator required using the
[management-api][6] sidecar, so adopting a new operator also meant adopting
a new sidecar. Perhaps
this is overstated, but it felt like choosing cass-operator required
embracing a whole ecosystem, rather than simply an operator.

Now, if the management-api sidecar were widely used throughout the
community, then I wouldn't feel the same reluctance to use it. Knowing that
it was going to be the community standard moving forward would be a big
help. But until it achieves that role as the standard, choosing
cass-operator means choosing both an operator and a sidecar, when there's
no guarantee that either of them will become the standard. It's a bigger
commitment.

I realize that the concerns we have when choosing an operator may not be
shared by all. I raise these points with the hope that we can keep them in
mind. It's possible to build flexibility into a Cassandra operator, so that
it can be used in ways that deviate from the default, or even used in ways
that the original authors didn't anticipate.

I do want to thank both Orange and Datastax for all of the work they've put
into their operators, as well as everyone here discussing the best way to
move forward. We are super appreciative and I'm optimistic that some of us
at New Relic will be in a position soon to be able to contribute to these
efforts.

Thanks,
Tom

[1]: https://github.com/datastax/cass-operator
[2]: https://github.com/Orange-OpenSource/casskop
[3]: https://github.com/Orange-OpenSource/casskop/tree/master/docker/bootstrap
[4]: https://github.com/datastax/cass-config-builder
[5]: https://github.com/datastax/cass-operator/blob/master/operator/deploy/crds/cassandra.datastax.com_cassandradatacenters_crd.yaml#L6029-L6040
[6]: https://github.com/datastax/management-api-for-apache-cassandra

--
Tom Offermann
Lead Software Engineer
http://newrelic.com
