A cheeky plug for the project I’m currently working on: k8ssandra-operator <https://github.com/k8ssandra/k8ssandra-operator> aims to automate a lot of everyday Cassandra maintenance tasks by orchestrating Cassandra on Kubernetes. We offer integration with the TLP projects Reaper (repair), Medusa (backup + restore), as well as MCAC (monitoring via Prometheus and Grafana). We also offer support for the essentials - upgrades, scaling, automated health checking and restarts of failed instances.
We hit 1.0.0 12 days ago, so the project is very dynamic and is under active development. We’re still figuring a lot of things out, but - if we do our job right - we think that k8ssandra-operator will, over time, become the preferred way to manage Cassandra in modern environments.

https://github.com/k8ssandra/k8ssandra-operator

> On 2 Mar 2022, at 6:22 am, Jeff Jirsa <jji...@gmail.com> wrote:
>
> Most teams are either using things like ansible/python scripts, or have bespoke infrastructure.
>
> Some of what you're describing is included in the intent of the `cassandra-sidecar` project:
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95652224
>
> ====
> Goals
> We target two main goals for the first version of the sidecar, both work towards having an easy-to-use control plane for managing Cassandra’s data plane.
> - Provide an extensible and pluggable architecture for developers and operators to easily operate Cassandra as well as easing integration with their existing infrastructure. One major sub-goal of this goal is:
>   - The proposal should pass the “curl test”: meaning that it is accessible to standard tooling and out of the box libraries available for practically every environment or programming language (including python, ruby, bash). This means that as a public interface we cannot choose Java specific (jmx) or Cassandra specific (CQL) APIs.
> - Provide basic but essential and useful functionality. Some proposed scope in this document:
>   - Run health checks on replicas and the cluster
>   - Run diagnostic commands on individual nodes as well as all nodes in the cluster (bulk commands)
>   - Export metrics via pluggable agents rather than polling JMX
>   - Schedule periodic management activities such as running clean ups
>   - (as a stretch goal) safely restart all nodes in the cluster.
> ====
>
> The health checker seems to be implemented; I'm not sure if the coordinated cleanup or similar exist yet (or if there are JIRAs around for them). In theory, this type of work - outside the database, in automation - should be really easy for newcomers who are solving their own problems.
>
> Other things that sorta fall into this space, but may be not quite what you're looking for:
>
> - https://github.com/Netflix/Priam (if you run very much like netflix runs, especially on AWS)
> - https://github.com/thelastpickle/cassandra-reaper for the repair automation
> - https://github.com/JeremyGrosser/tablesnap (old-ish, for backups)
>
> On Tue, Mar 1, 2022 at 11:05 AM Joe Obernberger <joseph.obernber...@gmail.com> wrote:
> Thanks all - I'll take a look at Ansible. Back in my Hadoop days, we would use Cloudera manager (course that now costs $). Sounds like we need a new open source project! :)
>
> -Joe
>
> On 3/1/2022 7:46 AM, Bowen Song wrote:
> > We use Ansible to manage a fairly large (200+ nodes) cluster. We created our own Ansible playbooks for common tasks, such as rolling restart. We also use Cassandra Reaper for scheduling and running repairs on the same cluster. We occasionally also use pssh (parallel SSH) for inspecting the logs or configurations on selected nodes. Running pssh on a very large number of servers is obviously not practical due to the available screen space constraint.
> >
> > On 28/02/2022 21:59, Joe Obernberger wrote:
> >> Hi all - curious what tools are folks using to manage large Cassandra clusters? For example, to do tasks such as nodetool cleanup after a node or nodes are added to the cluster, or simply rolling start/stops after an update to the config or a new version?
> >> We've used puppet before; is that what other folks are using?
> >> Thanks for any suggestions.
> >>
> >> -Joe
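
For readers who want a feel for the rolling tasks asked about at the bottom of the thread (for example `nodetool cleanup` on one node at a time after nodes are added), here is a minimal Python sketch. It is only an illustration, not how k8ssandra-operator, Ansible or Reaper do it. The helper names `ssh_run` and `rolling_run` are made up for this example, and it assumes passwordless ssh to each node with `nodetool` on the remote PATH.

```python
import subprocess
from typing import Callable, List, Sequence


def ssh_run(host: str, command: Sequence[str]) -> int:
    """Hypothetical helper: run `command` on `host` over ssh, return its exit code."""
    return subprocess.run(["ssh", host, *command]).returncode


def rolling_run(
    hosts: Sequence[str],
    command: Sequence[str],
    run: Callable[[str, Sequence[str]], int] = ssh_run,
) -> List[str]:
    """Run `command` on one host at a time, in order.

    Stops at the first host where the command exits non-zero, so a
    problem on one node is not repeated across the whole cluster.
    Returns the hosts on which the command succeeded.
    """
    completed: List[str] = []
    for host in hosts:
        if run(host, command) != 0:
            break  # leave the remaining nodes untouched for a human to look at
        completed.append(host)
    return completed
```

With placeholder host names, a call might look like `rolling_run(["cass-01", "cass-02", "cass-03"], ["nodetool", "cleanup"])`. The `run` parameter exists so the ssh call can be swapped for something else (or a fake in tests); a real rolling restart would also want a health check between nodes, which this sketch leaves out.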