A cheeky plug for the project I’m currently working on: k8ssandra-operator 
<https://github.com/k8ssandra/k8ssandra-operator> aims to automate a lot of 
everyday Cassandra maintenance tasks by orchestrating Cassandra on Kubernetes. 
We offer integration with the TLP projects Reaper (repair) and Medusa (backup + 
restore), as well as MCAC (monitoring via Prometheus and Grafana). We also 
offer support for the essentials - upgrades, scaling, automated health checking 
and restarts of failed instances.

We hit 1.0.0 twelve days ago, so the project is very dynamic and under active 
development. We’re still figuring a lot of things out, but - if we do our job 
right - we think that, over time, k8ssandra-operator will become the preferred 
way to manage Cassandra in modern environments.

https://github.com/k8ssandra/k8ssandra-operator 


> On 2 Mar 2022, at 6:22 am, Jeff Jirsa <jji...@gmail.com> wrote:
> 
> Most teams are either using things like ansible/python scripts, or have 
> bespoke infrastructure. 
> 
> Some of what you're describing is included in the intent of the 
> `cassandra-sidecar` project: 
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95652224 
> 
> ====
> Goals
> We target two main goals for the first version of the sidecar, both work 
> towards having an easy to use control plane for managing Cassandra’s data 
> plane.
> 
> - Provide an extensible and pluggable architecture for developers and 
>   operators to easily operate Cassandra as well as easing integration with 
>   their existing infrastructure. One major sub-goal of this goal is:
>   - The proposal should pass the “curl test”: meaning that it is accessible 
>     to standard tooling and out of the box libraries available for 
>     practically every environment or programming language (including python, 
>     ruby, bash). This means that as a public interface we cannot choose Java 
>     specific (jmx) or Cassandra specific (CQL) APIs.
> - Provide basic but essential and useful functionality. Some proposed scope 
>   in this document:
>   - Run health checks on replicas and the cluster
>   - Run diagnostic commands on individual nodes as well as all nodes in the 
>     cluster (bulk commands)
>   - Export metrics via pluggable agents rather than polling JMX
>   - Schedule periodic management activities such as running clean ups
>   - (as a stretch goal) safely restart all nodes in the cluster.
> ====
> 
> The health checker seems to be implemented; I'm not sure whether the 
> coordinated cleanup or similar exists yet (or if there are JIRAs for it). In 
> theory, this type of work - outside the database, in automation - should be 
> really easy for newcomers who are solving their own problems. 
> 
> Other things that sorta fall into this space, but may not be quite what 
> you're looking for:
> 
> - https://github.com/Netflix/Priam (if you run very much like Netflix runs, 
>   especially on AWS)
> - https://github.com/thelastpickle/cassandra-reaper for the repair automation
> - https://github.com/JeremyGrosser/tablesnap (old-ish, for backups)
> 
> 
> 
> On Tue, Mar 1, 2022 at 11:05 AM Joe Obernberger 
> <joseph.obernber...@gmail.com> wrote:
> Thanks all - I'll take a look at Ansible.  Back in my Hadoop days, we 
> would use Cloudera Manager (of course, that now costs $). Sounds like we 
> need a new open source project!  :)
> 
> -Joe
> 
> On 3/1/2022 7:46 AM, Bowen Song wrote:
> > We use Ansible to manage a fairly large (200+ nodes) cluster. We 
> > created our own Ansible playbooks for common tasks, such as rolling 
> > restart. We also use Cassandra Reaper for scheduling and running 
> > repairs on the same cluster. We occasionally also use pssh (parallel 
> > SSH) for inspecting the logs or configurations on selected nodes. 
> > Running pssh on a very large number of servers is obviously not 
> > practical due to the limited screen space available.
> >
> > On 28/02/2022 21:59, Joe Obernberger wrote:
> >> Hi all - curious what tools folks are using to manage large Cassandra 
> >> clusters?  For example, to do tasks such as nodetool cleanup after a 
> >> node or nodes are added to the cluster, or simply rolling start/stops 
> >> after an update to the config or a new version?
> >> We've used puppet before; is that what other folks are using?
> >> Thanks for any suggestions.
> >>
> >> -Joe
> >>
> >
