What operator are you all using? We've just been using StatefulSets for
our clusters. I'm a big-time on-hardware fan, but an issue with
Cassandra is the notion of one JVM per roughly 1 to 2 TB of disk.
Most large servers are in the 256+ core / 100+ TB of disk range.
Managing that many instances of Cassandra on a single node is painful.
Kubernetes solves that. Want to scale up?
kubectl scale statefulset cassandra -n cassandra --replicas=48
or whatever. Doing a rolling restart is easy. On 'large' deployments
of over 500 TB of disk, I'm not sure how this can be easily managed
without Kubernetes. How is it done?
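For example, assuming a StatefulSet named cassandra in a cassandra
namespace, a rolling restart and a watch on its progress would look
something like:
kubectl rollout restart statefulset/cassandra -n cassandra
kubectl rollout status statefulset/cassandra -n cassandra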
As to persistent storage: yes, I miss the days of Hadoop and HDFS! But
here we are... most things seem to be going down the path of storage
as a network device, no longer local. I don't like it either, but
there are certainly management and ease-of-use considerations.
-Joe
On 6/12/2025 10:15 AM, Jon Haddad wrote:
I agree that managing Cassandra on Kubernetes can be challenging
without prior experience, as understanding all the nuances of
Kubernetes takes time.
However, there are ways to address the rescheduling issues, node
placement, and local disk concerns that were mentioned. You can pin
pods to specific hosts to avoid rescheduling on different nodes, and
you can use local disks or a combination of persistent disks with a
local NVMe as a cache. Host networking or (I think) Cilium can help
with the networking performance concerns. For most arguments against
using Kubernetes, there's usually a workaround or setting that can
address the issue.
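As a rough sketch of what that pinning and local storage can look like
in a StatefulSet (the node label, storage class, image tag, and sizes
here are illustrative assumptions, not a vetted production config):

  apiVersion: apps/v1
  kind: StatefulSet
  metadata:
    name: cassandra
  spec:
    serviceName: cassandra
    replicas: 3
    selector:
      matchLabels:
        app: cassandra
    template:
      metadata:
        labels:
          app: cassandra
      spec:
        hostNetwork: true        # use the node's network stack directly
        nodeSelector:
          workload: cassandra    # pin pods to dedicated, labeled nodes
        containers:
          - name: cassandra
            image: cassandra:4.1
            volumeMounts:
              - name: data
                mountPath: /var/lib/cassandra
    volumeClaimTemplates:
      - metadata:
          name: data
        spec:
          accessModes: ["ReadWriteOnce"]
          storageClassName: local-storage  # e.g. local NVMe via local PVs
          resources:
            requests:
              storage: 1Ti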
The main advantage of Kubernetes is the operator. While it has some
quirks, it generally does a good job of managing your deployment,
eliminating the need to write all your workflows. Building on
Kubernetes as a standard offers the advantage of applying your
knowledge across various environments once you're familiar with it.
I wouldn't recommend jumping into Kubernetes and Cassandra
simultaneously. Both are complex topics. I've worked with Cassandra
for over a decade and Kubernetes on and off for five years, and I
still encounter challenges, especially when my desired outcome differs
from the operator's.
Both approaches are workable. Both have tradeoffs. For now, I'm also
sticking to baking AMIs [3], but with more experience on K8s and a
little more maturity from Cassandra, I'd think differently. For
stateless apps, I'm 100% on board with K8s.
Jon
[1] https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-36%3A+A+Configurable+ChannelProxy+to+alias+external+storage+locations
[2] https://lists.apache.org/thread/r0nhyyn6mbpy55fl90xqcj17v6w3wxg3
[3] https://github.com/rustyrazorblade/easy-cass-lab/tree/main/packer
On Thu, Jun 12, 2025 at 6:17 AM Luciano Greiner
<luciano.grei...@gmail.com> wrote:
Quick correction on my previous message — I assumed you were referring
to running Cassandra on Kubernetes, not purely ECS.
Many of the same concerns still apply. ECS tasks can also be
rescheduled or moved between instances, which poses risks for
Cassandra’s rack awareness and replica distribution. Ensuring stable
node identity and local storage is still tricky.
Cassandra works best when it's tightly coupled to its hardware —
ideally on dedicated VMs or bare metal — where you have full control
over topology and disk performance.
Luciano Greiner
On Thu, Jun 12, 2025 at 10:13 AM Luciano Greiner
<luciano.grei...@gmail.com> wrote:
>
> I usually advise against running Cassandra (or most databases) inside
> Kubernetes. It might look like it simplifies operations, but in my
> experience, it tends to introduce more complexity than it solves.
>
> With Cassandra specifically, Kubernetes may reschedule pods for
> reasons outside your control (e.g., node pressure, restarts,
> upgrades). This can lead to topology violations — for example, all
> replicas ending up in the same physical rack, defeating the
purpose of
> proper rack and datacenter awareness.
>
> Another major issue is storage. Cassandra expects fast, local disks
> close to the compute layer. While Kubernetes StatefulSets can use
> PersistentVolumes, these are often network-attached and may not offer
> the performance or locality guarantees Cassandra needs. And if your
> pods get rescheduled, depending on your storage class and cloud
> provider, you may run into delays or errors reattaching volumes.
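>
> For reference, the "local" flavor of PersistentVolume is pinned to a
> single node, so it cannot silently follow a rescheduled pod (an
> illustrative sketch; the storage class, path, and hostname are
> hypothetical):
>
>   apiVersion: v1
>   kind: PersistentVolume
>   metadata:
>     name: cassandra-data-node1
>   spec:
>     capacity:
>       storage: 1Ti
>     accessModes: ["ReadWriteOnce"]
>     persistentVolumeReclaimPolicy: Retain
>     storageClassName: local-storage
>     local:
>       path: /mnt/nvme0            # NVMe mount on the node itself
>     nodeAffinity:                 # required for local volumes; ties
>       required:                   # the PV to one specific node
>         nodeSelectorTerms:
>           - matchExpressions:
>               - key: kubernetes.io/hostname
>                 operator: In
>                 values: ["node1"]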
>
> Using an operator like K8ssandra doesn't necessarily eliminate these
> problems — it just adds another tool to manage within the puzzle.
>
> Luciano Greiner
>
> On Thu, Jun 12, 2025 at 6:20 AM Dor Laor via user
> <user@cassandra.apache.org> wrote:
> >
> > It's possible to manage Cassandra well both with VMs and containers.
> > As you'd be running one container per VM, there is no significant
> > advantage for containers. K8s provides nice tooling and some
> > methodological enforcement which brings order to the setup, but if
> > the team isn't top-notch in k8s, it's not worth the trouble and the
> > limitations that come with it (networking outside the k8s cluster,
> > etc.). It's good to have fewer layers. Most users run databases
> > outside of containers.
> >
> > On Wed, Jun 11, 2025 at 11:36 PM Raymond Yu <rayyu...@gmail.com>
> > wrote:
> >>
> >> Hi Cassandra community,
> >>
> >> I would like to ask for your expert opinions regarding a
> >> discussion we're having about deploying Cassandra on AWS EC2 vs.
> >> AWS ECS. For context, we have a small dedicated DB engineering
> >> team that is familiar with operating and supporting Cassandra on
> >> EC2 for many customer teams. However, one team has developed
> >> custom tooling for operating Cassandra on ECS (EC2-backed) and
> >> would like for us to migrate to it for their Cassandra needs,
> >> which has spawned this discussion (K8ssandra was considered, but
> >> that team did not want to use Kubernetes).
> >>
> >> Further context on our team and experience:
> >> - Small dedicated team supporting Cassandra (and other DBs)
> >> - Familiar with operating Cassandra on EC2
> >> - Familiar with standard IaC tools and languages
> >> (Ansible/Terraform/Python/etc.)
> >> - Only deploy in AWS
> >>
> >> Discussed points regarding staying with EC2:
> >> - Existing team experience and automation in deploying Cassandra
> >> on EC2
> >> - Simpler solution is easier to support and maintain
> >> - Almost all documentation we can find and use is specific to
> >> deploying on EC2
> >> - Third-party support is familiar with EC2 by default
> >> - Lower learning curve for engineers to onboard
> >> - More hands-on maintenance regarding OS upgrades
> >> - Less modern solution
> >>
> >> Discussed points regarding using the new ECS solution:
> >> - Containers are the more modern solution
> >> - Node autoheal feature in addition to standard C* operations via
> >> a control plane
> >> - Higher tool and architecture complexity that requires ramp-up in
> >> order to use and support effectively
> >> - We're on our own for potential issues with the tool itself after
> >> it would be handed off
> >> - No demonstrated performance gain over EC2-based clusters
> >> - Third-party support would be less familiar with dealing with ECS
> >> issues
> >> - Deployed on EC2 under the hood (one container per VM), so the
> >> underlying architecture is the same between both solutions
> >>
> >> Given that context, our team generally feels that there is little
> >> marginal benefit given the cost of ramp-up and supporting a custom
> >> tool, but there has also been a request for harder evidence and
> >> outside opinions on the topic. It has been hard to find
> >> documentation of this specific comparison of EC2 vs. ECS to
> >> reference. We'd love to hear your thoughts on our context, but we
> >> are also interested in any general recommendations for one over
> >> the other. Thanks in advance!
> >>
> >> Best,
> >> Raymond Yu