Hi Stephane,
How did you configure your cluster? Have you been using cephadm? If not,
I strongly advise you to recreate your cluster with cephadm, which includes
a script to bootstrap the cluster. In particular, if you don't have
detailed knowledge of Ceph architecture and management, it will ensure
that your cluster is properly configured and let you progressively learn
the details of Ceph.
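
As a rough illustration of the bootstrap step, here is a minimal Python
sketch that only builds (and, by default, just prints) the cephadm command
for the first node. The helper names build_bootstrap_cmd and bootstrap, the
dry_run parameter and the address 192.0.2.10 are assumptions made for this
example; only the "cephadm bootstrap --mon-ip <ip>" invocation itself comes
from cephadm.

```python
import ipaddress
import shutil
import subprocess


def build_bootstrap_cmd(mon_ip: str) -> list[str]:
    """Return the argv that bootstraps a new cluster on the local host.

    Hypothetical helper for illustration; mon_ip is the address of the
    first monitor (the host you run the command on).
    """
    # Fail early on a malformed address (raises ValueError).
    ipaddress.ip_address(mon_ip)
    return ["cephadm", "bootstrap", "--mon-ip", mon_ip]


def bootstrap(mon_ip: str, dry_run: bool = True) -> list[str]:
    """Print the command, and run it only when dry_run is False."""
    cmd = build_bootstrap_cmd(mon_ip)
    if dry_run:
        print(" ".join(cmd))
        return cmd
    if shutil.which("cephadm") is None:
        raise RuntimeError("cephadm was not found in PATH")
    # Needs root privileges on the first node of the new cluster.
    subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    # 192.0.2.10 is a placeholder; replace it with your first node's IP.
    bootstrap("192.0.2.10")
```

Once the first node is up, the other hosts and the OSDs are normally added
through the orchestrator (for example "ceph orch host add" and
"ceph orch apply osd"), rather than configured by hand on each node.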
Best regards.
Michel
On 21/07/2025 at 09:02, Stéphane Barthes wrote:
Hello,
I am very new to Ceph and have set up a small cluster to get started
with it.
So far, though, my experience has not been very impressive, probably for
lack of knowledge and good practices.
I started with Ubuntu 24, installed 3 VMs for a Ceph cluster, and
somehow could not get it running. Adding nodes would fail when adding OSDs,
with a weird error (I found it on the web but could not solve the problem).
I then made a new cluster with 3 Ubuntu 22 VMs. Install OK, start OK. I
created 1 pool to test storing data there and to work my way through
crash testing. However, the cluster dies during the weekly VM snapshot.
It may not be a good idea to run VM backups on a Ceph host, but I find
this a little surprising. (Crash testing started earlier than expected.)
The bottom line is that, after the backup, the cluster is in a warning
state with missing mons or logrotate, and sometimes crashed machines.
Restarting the service with systemctl or rebooting the node usually fixes it.
I am now stuck in a situation I cannot fix:
- One machine, a Ceph RBD client, cannot authenticate: auth method 'x'
error -13. I have tried quite a few things, and none of them resolved it.
I am currently trying to reboot the machine, but the busy/stuck rbd
device seems to block the reboot. I am not looking forward to a hard reset.
- The node with the mgr service will not restart the mon or logrotate. I
rebooted it again today, but I guess this is not how a node is
expected to behave.
So my questions are:
- How can I unblock my stuck Ceph client when this kind of error
occurs?
- Is it expected behavior that a client loses access to the cluster,
which more or less kills the machine?
- Where should I look in the Ceph node logs to figure out what is
going wrong and how to fix it, so that the cluster runs in a stable manner?
Regards,
--
S. Barthes
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io