> For upgrading the OS we have something similar, but exiting maintenance
> mode is broken (with 17.2.7) :(
> I need to check the tracker for similar issues and if I can't find
> anything, I will create a ticket.

With 18.2.2 the first maintenance exit command threw an exception for some
reason. In my patching script I run the command in a loop, and the second
attempt usually works:

exit maint 1/3
Error EINVAL: Traceback (most recent call last):
  File "/usr/share/ceph/mgr/mgr_module.py", line 1809, in _handle_command
    return self.handle_command(inbuf, cmd)
  File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 183, in handle_command
    return dispatch[cmd['prefix']].call(self, cmd, inbuf)
  File "/usr/share/ceph/mgr/mgr_module.py", line 474, in call
    return self.func(mgr, **kwargs)
  File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 119, in <lambda>
    wrapper_copy = lambda *l_args, **l_kwargs: wrapper(*l_args, **l_kwargs)  # noqa: E731
  File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 108, in wrapper
    return func(*args, **kwargs)
  File "/usr/share/ceph/mgr/orchestrator/module.py", line 778, in _host_maintenance_exit
    raise_if_exception(completion)
  File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 237, in raise_if_exception
    e = pickle.loads(c.serialized_exception)
TypeError: __init__() missing 2 required positional arguments: 'hostname' and 'addr'

exit maint 2/3
Ceph cluster f3e63d9e-2f4c-11ef-87a2-0f1170f55ed5 on cephbackup-osd1 has exited maintenance mode

exit maint 3/3
Error EINVAL: Host cephbackup-osd1 is not in maintenance mode

Fri Apr 25 07:17:58 CEST 2025 cluster state is HEALTH_WARN
Fri Apr 25 07:18:02 CEST 2025 cluster state is HEALTH_WARN
[...]
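If you drive this from Ansible rather than a shell script, the same
workaround can be expressed as a retried task. This is only a sketch, not
my actual script; it assumes cephadm is installed on the target and that
the inventory hostname matches the Ceph host name:

- name: Exit maintenance mode, retrying because the first attempt may fail
  shell:
    cmd: 'cephadm shell ceph orch host maintenance exit {{ inventory_hostname }}'
  register: maint_exit
  failed_when: false            # don't abort the play on the EINVAL traceback
  retries: 3
  delay: 10
  until: maint_exit.rc == 0 or 'is not in maintenance mode' in (maint_exit.stdout + maint_exit.stderr)
  become: True

The second part of the "until" condition treats the "Host ... is not in
maintenance mode" error as success, since it means an earlier attempt
already got the host out of maintenance.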
On Thu, 13 Jun 2024 at 22:07, Sake Ceph <c...@paulusma.eu> wrote:
>
> For upgrading the OS we have something similar, but exiting maintenance
> mode is broken (with 17.2.7) :(
> I need to check the tracker for similar issues and if I can't find
> anything, I will create a ticket.
>
> Kind regards,
> Sake
>
> > Op 12-06-2024 19:02 CEST schreef Daniel Brown <daniel.h.brown@thermify.cloud>:
> >
> > I have two Ansible roles, one for enter and one for exit. There are
> > likely better ways to do this, and I'll not be surprised if someone here
> > lets me know. They use orch commands via the cephadm shell. I'm using
> > Ansible for other configuration management in my environment as well,
> > including setting up clients of the Ceph cluster.
> >
> > Below are excerpts from main.yml in the "tasks" for the enter/exit
> > roles. The host I'm running Ansible from is one of my Ceph servers; I've
> > limited which processes run there, though, so it's in the cluster but
> > not equal to the others.
> >
> > -----
> > Enter
> > -----
> >
> > - name: Ceph Maintenance Mode Enter
> >   shell:
> >     cmd: 'cephadm shell ceph orch host maintenance enter {{ (ansible_ssh_host|default(ansible_host))|default(inventory_hostname) }} --force --yes-i-really-mean-it'
> >   become: True
> >
> > -----
> > Exit
> > -----
> >
> > - name: Ceph Maintenance Mode Exit
> >   shell:
> >     cmd: 'cephadm shell ceph orch host maintenance exit {{ (ansible_ssh_host|default(ansible_host))|default(inventory_hostname) }}'
> >   become: True
> >   connection: local
> >
> > - name: Wait for Ceph to be available
> >   ansible.builtin.wait_for:
> >     delay: 60
> >     host: '{{ (ansible_ssh_host|default(ansible_host))|default(inventory_hostname) }}'
> >     port: 9100
> >   connection: local
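One thing worth adding after the exit step, beyond the port check above, is
waiting until the cluster itself reports healthy again before touching the
next host. A rough sketch (not part of Daniel's roles; it assumes the
control host can run cephadm as root) could be:

- name: Wait for the cluster to report HEALTH_OK
  shell:
    cmd: 'cephadm shell ceph health'
  register: ceph_health
  retries: 60
  delay: 10
  until: "'HEALTH_OK' in ceph_health.stdout"
  become: True
  connection: local

With 60 retries and a 10-second delay this waits up to roughly ten minutes;
recovery after a reboot can take longer on a busy cluster, so the numbers
would need tuning.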
> > > On Jun 12, 2024, at 11:28 AM, Michael Worsham <mwors...@datadimensions.com> wrote:
> > >
> > > Interesting. How do you set this "maintenance mode"? If you have a
> > > series of documented steps that you have to do and could provide as an
> > > example, that would be beneficial for my efforts.
> > >
> > > We are in the process of standing up both a dev-test environment
> > > consisting of 3 Ceph servers (strictly for testing purposes) and a new
> > > production environment consisting of 20+ Ceph servers.
> > >
> > > We are using Ubuntu 22.04.
> > >
> > > -- Michael
> > >
> > > From: Daniel Brown <daniel.h.brown@thermify.cloud>
> > > Sent: Wednesday, June 12, 2024 9:18 AM
> > > To: Anthony D'Atri <anthony.da...@gmail.com>
> > > Cc: Michael Worsham <mwors...@datadimensions.com>; ceph-users@ceph.io <ceph-users@ceph.io>
> > > Subject: Re: [ceph-users] Patching Ceph cluster
> > >
> > > There's also a maintenance mode that you can set for each server while
> > > you're doing updates, so that the cluster doesn't try to move data from
> > > the affected OSDs while the server being updated is offline or down.
> > > I've worked some on automating this with Ansible, but have found my
> > > process (and/or my cluster) still requires some manual intervention
> > > while it's running to get things done cleanly.
> > >
> > > > On Jun 12, 2024, at 8:49 AM, Anthony D'Atri <anthony.da...@gmail.com> wrote:
> > > >
> > > > Do you mean patching the OS?
> > > >
> > > > If so, easy -- one node at a time, then after it comes back up, wait
> > > > until all PGs are active+clean and the mon quorum is complete before
> > > > proceeding.
> > > >
> > > >> On Jun 12, 2024, at 07:56, Michael Worsham <mwors...@datadimensions.com> wrote:
> > > >>
> > > >> What is the proper way to patch a Ceph cluster and reboot the
> > > >> servers in said cluster if a reboot is necessary for said updates?
> > > >> And is it possible to automate it via Ansible?
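To tie the pieces above together: the per-host sequence (enter maintenance,
patch, reboot, exit maintenance, wait for health) maps naturally onto a play
with serial: 1, which is basically Anthony's "one node at a time" advice in
playbook form. This is only a sketch; the group and role names are
placeholders and the timeouts would need tuning:

- hosts: ceph_nodes                    # placeholder inventory group
  serial: 1                            # patch and reboot one host at a time
  become: True
  tasks:
    - ansible.builtin.import_role:
        name: ceph_maintenance_enter   # e.g. an "enter" role like Daniel's
    - ansible.builtin.apt:             # Ubuntu 22.04 in Michael's case
        update_cache: true
        upgrade: dist
    - ansible.builtin.reboot:
        reboot_timeout: 1800
    - ansible.builtin.import_role:
        name: ceph_maintenance_exit    # "exit" role plus a health wait as above

Whether a reboot is needed at all could be gated on the presence of
/var/run/reboot-required on Ubuntu, but that is beyond this sketch.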
--
Łukasz Borek
luk...@borek.org.pl
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io