> For upgrading the OS we have something similar, but exiting maintenance
> mode is broken (with 17.2.7) :(
> I need to check the tracker for similar issues and if I can't find
> anything, I will create a ticket.

With 18.2.2 the first maintenance exit command threw an exception for some
reason. In my patching script I run the command in a loop, and the second
attempt usually works:

exit maint 1/3
Error EINVAL: Traceback (most recent call last):
  File "/usr/share/ceph/mgr/mgr_module.py", line 1809, in _handle_command
    return self.handle_command(inbuf, cmd)
  File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 183, in handle_command
    return dispatch[cmd['prefix']].call(self, cmd, inbuf)
  File "/usr/share/ceph/mgr/mgr_module.py", line 474, in call
    return self.func(mgr, **kwargs)
  File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 119, in <lambda>
    wrapper_copy = lambda *l_args, **l_kwargs: wrapper(*l_args, **l_kwargs)  # noqa: E731
  File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 108, in wrapper
    return func(*args, **kwargs)
  File "/usr/share/ceph/mgr/orchestrator/module.py", line 778, in _host_maintenance_exit
    raise_if_exception(completion)
  File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 237, in raise_if_exception
    e = pickle.loads(c.serialized_exception)
TypeError: __init__() missing 2 required positional arguments: 'hostname' and 'addr'

exit maint 2/3
Ceph cluster f3e63d9e-2f4c-11ef-87a2-0f1170f55ed5 on cephbackup-osd1 has exited maintenance mode

exit maint 3/3
Error EINVAL: Host cephbackup-osd1 is not in maintenance mode

Fri Apr 25 07:17:58 CEST 2025 cluster state is HEALTH_WARN
Fri Apr 25 07:18:02 CEST 2025 cluster state is HEALTH_WARN
[...]
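If you drive this from Ansible rather than a shell script, the same
workaround can be expressed as a retried task. This is only a sketch, not
my actual script; it assumes cephadm is installed on the target and that
the inventory hostname matches the Ceph host name:

- name: Exit maintenance mode, retrying because the first attempt may fail
  shell:
    cmd: 'cephadm shell ceph orch host maintenance exit {{ inventory_hostname }}'
  register: maint_exit
  failed_when: false            # don't abort the play on the EINVAL traceback
  retries: 3
  delay: 10
  until: maint_exit.rc == 0 or 'is not in maintenance mode' in (maint_exit.stdout + maint_exit.stderr)
  become: True

The second part of the "until" condition treats the "Host ... is not in
maintenance mode" error as success, since it means an earlier attempt
already got the host out of maintenance.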
On Thu, 13 Jun 2024 at 22:07, Sake Ceph <c...@paulusma.eu> wrote:
>
> For upgrading the OS we have something similar, but exiting maintenance
> mode is broken (with 17.2.7) :(
> I need to check the tracker for similar issues and if I can't find
> anything, I will create a ticket.
>
> Kind regards,
> Sake
>
> > Op 12-06-2024 19:02 CEST schreef Daniel Brown <daniel.h.brown@thermify.cloud>:
> >
> > I have two Ansible roles, one for enter and one for exit. There are
> > likely better ways to do this, and I'll not be surprised if someone here
> > lets me know. They use orch commands via the cephadm shell. I'm using
> > Ansible for other configuration management in my environment as well,
> > including setting up clients of the Ceph cluster.
> >
> > Below are excerpts from main.yml in the "tasks" for the enter/exit
> > roles. The host I'm running Ansible from is one of my Ceph servers; I've
> > limited which processes run there, though, so it's in the cluster but
> > not equal to the others.
> >
> > -----
> > Enter
> > -----
> >
> > - name: Ceph Maintenance Mode Enter
> >   shell:
> >     cmd: 'cephadm shell ceph orch host maintenance enter {{ (ansible_ssh_host|default(ansible_host))|default(inventory_hostname) }} --force --yes-i-really-mean-it'
> >   become: True
> >
> > -----
> > Exit
> > -----
> >
> > - name: Ceph Maintenance Mode Exit
> >   shell:
> >     cmd: 'cephadm shell ceph orch host maintenance exit {{ (ansible_ssh_host|default(ansible_host))|default(inventory_hostname) }}'
> >   become: True
> >   connection: local
> >
> > - name: Wait for Ceph to be available
> >   ansible.builtin.wait_for:
> >     delay: 60
> >     host: '{{ (ansible_ssh_host|default(ansible_host))|default(inventory_hostname) }}'
> >     port: 9100
> >   connection: local
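One thing worth adding after the exit step, beyond the port check above, is
waiting until the cluster itself reports healthy again before touching the
next host. A rough sketch (not part of Daniel's roles; it assumes the
control host can run cephadm as root) could be:

- name: Wait for the cluster to report HEALTH_OK
  shell:
    cmd: 'cephadm shell ceph health'
  register: ceph_health
  retries: 60
  delay: 10
  until: "'HEALTH_OK' in ceph_health.stdout"
  become: True
  connection: local

With 60 retries and a 10-second delay this waits up to roughly ten minutes;
recovery after a reboot can take longer on a busy cluster, so the numbers
would need tuning.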
> > > On Jun 12, 2024, at 11:28 AM, Michael Worsham <mwors...@datadimensions.com> wrote:
> > >
> > > Interesting. How do you set this "maintenance mode"? If you have a
> > > series of documented steps that you have to do and could provide as an
> > > example, that would be beneficial for my efforts.
> > >
> > > We are in the process of standing up both a dev-test environment
> > > consisting of 3 Ceph servers (strictly for testing purposes) and a new
> > > production environment consisting of 20+ Ceph servers.
> > >
> > > We are using Ubuntu 22.04.
> > >
> > > -- Michael
> > >
> > > From: Daniel Brown <daniel.h.brown@thermify.cloud>
> > > Sent: Wednesday, June 12, 2024 9:18 AM
> > > To: Anthony D'Atri <anthony.da...@gmail.com>
> > > Cc: Michael Worsham <mwors...@datadimensions.com>; ceph-users@ceph.io <ceph-users@ceph.io>
> > > Subject: Re: [ceph-users] Patching Ceph cluster
> > >
> > > There's also a maintenance mode that you can set for each server while
> > > you're doing updates, so that the cluster doesn't try to move data from
> > > the affected OSDs while the server being updated is offline or down.
> > > I've worked some on automating this with Ansible, but have found my
> > > process (and/or my cluster) still requires some manual intervention
> > > while it's running to get things done cleanly.
> > >
> > > > On Jun 12, 2024, at 8:49 AM, Anthony D'Atri <anthony.da...@gmail.com> wrote:
> > > >
> > > > Do you mean patching the OS?
> > > >
> > > > If so, easy -- one node at a time, then after it comes back up, wait
> > > > until all PGs are active+clean and the mon quorum is complete before
> > > > proceeding.
> > > >
> > > >> On Jun 12, 2024, at 07:56, Michael Worsham <mwors...@datadimensions.com> wrote:
> > > >>
> > > >> What is the proper way to patch a Ceph cluster and reboot the
> > > >> servers in said cluster if a reboot is necessary for said updates?
> > > >> And is it possible to automate it via Ansible?
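To tie the pieces above together: the per-host sequence (enter maintenance,
patch, reboot, exit maintenance, wait for health) maps naturally onto a play
with serial: 1, which is basically Anthony's "one node at a time" advice in
playbook form. This is only a sketch; the group and role names are
placeholders and the timeouts would need tuning:

- hosts: ceph_nodes                    # placeholder inventory group
  serial: 1                            # patch and reboot one host at a time
  become: True
  tasks:
    - ansible.builtin.import_role:
        name: ceph_maintenance_enter   # e.g. an "enter" role like Daniel's
    - ansible.builtin.apt:             # Ubuntu 22.04 in Michael's case
        update_cache: true
        upgrade: dist
    - ansible.builtin.reboot:
        reboot_timeout: 1800
    - ansible.builtin.import_role:
        name: ceph_maintenance_exit    # "exit" role plus a health wait as above

Whether a reboot is needed at all could be gated on the presence of
/var/run/reboot-required on Ubuntu, but that is beyond this sketch.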
--
Łukasz Borek
luk...@borek.org.pl
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io