Nope, it was really broken in 17.2.7. When RHEL 10 becomes available, I will 
look into this part again :)

> Op 25-04-2025 07:22 CEST schreef Lukasz Borek <luk...@borek.org.pl>:
> 
> 
> > For upgrade the OS we have something similar, but exiting maintenance mode 
> > is broken (with 17.2.7) :(
> > I need to check the tracker for similar issues and if I can't find 
> > anything, I will create a ticket
> For 18.2.2, the first maintenance-exit command threw an exception for some 
> reason. In my patching script I execute the commands in a loop, and the 
> second attempt usually works.
> 
> exit maint 1/3
> Error EINVAL: Traceback (most recent call last):
>   File "/usr/share/ceph/mgr/mgr_module.py", line 1809, in _handle_command
>     return self.handle_command(inbuf, cmd)
>   File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 183, in handle_command
>     return dispatch[cmd['prefix']].call(self, cmd, inbuf)
>   File "/usr/share/ceph/mgr/mgr_module.py", line 474, in call
>     return self.func(mgr, **kwargs)
>   File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 119, in <lambda>
>     wrapper_copy = lambda *l_args, **l_kwargs: wrapper(*l_args, **l_kwargs)  # noqa: E731
>   File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 108, in wrapper
>     return func(*args, **kwargs)
>   File "/usr/share/ceph/mgr/orchestrator/module.py", line 778, in _host_maintenance_exit
>     raise_if_exception(completion)
>   File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 237, in raise_if_exception
>     e = pickle.loads(c.serialized_exception)
> TypeError: __init__() missing 2 required positional arguments: 'hostname' and 'addr'
> 
> exit maint 2/3
> Ceph cluster f3e63d9e-2f4c-11ef-87a2-0f1170f55ed5 on cephbackup-osd1 has exited maintenance mode
> exit maint 3/3
> Error EINVAL: Host cephbackup-osd1 is not in maintenance mode
> Fri Apr 25 07:17:58 CEST 2025 cluster state is HEALTH_WARN
> Fri Apr 25 07:18:02 CEST 2025 cluster state is HEALTH_WARN
> [...]
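The loop-based workaround described above can be sketched roughly as follows. This is an illustrative sketch, not the actual patching script: the function name `retry_maint_exit`, the attempt count, and the delay between attempts are all placeholders.

```shell
# Rough sketch of the "execute in a loop" workaround: retry the
# maintenance-exit command a few times, since the first attempt can fail
# with the pickle TypeError shown above while a later attempt succeeds.
# The attempt count and delay are illustrative defaults.
retry_maint_exit() {
    host="$1"
    attempts="${2:-3}"
    delay="${3:-5}"
    i=1
    while [ "$i" -le "$attempts" ]; do
        echo "exit maint $i/$attempts"
        if ceph orch host maintenance exit "$host"; then
            return 0
        fi
        sleep "$delay"
        i=$((i + 1))
    done
    return 1
}
```

Returning non-zero after the last attempt lets the calling script stop the rollout instead of silently moving on to the next host.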
> 
> On Thu, 13 Jun 2024 at 22:07, Sake Ceph <c...@paulusma.eu> wrote:
> > 
> >  
> >  For upgrade the OS we have something similar, but exiting maintenance mode 
> > is broken (with 17.2.7) :(
> >  I need to check the tracker for similar issues and if I can't find 
> > anything, I will create a ticket. 
> >  
> >  Kind regards, 
> >  Sake 
> >  
> >  > Op 12-06-2024 19:02 CEST schreef Daniel Brown 
> > <daniel.h.brown@thermify.cloud>:
> >  > 
> >  > 
> >  > I have two Ansible roles, one for enter, one for exit. There are likely 
> >  > better ways to do this, and I’ll not be surprised if someone here lets me 
> >  > know. They use orch commands via the cephadm shell. I’m using Ansible 
> >  > for other configuration management in my environment as well, including 
> >  > setting up clients of the Ceph cluster. 
> >  > 
> >  > 
> >  > Below are excerpts from main.yml in the “tasks” for the enter/exit roles. 
> >  > The host I’m running Ansible from is one of my Ceph servers; I’ve limited 
> >  > which processes run there, though, so it’s in the cluster but not equal 
> >  > to the others. 
> >  > 
> >  > 
> >  > —————
> >  > Enter
> >  > —————
> >  > 
> >  > - name: Ceph Maintenance Mode Enter
> >  >   shell:
> >  >     cmd: 'cephadm shell ceph orch host maintenance enter {{ (ansible_ssh_host|default(ansible_host))|default(inventory_hostname) }} --force --yes-i-really-mean-it'
> >  >   become: True
> >  > 
> >  > 
> >  > 
> >  > —————
> >  > Exit
> >  > ————— 
> >  > 
> >  > 
> >  > - name: Ceph Maintenance Mode Exit
> >  >   shell:
> >  >     cmd: 'cephadm shell ceph orch host maintenance exit {{ (ansible_ssh_host|default(ansible_host))|default(inventory_hostname) }}'
> >  >   become: True
> >  >   connection: local
> >  > 
> >  > - name: Wait for Ceph to be available
> >  >   ansible.builtin.wait_for:
> >  >     delay: 60
> >  >     host: '{{ (ansible_ssh_host|default(ansible_host))|default(inventory_hostname) }}'
> >  >     port: 9100
> >  >   connection: local
> >  > 
> >  > 
> >  > 
> >  > 
> >  > 
> >  > 
> >  > > On Jun 12, 2024, at 11:28 AM, Michael Worsham 
> > <mwors...@datadimensions.com> wrote:
> >  > > 
> >  > > Interesting. How do you set this "maintenance mode"? If you have a 
> > series of documented steps that you have to do and could provide as an 
> > example, that would be beneficial for my efforts.
> >  > > 
> >  > > We are in the process of standing up both a dev-test environment 
> > consisting of 3 Ceph servers (strictly for testing purposes) and a new 
> > production environment consisting of 20+ Ceph servers.
> >  > > 
> >  > > We are using Ubuntu 22.04.
> >  > > 
> >  > > -- Michael
> >  > > From: Daniel Brown <daniel.h.brown@thermify.cloud>
> >  > > Sent: Wednesday, June 12, 2024 9:18 AM
> >  > > To: Anthony D'Atri <anthony.da...@gmail.com>
> >  > > Cc: Michael Worsham <mwors...@datadimensions.com>; ceph-users@ceph.io 
> > <ceph-users@ceph.io>
> >  > > Subject: Re: [ceph-users] Patching Ceph cluster
> >  > > 
> >  > > 
> >  > > There’s also a Maintenance mode that you can set for each server, as 
> > you’re doing updates, so that the cluster doesn’t try to move data from 
> > affected OSDs while the server being updated is offline or down. I’ve 
> > worked some on automating this with Ansible, but have found my process 
> > (and/or my cluster) still requires some manual intervention while it’s 
> > running to get things done cleanly.
> >  > > 
> >  > > 
> >  > > 
> >  > > > On Jun 12, 2024, at 8:49 AM, Anthony D'Atri 
> > <anthony.da...@gmail.com> wrote:
> >  > > >
> >  > > > Do you mean patching the OS?
> >  > > >
> >  > > > If so, easy -- one node at a time, then after it comes back up, wait 
> > until all PGs are active+clean and the mon quorum is complete before 
> > proceeding.
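A minimal sketch of that "wait until healthy" step between nodes might look like this. It is an assumption for illustration, not Anthony's actual procedure: the poll interval and the reliance on plain `ceph health` output are mine; a stricter check could also inspect `ceph pg stat` and `ceph quorum_status`.

```shell
# Rough sketch: after a rebooted node comes back, block until the cluster
# reports HEALTH_OK before patching the next node. Poll interval and the
# use of "ceph health" output are illustrative assumptions.
wait_for_health_ok() {
    interval="${1:-10}"
    until ceph health | grep -q '^HEALTH_OK'; do
        sleep "$interval"
    done
}
```

Gating each node on this check keeps recovery traffic from one reboot from overlapping with the next one.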
> >  > > >
> >  > > >
> >  > > >
> >  > > >> On Jun 12, 2024, at 07:56, Michael Worsham 
> > <mwors...@datadimensions.com> wrote:
> >  > > >>
> >  > > >> What is the proper way to patch a Ceph cluster and reboot the 
> >  > > >> servers in said cluster if a reboot is necessary for said updates? And 
> >  > > >> is it possible to automate it via Ansible?
> >  > > >> 
> >  > > >> This message and its attachments are from Data Dimensions and are 
> >  > > >> intended only for the use of the individual or entity to which it is 
> >  > > >> addressed, and may contain information that is privileged, confidential, 
> >  > > >> and exempt from disclosure under applicable law. If the reader of this 
> >  > > >> message is not the intended recipient, or the employee or agent responsible 
> >  > > >> for delivering the message to the intended recipient, you are hereby 
> >  > > >> notified that any dissemination, distribution, or copying of this 
> >  > > >> communication is strictly prohibited. If you have received this 
> >  > > >> communication in error, please notify the sender immediately and 
> >  > > >> permanently delete the original email and destroy any copies or printouts 
> >  > > >> of this email as well as any attachments.
> >  > > >> _______________________________________________
> >  > > >> ceph-users mailing list -- ceph-users@ceph.io
> >  > > >> To unsubscribe send an email to ceph-users-le...@ceph.io
> >  > > > _______________________________________________
> >  > > > ceph-users mailing list -- ceph-users@ceph.io
> >  > > > To unsubscribe send an email to ceph-users-le...@ceph.io
> >  > > 
> >  > 
> >  > _______________________________________________
> >  > ceph-users mailing list -- ceph-users@ceph.io
> >  > To unsubscribe send an email to ceph-users-le...@ceph.io
> >  _______________________________________________
> >  ceph-users mailing list -- ceph-users@ceph.io
> >  To unsubscribe send an email to ceph-users-le...@ceph.io
> > 
> 
> 
> --
> 
> Łukasz Borek
> luk...@borek.org.pl
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
