| 黄浩然 | huanghaoran_1...@163.com |

---- Replied Message ----
From: <ceph-users-requ...@ceph.io>
Date: 04/25/2025 17:02
To: <ceph-users@ceph.io>
Subject: ceph-users Digest, Vol 130, Issue 155

Send ceph-users mailing list submissions to ceph-users@ceph.io. To subscribe or unsubscribe via email, send a message with subject or body 'help' to ceph-users-requ...@ceph.io. You can reach the person managing the list at ceph-users-ow...@ceph.io. When replying, please edit your Subject line so it is more specific than "Re: Contents of ceph-users digest..."

Today's Topics:
1. Re: Patching Ceph cluster (Sake Ceph)
2. Re: Patching Ceph cluster (Sake Ceph)

----------------------------------------------------------------------

Date: Fri, 25 Apr 2025 10:59:52 +0200 (CEST)
From: Sake Ceph <c...@paulusma.eu>
Subject: [ceph-users] Re: Patching Ceph cluster
To: Michael Worsham <mwors...@datadimensions.com>, ceph-users@ceph.io
Message-ID: <683056728.264684.1745571592...@webmail.strato.com>
Content-Type: text/plain; charset=UTF-8

The tiebreaker is for a stretch cluster, which we deployed. It's only used to assign the host to a group.

The playbook is indeed written for RHEL, because that's the OS we use. It can be improved a lot, but it's a start for someone else. I know I still need to share this on GitHub, but I'm too busy at the moment, at work and at home.

On 25-04-2025 06:05 CEST, Michael Worsham <mwors...@datadimensions.com> wrote:

I've been reading over the playbook code, and it's nicely written. I know it's primarily RHEL focused, but I think it could be modified for Ubuntu/Debian platforms as well. A couple of questions, though... In the test example hosts file, what is the tiebreaker? I know there isn't a role in the roles folder, but do you have an example of one, just so we know what it does? Thanks.
--
Michael

Get Outlook for Android (https://aka.ms/AAb9ysg)

------------------------------

From: Sake Ceph <c...@paulusma.eu>
Sent: Friday, June 14, 2024 4:28:34 AM
To: Michael Worsham <mwors...@datadimensions.com>; ceph-users@ceph.io
Subject: Re: [ceph-users] Re: Patching Ceph cluster

I needed to do some cleaning before I could share this :) Maybe you or someone else can use it.

Kind regards,
Sake

On 14-06-2024 03:53 CEST, Michael Worsham <mwors...@datadimensions.com> wrote:

I'd love to see what your playbook(s) look like for doing this.

--
Michael

________________________________
From: Sake Ceph <c...@paulusma.eu>
Sent: Thursday, June 13, 2024 4:05 PM
To: ceph-users@ceph.io
Subject: [ceph-users] Re: Patching Ceph cluster

Yes, we fully automated this with Ansible. In short, we do the following:

1. Check that the cluster is healthy before continuing (via the REST API); only HEALTH_OK is good.
2. Disable scrub and deep-scrub.
3. Update all applications on all the hosts in the cluster.
4. For every host, one by one, do the following:
   4a. Check whether applications got updated.
   4b. Check via the reboot hint whether a reboot is necessary.
   4c. If applications got updated or a reboot is necessary, do the following:
       4c1. Put the host in maintenance mode.
       4c2. Reboot the host if necessary.
       4c3. Check and wait via 'ceph orch host ls' until the status of the host is "maintenance" and nothing else.
       4c4. Take the host out of maintenance mode.
   4d. Check that the cluster is healthy before continuing (via the REST API); only warnings about scrub and deep-scrub are allowed, and no PGs should be degraded.
5. Enable scrub and deep-scrub when all hosts are done.
6. Check that the cluster is healthy (via the REST API); only HEALTH_OK is good.
7. Done.

For upgrading the OS we have something similar, but exiting maintenance mode is broken (with 17.2.7) :( I need to check the tracker for similar issues and, if I can't find anything, I will create a ticket.

Kind regards,
Sake

On 12-06-2024 19:02 CEST, Daniel Brown <daniel.h.brown@thermify.cloud> wrote:

I have two Ansible roles, one for enter, one for exit. There are likely better ways to do this, and I'll not be surprised if someone here lets me know. They're using orch commands via the cephadm shell. I'm using Ansible for other configuration management in my environment as well, including setting up clients of the Ceph cluster. Below are excerpts from main.yml in the "tasks" for the enter/exit roles. The host I'm running Ansible from is one of my Ceph servers; I've limited which processes run there, though, so it's in the cluster but not equal to the others.

----- Enter -----

- name: Ceph Maintenance Mode Enter
  shell:
    cmd: 'cephadm shell ceph orch host maintenance enter {{ (ansible_ssh_host|default(ansible_host))|default(inventory_hostname) }} --force --yes-i-really-mean-it'
  become: true

----- Exit -----

- name: Ceph Maintenance Mode Exit
  shell:
    cmd: 'cephadm shell ceph orch host maintenance exit {{ (ansible_ssh_host|default(ansible_host))|default(inventory_hostname) }}'
  become: true
  connection: local

- name: Wait for Ceph to be available
  ansible.builtin.wait_for:
    delay: 60
    host: '{{ (ansible_ssh_host|default(ansible_host))|default(inventory_hostname) }}'
    port: 9100
  connection: local

On Jun 12, 2024, at 11:28 AM, Michael Worsham <mwors...@datadimensions.com> wrote:

Interesting. How do you set this "maintenance mode"? If you have a series of documented steps that you have to do and could provide as an example, that would be beneficial for my efforts. We are in the process of standing up both a dev-test environment consisting of 3 Ceph servers (strictly for testing purposes) and a new production environment consisting of 20+ Ceph servers. We are using Ubuntu 22.04.
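Sake's steps 1, 4d, and 6 above all gate on cluster health reported by the REST API. A minimal sketch of that gate as a pure function (the function and parameter names are my own, not from the playbook; `OSDMAP_FLAGS` is the health check Ceph raises when the noscrub/nodeep-scrub flags are set):

```python
# Hypothetical health gate for the steps above: given the health status
# string and the set of active health-check names from the REST API,
# decide whether patching may continue.

# Ceph reports "noscrub/nodeep-scrub flag(s) set" under this check name.
ALLOWED_WARNINGS = {"OSDMAP_FLAGS"}

def ok_to_continue(status: str, checks: set, strict: bool = False) -> bool:
    """strict=True: only HEALTH_OK passes (steps 1 and 6).
    strict=False: HEALTH_WARN is tolerated as long as every active
    check is an allowed scrub-related warning (step 4d)."""
    if status == "HEALTH_OK":
        return True
    if strict:
        return False
    return status == "HEALTH_WARN" and checks <= ALLOWED_WARNINGS
```

Anything else, such as a `PG_DEGRADED` warning, makes the loop stop and wait.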
--
Michael

From: Daniel Brown <daniel.h.brown@thermify.cloud>
Sent: Wednesday, June 12, 2024 9:18 AM
To: Anthony D'Atri <anthony.da...@gmail.com>
Cc: Michael Worsham <mwors...@datadimensions.com>; ceph-users@ceph.io
Subject: Re: [ceph-users] Patching Ceph cluster

There's also a maintenance mode that you can set for each server as you're doing updates, so that the cluster doesn't try to move data from the affected OSDs while the server being updated is offline or down. I've worked some on automating this with Ansible, but have found my process (and/or my cluster) still requires some manual intervention while it's running to get things done cleanly.

On Jun 12, 2024, at 8:49 AM, Anthony D'Atri <anthony.da...@gmail.com> wrote:

Do you mean patching the OS? If so, easy -- one node at a time, then after it comes back up, wait until all PGs are active+clean and the mon quorum is complete before proceeding.

On Jun 12, 2024, at 07:56, Michael Worsham <mwors...@datadimensions.com> wrote:

What is the proper way to patch a Ceph cluster and reboot the servers in said cluster if a reboot is necessary for said updates? And is it possible to automate it via Ansible?

This message and its attachments are from Data Dimensions and are intended only for the use of the individual or entity to which it is addressed, and may contain information that is privileged, confidential, and exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, or the employee or agent responsible for delivering the message to the intended recipient, you are hereby notified that any dissemination, distribution, or copying of this communication is strictly prohibited.
If you have received this communication in error, please notify the sender immediately and permanently delete the original email and destroy any copies or printouts of this email as well as any attachments.

_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
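Anthony's advice above ("wait until all PGs are active+clean" before moving to the next node) can be checked mechanically. A sketch, assuming a `ceph pg stat`-style summary line is available; the function name and regex are mine, and the sample strings are illustrative:

```python
import re

def all_pgs_active_clean(pg_summary: str) -> bool:
    """Parse a 'ceph pg stat'-style summary line, e.g.
    '177 pgs: 177 active+clean; 1.2 TiB data, ...', and report whether
    every PG is exactly active+clean (no scrubbing, degraded, etc.)."""
    m = re.match(r"(\d+) pgs: ([^;]*)", pg_summary)
    if not m:
        return False
    total = int(m.group(1))
    # The states section is a comma-separated list of "<count> <state>".
    clean = sum(
        int(count)
        for part in m.group(2).split(",")
        for count, state in [part.strip().split(" ", 1)]
        if state == "active+clean"
    )
    return total > 0 and clean == total
```

A patching loop would poll this (together with a mon-quorum check) between nodes and block until it returns True.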
------------------------------

Date: Fri, 25 Apr 2025 11:02:28 +0200 (CEST)
From: Sake Ceph <c...@paulusma.eu>
Subject: [ceph-users] Re: Patching Ceph cluster
To: Lukasz Borek <luk...@borek.org.pl>
Cc: ceph-users@ceph.io
Message-ID: <796303375.265011.1745571748...@webmail.strato.com>
Content-Type: text/plain; charset=UTF-8

Nope, it was really broken in 17.2.7.
When RHEL 10 becomes available, I will look into this part again :)

On 25-04-2025 07:22 CEST, Lukasz Borek <luk...@borek.org.pl> wrote:

> For upgrading the OS we have something similar, but exiting maintenance mode is broken (with 17.2.7) :( I need to check the tracker for similar issues and, if I can't find anything, I will create a ticket.

For 18.2.2, the first maintenance-exit command threw an exception for some reason. In my patching script I execute the commands in a loop, and the 2nd shot usually works.

exit maint 1/3
Error EINVAL: Traceback (most recent call last):
  File "/usr/share/ceph/mgr/mgr_module.py", line 1809, in _handle_command
    return self.handle_command(inbuf, cmd)
  File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 183, in handle_command
    return dispatch[cmd['prefix']].call(self, cmd, inbuf)
  File "/usr/share/ceph/mgr/mgr_module.py", line 474, in call
    return self.func(mgr, **kwargs)
  File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 119, in <lambda>
    wrapper_copy = lambda *l_args, **l_kwargs: wrapper(*l_args, **l_kwargs)  # noqa: E731
  File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 108, in wrapper
    return func(*args, **kwargs)
  File "/usr/share/ceph/mgr/orchestrator/module.py", line 778, in _host_maintenance_exit
    raise_if_exception(completion)
  File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 237, in raise_if_exception
    e = pickle.loads(c.serialized_exception)
TypeError: __init__() missing 2 required positional arguments: 'hostname' and 'addr'

exit maint 2/3
Ceph cluster f3e63d9e-2f4c-11ef-87a2-0f1170f55ed5 on cephbackup-osd1 has exited maintenance mode

exit maint 3/3
Error EINVAL: Host cephbackup-osd1 is not in maintenance mode

Fri Apr 25 07:17:58 CEST 2025 cluster state is HEALTH_WARN
Fri Apr 25 07:18:02 CEST 2025 cluster state is HEALTH_WARN
[...]
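Lukasz's "execute in a loop, the 2nd shot usually works" workaround can be written as a generic retry wrapper. A sketch with names of my own choosing; in the real script, `op` would shell out to `ceph orch host maintenance exit <host>` and report success on a zero exit code:

```python
import time

def retry(op, attempts=3, delay=10):
    """Run op() until it returns truthy or attempts are exhausted.
    Generic stand-in for retrying a flaky 'maintenance exit' command:
    a transient mgr-side failure on the first try is absorbed instead
    of aborting the whole patching run."""
    for n in range(1, attempts + 1):
        if op():
            return True
        if n < attempts:
            time.sleep(delay)
    return False
```

Note that a third attempt after a successful exit fails with "Host ... is not in maintenance mode" (as in the transcript above), which is why the loop stops on the first success.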
--
Łukasz Borek
luk...@borek.org.pl

------------------------------

Subject: Digest Footer

_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

------------------------------

End of ceph-users Digest, Vol 130, Issue 155
********************************************