[ceph-users] weekend maintenance to bot+bridge slack/irc/discord

2025-01-17 Thread Alvaro Soto
Hi All, This weekend, I'll be moving the bot from the primary node, so expect some hiccups in the bridge and log bot. Cheers. -- Alvaro Soto *Note: My work hours may not be your work hours. Please do not feel the need to respond during a time that is not convenient for you.* --

[ceph-users] Re: Adding Rack to crushmap - Rebalancing multiple PB of data - advice/experience

2025-01-17 Thread Anthony D'Atri
That’s great to know, Bryan. I’ve seen multiple locations for the code out there; which one is canonical? (Lowercase c) > On Jan 17, 2025, at 3:46 PM, Stillwell, Bryan wrote: > > The latest version (since September) switched to using the python rados > bindings which not only fixes this probl

[ceph-users] Re: MDS hung in purge_stale_snap_data after populating cache

2025-01-17 Thread Bailey Allison
Hey Frank, I hate to sound like a broken record here, but if you can access any of the stuff that's in rank 2, try running a 'find /path/to/dir/ -ls' on some of it and see if num_strays decreases. I've had that help the last time we had an MDS like that. Regards, Bailey Allison Servi
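A minimal sketch of the kind of scan Bailey describes, assuming a CephFS mount under /mnt/cephfs and an MDS daemon named mds.ceph-21-mds (both placeholders); walking a directory tree served by the stuck rank can trigger stray reintegration, and the counter can be watched via the admin socket:

    # walk a directory tree served by the affected rank
    find /mnt/cephfs/path/to/dir -ls > /dev/null
    # on the MDS host, check whether num_strays is dropping
    ceph daemon mds.ceph-21-mds perf dump mds_cache | grep num_strays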

[ceph-users] Re: Adding Rack to crushmap - Rebalancing multiple PB of data - advice/experience

2025-01-17 Thread Stillwell, Bryan
The latest version (since September) switched to using the python rados bindings, which not only fixes this problem but also makes it much faster. It also has a fix I made that orders the upmaps so that data is moved off of OSDs before trying to move data onto them. This helps a lot on cluste

[ceph-users] Re: Adding Rack to crushmap - Rebalancing multiple PB of data - advice/experience

2025-01-17 Thread Stillwell, Bryan
Dan can confirm, but this is what I believe is the main repo: https://github.com/cernceph/ceph-scripts/blob/master/tools/upmap/upmap-remapped.py Bryan From: Anthony D'Atri Date: Friday, January 17, 2025 at 15:35 To: Stillwell, Bryan Cc: Alexander Patrakov , Kasper Rasmussen , ceph-users@ceph.io
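For reference, the way this script is commonly run (a sketch; review the generated pg-upmap-items commands before piping them to a shell):

    # fetch the script from the repo above
    curl -O https://raw.githubusercontent.com/cernceph/ceph-scripts/master/tools/upmap/upmap-remapped.py
    chmod +x upmap-remapped.py
    # it prints ceph osd pg-upmap-items commands; inspect first, then apply
    ./upmap-remapped.py | less
    ./upmap-remapped.py | sh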

[ceph-users] Re: squid 19.2.1 RC QE validation status

2025-01-17 Thread Neha Ojha
This ceph-volume PR https://github.com/ceph/ceph/pull/60487, which merged last week, had a dependency on this BlueStore PR https://github.com/ceph/ceph/pull/60543 (particularly https://github.com/ceph/ceph/pull/60487/files#diff-29697ff230f01df036802c8b2842648267767b3a7231ea04a402eaf4e1819d29R30-R31).

[ceph-users] Re: squid 19.2.1 RC QE validation status

2025-01-17 Thread Yuri Weinstein
Assuming you meant that no additional PRs are to be included in 19.2.1. Dan, FYI, we are back at the LRC upgrade step. On Fri, Jan 17, 2025 at 11:36 AM Neha Ojha wrote: > > This ceph-volume PR https://github.com/ceph/ceph/pull/60487 that > merged last week had a dependency on this BlueStore PR > h

[ceph-users] Re: [EXTERNAL] Re: Cephadm: Specifying RGW Certs & Keys By Filepath

2025-01-17 Thread Alex Hussein-Kershaw (HE/HIM)
Morning Redo, I had dropped the "rgw_frontend_port" as I was providing it via "ssl_port". However, I tested with it back in as well: service_type: rgw service_id: rgw service_name: rgw.rgw placement: count_per_host: 1 host_pattern: '*' extra_container_args: - -v - /etc/pki:/etc/pki:ro - -v - /etc

[ceph-users] Non existing host in maintenance

2025-01-17 Thread Dominique Ramaekers
Hi, I have removed a host (hvs004) that was in maintenance. The system disk of this host had failed, so I removed the host hvs004 in Ceph, replaced the system disk, erased all the OSD disks and reinstalled the host as hvs005. The result is a cluster status warning that doesn't go away: health: H

[ceph-users] Adding Rack to crushmap - Rebalancing multiple PB of data - advice/experience

2025-01-17 Thread Kasper Rasmussen
I'm managing a Ceph cluster with 1K+ OSDs distributed across 56 hosts. Until now the CRUSH rule used has been the default replicated rule, but I want to change that in order to implement a failure domain at the rack level. Ceph version: Pacific 16.2.15. All pools (RBD and CephFS) currently use the default re
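For context, the CRUSH change being described usually boils down to something like the following (a sketch with placeholder bucket/host/pool names, not the actual migration plan; the last step is what triggers the rebalance of the stored petabytes):

    # create rack buckets and move hosts under them
    ceph osd crush add-bucket rack1 rack
    ceph osd crush move rack1 root=default
    ceph osd crush move host01 rack=rack1
    # create a replicated rule with rack as the failure domain
    ceph osd crush rule create-replicated replicated_rack default rack
    # switch a pool over to the new rule
    ceph osd pool set <pool> crush_rule replicated_rack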

[ceph-users] Re: [EXTERNAL] Re: Cephadm: Specifying RGW Certs & Keys By Filepath

2025-01-17 Thread Redouane Kachach
Thank you Alex. Reading the description of the tracker https://tracker.ceph.com/issues/69567, it sounds like the outcome is very similar to using the self-signed certificates auto-generated by cephadm. Wouldn't that be a viable option? I mean, is there a big difference between self-signing the cer

[ceph-users] Re: Non existing host in maintenance

2025-01-17 Thread Eugen Block
Hi, there's no need to wipe OSDs from a failed host. Just reinstall the OS, configure it to your needs, install cephadm and podman/docker, add the cephadm pub key, and then just reactivate the OSDs with 'ceph cephadm osd activate'. I just did that yesterday. To clear the warning, I would check t
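A hedged sketch of the sequence Eugen outlines (host name is a placeholder; /etc/ceph/ceph.pub is the default cephadm key location):

    # after reinstalling the OS and installing cephadm + podman/docker:
    ssh-copy-id -f -i /etc/ceph/ceph.pub root@<host>
    # have cephadm reactivate the existing OSDs on that host
    ceph cephadm osd activate <host>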

[ceph-users] Re: [EXTERNAL] Re: Cephadm: Specifying RGW Certs & Keys By Filepath

2025-01-17 Thread Alex Hussein-Kershaw (HE/HIM)
Thanks for the suggestion. It may be that it does fit my use case, but I wasn't sure from the docs. I have a self-signed CA that I use for my deployment; can I pass a root CA cert and key in to Cephadm to have it auto-generate a server certificate as a child of that? I noticed in the docs there i

[ceph-users] Re: [EXTERNAL] Re: Cephadm: Specifying RGW Certs & Keys By Filepath

2025-01-17 Thread Redouane Kachach
When certificate auto-generation is configured (by setting generate_cert to true in the spec), cephadm will take care of generating the certificates for each host. This feature was introduced recently, so I'm not sure if it's well documented or not (I'll check that later). We are introducing this fun
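A possible spec fragment for this (a sketch based on the description above; the port value is an assumption and field support depends on the Ceph release):

    service_type: rgw
    service_id: rgw
    placement:
      count_per_host: 1
      host_pattern: '*'
    spec:
      rgw_frontend_port: 8443
      ssl: true
      generate_cert: true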

[ceph-users] Re: Non existing host in maintenance

2025-01-17 Thread Dominique Ramaekers
I've removed two keys: #ceph config-key rm mgr/cephadm/host.hvs004.devices.0 #ceph config-key rm mgr/telemetry/host-id/hvs004 Now I only have 'history' keys. #ceph config-key ls | grep hvs004 | head gives me: "config-history/1027/+osd/host:hvs004/osd_memory_target", "config-history/1027/

[ceph-users] Re: Non existing host in maintenance

2025-01-17 Thread Eugen Block
Did you fail the mgr? Otherwise it can take up to 15 minutes to refresh, I believe. The history doesn't hurt anything; it's up to you if you want to keep it. Quoting Dominique Ramaekers: I've removed two keys: #ceph config-key rm mgr/cephadm/host.hvs004.devices.0 #ceph config-key rm mgr/t
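For reference, forcing the refresh by failing over the active mgr is a one-liner (sketch):

    ceph mgr fail            # on older releases: ceph mgr fail <active-mgr-name>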

[ceph-users] Re: Non existing host in maintenance

2025-01-17 Thread Dominique Ramaekers
Hi Eugen, Failing the manager refreshed the config. Thanks. The warning is gone. > -Original message- > From: Eugen Block > Sent: Friday, January 17, 2025 14:05 > To: Dominique Ramaekers > CC: ceph-users@ceph.io > Subject: Re: [ceph-users] Re: Non existing host in maintenance

[ceph-users] Re: MDS hung in purge_stale_snap_data after populating cache

2025-01-17 Thread Frank Schilder
Hi Bailey. ceph-14 (rank=0): num_stray=205532 ceph-13 (rank=1): num_stray=4446 ceph-21-mds (rank=2): num_stray=99446249 ceph-23 (rank=3): num_stray=3412 ceph-08 (rank=4): num_stray=1238 ceph-15 (rank=5): num_stray=1486 ceph-16 (rank=6): num_stray=5545 ceph-11 (rank=7): num_stray=2995 The stats fo

[ceph-users] Re: Adding Rack to crushmap - Rebalancing multiple PB of data - advice/experience

2025-01-17 Thread Anthony D'Atri
> On Jan 17, 2025, at 6:02 AM, Kasper Rasmussen > wrote: > > However I'm concerned with the amount of data that needs to be rebalanced, > since the cluster holds multiple PB, and I'm looking for review of/input for > my plan, as well as words of advice/experience from someone who has been in

[ceph-users] Re: [EXTERNAL] Re: Cephadm: Specifying RGW Certs & Keys By Filepath

2025-01-17 Thread Redouane Kachach
I see, unfortunately I can't see an easy way to avoid that. With the current code you will get either port= or ssl_port= depending on whether you enable ssl or not. At the end of the day the result gets written to the configuration variable rgw_frontends (ceph config ls | grep rgw_frontends). I'll
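A quick way to inspect what actually got written (a sketch; the daemon name pattern is an assumption):

    # show the generated frontend line(s) for all daemons
    ceph config dump | grep rgw_frontends
    # or query a specific RGW daemon
    ceph config get client.rgw.rgw.<host>.<id> rgw_frontends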

[ceph-users] Re: Adding Rack to crushmap - Rebalancing multiple PB of data - advice/experience

2025-01-17 Thread Alexander Patrakov
Hello Kasper, Please be aware that the current "upmap-remapped" script is flaky. It might just refuse to work, with this message: "Error loading remapped pgs". This has been traced to the fact that "ceph pg ls remapped -f json" sets its stderr to non-blocking mode, and that is the same file descrip