[ceph-users] ceph orch upgrade tries to pull latest?

2025-01-08 Thread tobias tempel
Dear all, I'm trying a cephadm upgrade in an air-gapped environment from 18.2.2 to 18.2.4 ... yet to no avail. The local image registry is a Harbor instance. I start the upgrade process with ceph orch upgrade start --image harborregistry/quay.io/ceph/ceph:v18.2.4 and the status looks good: ceph orc
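For reference, a minimal sketch of that flow in an air-gapped setup; the registry path below is a placeholder, not the poster's actual Harbor URL:

    # Start the upgrade from the locally mirrored image (placeholder registry path).
    ceph orch upgrade start --image registry.local/ceph/ceph:v18.2.4

    # Watch progress, then pause or stop it while investigating if it stalls.
    ceph orch upgrade status
    ceph orch upgrade pause
    ceph orch upgrade stop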

[ceph-users] fqdn in spec

2025-01-08 Thread Piotr Pisz
Hi, We add hosts to the cluster using FQDNs; adding them manually (ceph orch host add) works fine. However, if we use the spec file as below, the whole thing falls apart.
---
service_type: host
addr: xx.xx.xx.xx
hostname: ceph001.xx002.xx.xx.xx.com
location:
  root: xx002
  rack: rack01
labels:
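For reference, a host spec is normally applied with ceph orch apply -i; a hedged sketch with placeholder address, hostname and CRUSH location (not the poster's values):

    # Hypothetical host spec; all values below are placeholders.
    cat > host-ceph001.yaml <<'EOF'
    service_type: host
    addr: 192.0.2.11
    hostname: ceph001.example.com
    location:
      rack: rack01
    labels:
      - osd
    EOF

    # Apply the spec, then check how the orchestrator recorded the host.
    ceph orch apply -i host-ceph001.yaml
    ceph orch host ls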

[ceph-users] Re: Slow initial boot of OSDs in large cluster with unclean state

2025-01-08 Thread Thomas Byrne - STFC UKRI
Hi Frédéric, All of our recent OSD crashes can be attributed to genuine hardware issues (i.e. failed IO due to unreadable sectors). For reference I've had a look and it looks like we've had a handful of drive failures on this cluster in the past month, with no other significant flapping. I was

[ceph-users] Re: Slow initial boot of OSDs in large cluster with unclean state

2025-01-08 Thread Anthony D'Atri
> Just to check, are you recommending that at some point each week all PGs are
> clean *at the same time*, or that no PGs should be unclean for more than a
> week?
The former I think, so that the cluster is converged, which in turn enables the mons to cull old maps and compact their DBs.
> T
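As a rough way to check both readings, one can look for stuck-unclean PGs and at the range of osdmap epochs the mons still hold; a sketch, assuming the osdmap_first_committed/osdmap_last_committed fields in ceph report:

    # PGs that have been stuck unclean longer than the default threshold.
    ceph pg dump_stuck unclean

    # Range of osdmap epochs the monitors keep; a large gap between first and
    # last committed suggests maps are not being trimmed.
    ceph report | jq '.osdmap_first_committed, .osdmap_last_committed'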

[ceph-users] who build RPM package

2025-01-08 Thread Tony Liu
Hi, I wonder which team is building the Ceph RPM packages for CentOS Stream 9? I see Reef RPM packages in [1] and [2]. For example, [1] has ceph-18.2.4-0.el9.x86_64.rpm while [2] has ceph-18.2.4-1.el9.x86_64.rpm and -2. Are those packages built by the same or different teams? What's the difference, -1
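One way to tell builds apart is from the package metadata itself; a small sketch (assumes ceph-common is installed and the repos are enabled):

    # Vendor, Packager and Build Host identify which build system produced the RPM.
    rpm -qi ceph-common | grep -E 'Release|Vendor|Packager|Build Host'

    # List every available build and which repository it comes from.
    dnf list --showduplicates ceph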

[ceph-users] OSDs won't come back after upgrade

2025-01-08 Thread Jorge Garcia
Hello, I'm going down the long and winding road of upgrading our ceph clusters from mimic to the latest version. This has involved slowly going up one release at a time. I'm now going from octopus to pacific, which also involves upgrading the OS on the host systems from Centos 7 to Rocky 9. I fir
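If the data devices survived the OS reinstall, the usual recovery step is to let ceph-volume rediscover and start the OSDs; a hedged sketch, assuming LVM-based, non-containerized OSDs:

    # List the OSDs ceph-volume can see on this host.
    ceph-volume lvm list

    # Recreate the systemd units and tmpfs mounts, then start all discovered OSDs.
    ceph-volume lvm activate --all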

[ceph-users] Re: Slow initial boot of OSDs in large cluster with unclean state

2025-01-08 Thread Frédéric Nass
Hi Tom, Could you describe this cluster from a hardware perspective? Network speed and MTU size, HDD type and capacity, whether OSDs have their WAL/DB on SSD/NVMe or if they're collocated, whether MONs are using HDDs or SSDs/NVMe, what workloads this cluster is handling? You mentioned OSD fla

[ceph-users] Re: Slow initial boot of OSDs in large cluster with unclean state

2025-01-08 Thread Thomas Byrne - STFC UKRI
Hi Dan, Happy new year!
> I find it's always best to aim to have all PGs clean at least once a week -- that way the osdmaps can be trimmed at least weekly, preventing all sorts of nastiness, one of which you mentioned here.
Just to check, are you recommending that at some point each week all PGs

[ceph-users] Re: Slow initial boot of OSDs in large cluster with unclean state

2025-01-08 Thread Thomas Byrne - STFC UKRI
Hi Anthony, Please see my replies inline. I also just wanted to say I really enjoyed your talk about QLC flash at Cephalocon, there was a lot of useful info in there.
>> On our 6000+ HDD OSD cluster (pacific)
>
> That’s the bleeding edge in a number of respects. Updating to at least Reef
> would

[ceph-users] Re: Slow initial boot of OSDs in large cluster with unclean state

2025-01-08 Thread Dan van der Ster
Hi Tom,
> Just to check, are you recommending that at some point each week all PGs are
> clean *at the same time*, or that no PGs should be unclean for more than a
> week?
> The latter absolutely makes sense, but the former can be quite hard to manage
> sometimes this cluster, with about one dr

[ceph-users] Re: Understanding filesystem size

2025-01-08 Thread Anthony D'Atri
> On Jan 8, 2025, at 4:32 AM, Nicola Mori wrote:
>
> Hi Anthony,
>
> I did all you suggested:
>
> ceph osd pool set wizard_data pg_autoscale_mode on
> ceph config set global mon_target_pg_per_osd 200
> ceph osd pool set wizard_metadata bulk true
> ceph osd pool set wizard_data bulk true
>

[ceph-users] Re: ceph orch upgrade tries to pull latest?

2025-01-08 Thread Adam King
It looks like the "resource not found" message is being directly output by podman. Is there anything in the cephadm.log (/var/log/ceph/cephadm.log) on one of the hosts where this is happening that says what podman command cephadm was running that hit this error? On Wed, Jan 8, 2025 at 5:27 AM tobi
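A quick way to pull the relevant entries out of that log on an affected host (a sketch; the grep pattern is only a guess at what to look for):

    # Show recent podman invocations cephadm ran, plus any non-zero exits.
    grep -iE 'podman|non-zero' /var/log/ceph/cephadm.log | tail -n 100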

[ceph-users] Re: Understanding filesystem size

2025-01-08 Thread Nicola Mori
I've been too impatient: after some minutes the autoscaler kicked in and now the situation is the following:
# ceph osd pool autoscale-status
POOL  SIZE  TARGET SIZE  RATE  RAW CAPACITY  RATIO  TARGET RATIO  EFFECTIVE RATIO  BIAS  PG_NUM  NEW PG_NUM  AUTOSCALE  BULK
.m

[ceph-users] Re: Understanding filesystem size

2025-01-08 Thread Nicola Mori
Hi Anthony, I did all you suggested:
ceph osd pool set wizard_data pg_autoscale_mode on
ceph config set global mon_target_pg_per_osd 200
ceph osd pool set wizard_metadata bulk true
ceph osd pool set wizard_data bulk true
but this didn't change anything. The PG count is unchanged and als
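For reference, the effect of those settings can be checked against the autoscaler's own view and the pools' current PG counts; a minimal sketch using the pool names from the thread:

    # Per-pool autoscaler view: RATIO, PG_NUM and NEW PG_NUM columns.
    ceph osd pool autoscale-status

    # Current pg_num (and pg_num_target while a change is in flight).
    ceph osd pool ls detail | grep wizard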

[ceph-users] Re: fqdn in spec

2025-01-08 Thread Lukasz Borek
I never used FQDNs this way, but there is an option for the cephadm bootstrap command: --allow-fqdn-hostname ("allow hostname that is fully-qualified (contains '.')"). Worth checking. Not sure what's behind it. Thanks
On Wed, 8 Jan 2025 at 12:14, Piotr Pisz wrote:
> Hi,
>
> We ad
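For completeness, that flag is passed at bootstrap time; a short sketch with placeholder values:

    # Bootstrap accepting a fully-qualified hostname for the first host.
    cephadm bootstrap --mon-ip 192.0.2.10 --allow-fqdn-hostname

    # Subsequent hosts can then be added by FQDN as well.
    ceph orch host add ceph001.example.com 192.0.2.11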

[ceph-users] Re: Slow initial boot of OSDs in large cluster with unclean state

2025-01-08 Thread Thomas Byrne - STFC UKRI
Hi Wes, It works out at about five new osdmaps a minute, which is about normal for this cluster's state changes as far as I can tell. It'll drop down to 2-3 maps/minute during quiet periods, but the combination of the upmap balancer making changes and occasional OSD flaps or crashes due to hard
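A rough way to sample that rate is to compare osdmap epochs over an interval; a trivial sketch:

    # Current osdmap epoch now and a minute later; the difference is the number
    # of new maps generated in that window.
    ceph osd dump | head -1; sleep 60; ceph osd dump | head -1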

[ceph-users] Re: squid 19.2.1 RC QE validation status

2025-01-08 Thread Yuri Weinstein
We are still missing some approvals: crimson-rados - Matan, Samuel
Laura, unless you feel those are a must to have, we are ready for gibba and LRC/sepia upgrades.
On Wed, Jan 8, 2025 at 2:32 PM Laura Flores wrote:
> Rados and upgrade are approved:
> https://tracker.ceph.com/projects/rados/wiki

[ceph-users] Re: squid 19.2.1 RC QE validation status

2025-01-08 Thread Laura Flores
Rados and upgrade are approved:
https://tracker.ceph.com/projects/rados/wiki/SQUID#v1921-httpstrackercephcomissues69234
On Tue, Jan 7, 2025 at 4:32 PM Laura Flores wrote:
> I am checking a few things for core and the upgrade suites, but should
> have a response soon.
>
> Laura Flores
>
> She/Her