[ceph-users] Re: The effect of changing an osd's class

2024-11-18 Thread Roland Giesler
On 2024/11/17 15:20, Gregory Orange wrote: On 17/11/24 19:44, Roland Giesler wrote: I cannot see any option that allows me to disable mclock... It's not so much disabling mclock as changing the op queue scheduler to use wpq instead of it. https://docs.ceph.com/en/reef/rados/configuration/osd-c

[ceph-users] Re: The effect of changing an osd's class

2024-11-18 Thread Anthony D'Atri
Glad you’re sorted out. I had a feeling it was a function of not being able to satisfy pool / rule constraints. > On Nov 18, 2024, at 1:58 AM, Roland Giesler wrote: > > On 2024/11/17 18:12, Anthony D'Atri wrote: >> I see 5 OSDs with 0 CRUSH weight, is that intentional? > > Yes, I set the wei

[ceph-users] Re: The effect of changing an osd's class

2024-11-18 Thread Roland Giesler
On 2024/11/17 18:12, Anthony D'Atri wrote: I see 5 OSDs with 0 CRUSH weight, is that intentional? Yes, I set the weight to 0 to ensure all the PGs are removed from them, since I'm removing them (worn-out SSDs). I think I found the problem. I had created a CRUSH rule called old_ssd (a

[ceph-users] Re: The effect of changing an osd's class

2024-11-17 Thread Anthony D'Atri
I see 5 OSDs with 0 CRUSH weight, is that intentional? Notably: > All the problem pg's are on osd.39. osd.39 has 0 CRUSH weight, so CRUSH shouldn’t be placing any PGs there. Yet there appear to be PGs mapped to the 4x 0 weight OSDs that are up. I had hoped that the health detail would show
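
A minimal sketch of the check described above: listing OSDs that are up but carry 0 CRUSH weight. It assumes the JSON from `ceph osd tree -f json` has a top-level `nodes` list whose OSD entries carry `name`, `type`, `crush_weight` and `status`; the sample data below is hypothetical and not taken from this cluster.

```python
import json


def zero_weight_up_osds(osd_tree_json):
    """Return names of OSDs that are up but have a CRUSH weight of 0."""
    tree = json.loads(osd_tree_json)
    return [
        node["name"]
        for node in tree.get("nodes", [])
        if node.get("type") == "osd"
        and node.get("crush_weight") == 0
        and node.get("status") == "up"
    ]


# Hypothetical, trimmed sample of `ceph osd tree -f json` output.
sample_tree = json.dumps({
    "nodes": [
        {"id": -1, "name": "default", "type": "root"},
        {"id": 1, "name": "osd.1", "type": "osd", "crush_weight": 1.7, "status": "up"},
        {"id": 39, "name": "osd.39", "type": "osd", "crush_weight": 0, "status": "up"},
        {"id": 40, "name": "osd.40", "type": "osd", "crush_weight": 0, "status": "down"},
    ]
})

print(zero_weight_up_osds(sample_tree))
```

Any OSD reported here that still appears in a PG's acting set is worth comparing against the pool's CRUSH rule.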

[ceph-users] Re: The effect of changing an osd's class

2024-11-17 Thread Gregory Orange
On 17/11/24 19:44, Roland Giesler wrote: > On 2024/11/16 18:38, Anthony D'Atri wrote: >> Disabling mclock as described here >> https://docs.ceph.com/en/reef/rados/configuration/mclock-config-ref/ might >> help > > I cannot see any option that allows me to disable mclock... It's not so much disab

[ceph-users] Re: The effect of changing an osd's class

2024-11-17 Thread Roland Giesler
On 2024/11/16 18:38, Anthony D'Atri wrote: Disabling mclock as described here https://docs.ceph.com/en/reef/rados/configuration/mclock-config-ref/ might help I cannot see any option that allows me to disable mclock... Also, you have a small cluster with a bunch of small OSDs.  Please send

[ceph-users] Re: The effect of changing an osd's class

2024-11-15 Thread Roland Giesler
All the problem PGs are on osd.39. When I stop osd.39, it shows that 86 PGs would be offline. However, no recovery happens. It just stays there: 86 undersized+remapped+peered. I managed to pin down all the PGs that are in this state by using: ceph pg dump | grep active+clea
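
As an alternative to grepping the dump, the following sketch counts which OSDs the stuck PGs have in common. It assumes JSON from `ceph pg ls -f json`, either as a bare list of PG entries or wrapped in a `pg_stats` key, with `state` and `acting` fields per PG; that layout and the sample entries are assumptions, not output from this cluster.

```python
import json
from collections import Counter


def osds_in_state(pg_ls_json, state):
    """Count how often each OSD appears in the acting set of PGs in `state`."""
    doc = json.loads(pg_ls_json)
    # Some releases wrap the list in {"pg_stats": [...]}, others return it bare.
    pgs = doc.get("pg_stats", []) if isinstance(doc, dict) else doc
    counts = Counter()
    for pg in pgs:
        if pg.get("state") == state:
            counts.update(pg.get("acting", []))
    return counts


# Hypothetical sample with two stuck PGs and one healthy PG.
sample_pgs = json.dumps({
    "pg_stats": [
        {"pgid": "5.1", "state": "undersized+remapped+peered", "acting": [39]},
        {"pgid": "5.2", "state": "undersized+remapped+peered", "acting": [39]},
        {"pgid": "2.7", "state": "active+clean", "acting": [1, 2, 3]},
    ]
})

print(osds_in_state(sample_pgs, "undersized+remapped+peered").most_common())
```

An OSD that appears in every stuck PG, as osd.39 does in the report above, is the natural place to look first.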

[ceph-users] Re: The effect of changing an osd's class

2024-11-15 Thread Roland Giesler
On 2024/11/15 13:00, Gregory Orange wrote: On 15/11/24 17:11, Roland Giesler wrote: How do I determine the primary osd? ceph pg map $pg ceph pg $pg query | jq .info.stats.acting_primary You can use jq and less to take a look at other values which might be informative too. Ah, of course :-)  Sor

[ceph-users] Re: The effect of changing an osd's class

2024-11-15 Thread Gregory Orange
On 15/11/24 17:11, Roland Giesler wrote: > How do I determine the primary osd? ceph pg map $pg ceph pg $pg query | jq .info.stats.acting_primary You can use jq and less to take a look at other values which might be informative too. Greg.
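
For scripting the same lookup without jq, here is a minimal sketch that reads the `.info.stats.acting_primary` path quoted above from the JSON printed by `ceph pg $pg query`. The sample document is hypothetical and heavily trimmed; real output contains many more fields.

```python
import json


def acting_primary(pg_query_json):
    """Return the acting primary OSD id from `ceph pg <pgid> query` JSON."""
    doc = json.loads(pg_query_json)
    return doc["info"]["stats"]["acting_primary"]


# Hypothetical, trimmed sample of the query output.
sample_query = json.dumps({
    "info": {
        "pgid": "5.1",
        "stats": {"up": [39, 4, 12], "acting": [39, 4, 12], "acting_primary": 39},
    }
})

print(acting_primary(sample_query))
```

In practice the JSON text could come from `subprocess.run(["ceph", "pg", pgid, "query"], capture_output=True, text=True).stdout`, where `pgid` is the PG of interest.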

[ceph-users] Re: The effect of changing an osd's class

2024-11-15 Thread Roland Giesler
How do I determine the primary osd? On 2024/11/14 16:12, Anthony D'Atri wrote: You might also first try ceph osd down 1701 This marks the OSD down in the map; it doesn’t restart anything, but in some cases it does serve to goose progress. The OSD will quickly mark itself back

[ceph-users] Re: The effect of changing an osd's class

2024-11-14 Thread Anthony D'Atri
You might also first try ceph osd down 1701, where 1701 is the ID of said primary. This marks the OSD down in the map; it doesn’t restart anything, but in some cases it does serve to goose progress. The OSD will quickly mark itself back up. ceph health detail

[ceph-users] Re: The effect of changing an osd's class

2024-11-14 Thread Roland Giesler
I redid what I did before. I changed the osd class and had a look at the storage and found a VM image! I deleted it (it was a copy) and now the storage is empty. I guess the image (assigned to a non-existent storage after reverting) left PGs that could not be moved. I'm reverting now again

[ceph-users] Re: The effect of changing an osd's class

2024-11-14 Thread Frédéric Nass
Hi Roland, Yes, you can. See mclock documentation here [1]. One thing I can think of is that these 113 PGs may have a common misbehaving OSD (primary or not) with a ridiculous osd_mclock_max_capacity_iops_ssd value set. Restarting the primary and/or adjusting osd_mclock_max_capacity_iops_ssd v

[ceph-users] Re: The effect of changing an osd's class

2024-11-14 Thread Roland Giesler
On 2024/11/14 11:44, Joachim Kraftmayer wrote: I have seen similar behaviour when mclock is active. For osd.0 I see: osd.0 basic osd_mclock_max_capacity_iops_ssd 14305.161403 I'm unfamiliar with mclock. Can one tune that to improve the situation? Roland
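
The suggestion in this thread to look for an OSD with an implausible osd_mclock_max_capacity_iops_ssd value can be sketched as a small filter over config lines shaped like the one quoted above. The `low` and `high` bounds are arbitrary assumptions for illustration, not recommended values, and the osd.7 line in the sample is hypothetical.

```python
OPTION = "osd_mclock_max_capacity_iops_ssd"


def suspicious_capacities(dump_text, low=1000.0, high=100000.0):
    """Map each OSD to its configured IOPS capacity if it lies outside [low, high]."""
    flagged = {}
    for line in dump_text.splitlines():
        tokens = line.split()
        if OPTION not in tokens:
            continue
        idx = tokens.index(OPTION)
        if idx + 1 >= len(tokens):
            continue  # option name present but no value on the line
        value = float(tokens[idx + 1])
        if not (low <= value <= high):
            flagged[tokens[0]] = value
    return flagged


# First line is the value quoted in the thread; the second is hypothetical.
sample_dump = (
    "osd.0 basic osd_mclock_max_capacity_iops_ssd 14305.161403\n"
    "osd.7 basic osd_mclock_max_capacity_iops_ssd 125.500000\n"
)

print(suspicious_capacities(sample_dump))
```

With these assumed bounds the value reported for osd.0 is not flagged, so it would not by itself point to a misconfigured OSD.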

[ceph-users] Re: The effect of changing an osd's class

2024-11-14 Thread Joachim Kraftmayer
I have seen similar behaviour when mclock is active. Joachim Roland Giesler wrote on Thu, 14 Nov 2024, 05:40: > On 2024/11/13 21:05, Anthony D'

[ceph-users] Re: The effect of changing an osd's class

2024-11-14 Thread Eugen Block
It's not clear to me if you wanted to add some more details after "I see this:" (twice). So you do see backfilling traffic if you out the OSD? Then maybe the remapped PGs are not even on that OSD? Have you checked 'ceph pg ls remapped'? To drain an OSD, you can either set it "out" as you alre

[ceph-users] Re: The effect of changing an osd's class

2024-11-14 Thread Roland Giesler
I had attached images, but these are not shown... On 2024/11/14 10:12, Roland Giesler wrote: On 2024/11/14 09:37, Eugen Block wrote: Remapped PGs are exactly what to expect after removing (or adding) a device class. Did you revert the change entirely? It sounds like you maybe forgot to add the

[ceph-users] Re: The effect of changing an osd's class

2024-11-14 Thread Roland Giesler
On 2024/11/14 09:37, Eugen Block wrote: Remapped PGs are exactly what to expect after removing (or adding) a device class. Did you revert the change entirely? It sounds like you maybe forgot to add the original device class back to the OSD where you changed it? Maybe share 'ceph osd tree'? Do yo

[ceph-users] Re: The effect of changing an osd's class

2024-11-13 Thread Eugen Block
Remapped PGs are exactly what to expect after removing (or adding) a device class. Did you revert the change entirely? It sounds like you maybe forgot to add the original device class back to the OSD where you changed it? Maybe share 'ceph osd tree'? Do you have recovery IO (ceph -s)? Does t

[ceph-users] Re: The effect of changing an osd's class

2024-11-13 Thread Roland Giesler
On 2024/11/13 21:05, Anthony D'Atri wrote: I would think that there was some initial data movement and that it all went back when you reverted. I would not expect a mess.
  data:
    volumes: 1/1 healthy
    pools:   7 pools, 1586 pgs
    objects: 5.79M objects, 12 TiB
    usage:   24 TiB use

[ceph-users] Re: The effect of changing an osd's class

2024-11-13 Thread Anthony D'Atri
I would think that there was some initial data movement and that it all went back when you reverted. I would not expect a mess. > On Nov 13, 2024, at 12:48 PM, Roland Giesler wrote: > > I created a new osd class and changed the class of an osd to the new one > without taking the osd out and s