This is one reason to set nobackfill/norebalance first, so that the cluster 
doesn’t needlessly react to an intermediate state.

Having managed clusters before we had the ability to manipulate the CRUSH 
topology via the CLI, I would suggest using the CLI whenever possible.  It’s 
all too easy to fat-finger a decompiled CRUSH map when editing it in a text editor.
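
In practice that can look roughly like this sketch (the flags and the crush
move syntax are the ones discussed in this thread; the specific rack and host
names are illustrative):

  ceph osd set norebalance
  ceph osd set nobackfill
  # move each host bucket under its target rack via the CLI
  ceph osd crush move ksr-ceph-osd1 rack=rack1
  ceph osd crush move ksr-ceph-osd2 rack=rack2
  # once the topology is final, let recovery/rebalance proceed
  ceph osd unset nobackfill
  ceph osd unset norebalance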

> On Jan 21, 2025, at 12:38 PM, Kasper Rasmussen 
> <kasper_steenga...@hotmail.com> wrote:
> 
> Oh, but of course everything smooths out after a while.
> 
> My main concern is just that, if I do this on a large cluster, it will send 
> it spinning...
> 
> From: Kasper Rasmussen <kasper_steenga...@hotmail.com>
> Sent: Tuesday, January 21, 2025 18:35
> To: Dan van der Ster <dan.vanders...@clyso.com>; Anthony D'Atri 
> <a...@dreamsnake.net>
> Cc: ceph-users <ceph-users@ceph.io>
> Subject: Re: [ceph-users] Re: Changing crush map result in > 100% objects 
> degraded
>  
> Hi Dan
> 
> "Also, in the process of moving the hosts one by one, each step creates
> a new topology which can change the ordering of hosts, incrementally
> putting things out of whack."
> 
> RESPONSE: Would it be better to edit the crushmap as a file, and load the 
> new one with ceph osd setcrushmap -i <file> ?
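> 
> I.e., roughly this workflow (just a sketch; the file names are placeholders):
> 
>   ceph osd getcrushmap -o crushmap.bin
>   crushtool -d crushmap.bin -o crushmap.txt
>   # edit crushmap.txt: add the rack buckets and move the host buckets under them
>   crushtool -c crushmap.txt -o crushmap.new.bin
>   ceph osd setcrushmap -i crushmap.new.bin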
> 
> 
> 
> Kaspar: I assume the cluster was idle during your tests?
> RESPONSE: Yes, the cluster was indeed idle
> 
> Also -- can you reproduce it without norecover/nobackfill set ?
> RESPONSE: Yes, I reproduced with no flags set - Starting point HEALTH_OK -
> 
> (The output from the ceph pg "$pgid" query is 8548 lines - I don't know how 
> to send that; just as an attachment?) The result is shown below:
> 
> ubuntu@ksr-ceph-deploy:~$ ceph osd crush move ksr-ceph-osd1 rack=rack1;
> moved item id -7 name 'ksr-ceph-osd1' to location {rack=rack1} in crush map
> ubuntu@ksr-ceph-deploy:~$ sleep 10
> ubuntu@ksr-ceph-deploy:~$ ceph pg ls undersized;
> PG OBJECTS DEGRADED MISPLACED UNFOUND BYTES OMAP_BYTES* OMAP_KEYS* LOG 
> LOG_DUPS STATE SINCE VERSION REPORTED UP ACTING SCRUB_STAMP DEEP_SCRUB_STAMP 
> LAST_SCRUB_DURATION SCRUB_SCHEDULING
> 1.0 2 2 0 0 459280 0 0 117 0 
> active+recovery_wait+undersized+degraded+remapped 7s 732'117 963:994 
> [11,2,8]p11 [11,2]p11 2025-01-21T05:25:36.075052+0000 
> 2025-01-17T14:16:37.883834+0000 1 periodic scrub scheduled @ 
> 2025-01-22T15:03:39.660268+0000
> 6.0 22 60 0 0 2252800 0 0 124 0 
> active+recovery_wait+undersized+degraded+remapped 7s 732'124 962:2333 
> [7,6,1]p7 [7,1]p7 2025-01-21T09:06:38.302061+0000 
> 2025-01-20T03:35:48.722520+0000 1 periodic scrub scheduled @ 
> 2025-01-22T18:15:49.710411+0000
> 6.1 12 50 0 0 1228800 0 0 110 0 
> active+recovery_wait+undersized+degraded+remapped 6s 732'110 963:2298 
> [6,3,1]p6 [6,3]p6 2025-01-21T09:04:29.912825+0000 
> 2025-01-20T03:11:56.962281+0000 1 periodic scrub scheduled @ 
> 2025-01-22T14:09:59.472013+0000
> 6.2 13 52 52 0 5423104 0 0 107 0 
> active+recovery_wait+undersized+degraded+remapped 6s 732'107 963:2273 
> [10,11,4]p10 [0,4]p0 2025-01-21T06:46:11.657543+0000 
> 2025-01-17T14:16:41.932263+0000 0 periodic scrub scheduled @ 
> 2025-01-22T13:52:17.796513+0000
> 6.5 18 100 0 0 10027008 0 0 113 0 
> active+recovery_wait+undersized+degraded+remapped 6s 732'113 963:726 
> [0,9,3]p0 [0,3]p0 2025-01-21T05:24:25.006369+0000 
> 2025-01-21T05:24:25.006369+0000 0 periodic scrub scheduled @ 
> 2025-01-22T12:29:24.576297+0000
> 6.9 16 55 0 0 5730304 0 0 104 0 
> active+recovery_wait+undersized+degraded+remapped 6s 732'104 963:996 
> [10,9,1]p10 [10,1]p10 2025-01-21T09:02:25.957504+0000 
> 2025-01-17T14:16:50.479422+0000 0 periodic scrub scheduled @ 
> 2025-01-22T19:49:15.092705+0000
> 6.b 18 60 0 0 1843200 0 0 114 0 
> active+recovery_wait+undersized+degraded+remapped 6s 732'114 962:3860 
> [1,2,6]p1 [1,2]p1 2025-01-21T06:57:22.832565+0000 
> 2025-01-17T14:16:57.820141+0000 0 periodic scrub scheduled @ 
> 2025-01-22T09:03:26.636583+0000
> 6.f 17 60 0 0 5832704 0 0 117 0 
> active+recovery_wait+undersized+degraded+remapped 6s 732'117 963:1437 
> [4,3,2]p4 [4,3]p4 2025-01-21T07:59:55.049488+0000 
> 2025-01-17T14:17:05.581667+0000 0 periodic scrub scheduled @ 
> 2025-01-22T14:09:31.906176+0000
> 6.11 20 59 0 0 7888896 0 0 123 0 
> active+recovery_wait+undersized+degraded+remapped 7s 732'123 963:3329 
> [11,0,8]p11 [11,8]p11 2025-01-21T01:32:23.458956+0000 
> 2025-01-17T14:17:09.774195+0000 0 periodic scrub scheduled @ 
> 2025-01-22T06:25:32.462414+0000
> 6.12 21 4 42 0 10334208 0 0 141 0 
> active+recovering+undersized+degraded+remapped 1.11259s 732'141 964:613 
> [9,8,11]p9 [2,11]p2 2025-01-21T05:01:52.629884+0000 
> 2025-01-21T05:01:52.629884+0000 0 periodic scrub scheduled @ 
> 2025-01-22T07:57:03.899309+0000
> 6.13 22 138 0 0 8093696 0 0 156 0 
> active+recovery_wait+undersized+degraded+remapped 6s 732'156 963:1312 
> [3,6,9]p3 [3,6]p3 2025-01-21T04:47:23.091543+0000 
> 2025-01-18T19:41:37.702881+0000 0 periodic scrub scheduled @ 
> 2025-01-22T13:44:27.566620+0000
> 6.14 14 57 0 0 5525504 0 0 116 0 
> active+recovery_wait+undersized+degraded+remapped 6s 732'116 963:804 
> [11,9,8]p11 [11,9]p11 2025-01-21T04:58:22.800659+0000 
> 2025-01-18T18:51:30.797784+0000 0 periodic scrub scheduled @ 
> 2025-01-22T14:10:05.285157+0000
> 6.17 15 58 0 0 3284992 0 0 117 0 
> active+recovery_wait+undersized+degraded+remapped 7s 732'117 963:1758 
> [11,8,5]p11 [11,8]p11 2025-01-21T17:02:51.283098+0000 
> 2025-01-17T14:17:24.300985+0000 3 periodic scrub scheduled @ 
> 2025-01-23T02:23:35.432833+0000
> 6.1a 16 2 0 0 3387392 0 0 118 0 active+recovering+undersized+remapped 
> 0.802018s 732'118 964:1263 [6,7,1]p6 [1,6]p1 2025-01-21T09:04:47.728667+0000 
> 2025-01-18T21:05:47.129277+0000 1 periodic scrub scheduled @ 
> 2025-01-22T16:41:28.828878+0000
> 6.1d 18 118 0 0 11776000 0 0 111 0 
> active+recovery_wait+undersized+degraded+remapped 6s 732'111 963:1239 
> [3,5,10]p3 [3,5]p3 2025-01-21T06:57:54.358465+0000 
> 2025-01-17T14:17:34.523095+0000 0 periodic scrub scheduled @ 
> 2025-01-22T10:05:29.250471+0000
> 
> * NOTE: Omap statistics are gathered during deep scrub and may be inaccurate 
> soon afterwards depending on utilization. See 
> http://docs.ceph.com/en/latest/dev/placement-group/#omap-statistics for 
> further details.
> ubuntu@ksr-ceph-deploy:~$ sleep 5;
> ubuntu@ksr-ceph-deploy:~$ ceph pg ls degraded;
> PG OBJECTS DEGRADED MISPLACED UNFOUND BYTES OMAP_BYTES* OMAP_KEYS* LOG 
> LOG_DUPS STATE SINCE VERSION REPORTED UP ACTING SCRUB_STAMP DEEP_SCRUB_STAMP 
> LAST_SCRUB_DURATION SCRUB_SCHEDULING
> 1.0 2 2 0 0 459280 0 0 117 0 
> active+recovery_wait+undersized+degraded+remapped 12s 732'117 965:996 
> [11,2,8]p11 [11,2]p11 2025-01-21T05:25:36.075052+0000 
> 2025-01-17T14:16:37.883834+0000 1 periodic scrub scheduled @ 
> 2025-01-22T15:03:39.660268+0000
> 6.2 13 52 52 0 5423104 0 0 107 0 
> active+recovery_wait+undersized+degraded+remapped 12s 732'107 965:2275 
> [10,11,4]p10 [0,4]p0 2025-01-21T06:46:11.657543+0000 
> 2025-01-17T14:16:41.932263+0000 0 periodic scrub scheduled @ 
> 2025-01-22T13:52:17.796513+0000
> 6.5 18 100 0 0 10027008 0 0 113 0 
> active+recovery_wait+undersized+degraded+remapped 11s 732'113 965:728 
> [0,9,3]p0 [0,3]p0 2025-01-21T05:24:25.006369+0000 
> 2025-01-21T05:24:25.006369+0000 0 periodic scrub scheduled @ 
> 2025-01-22T12:29:24.576297+0000
> 6.7 19 2 0 0 9535488 0 0 100 0 active+recovering+degraded 2s 732'100 966:1371 
> [11,10,1]p11 [11,10,1]p11 2025-01-21T11:56:25.453326+0000 
> 2025-01-17T14:16:46.382792+0000 0 periodic scrub scheduled @ 
> 2025-01-22T21:24:43.245561+0000
> 6.9 16 55 0 0 5730304 0 0 104 0 
> active+recovery_wait+undersized+degraded+remapped 11s 732'104 965:998 
> [10,9,1]p10 [10,1]p10 2025-01-21T09:02:25.957504+0000 
> 2025-01-17T14:16:50.479422+0000 0 periodic scrub scheduled @ 
> 2025-01-22T19:49:15.092705+0000
> 6.b 18 60 0 0 1843200 0 0 114 0 
> active+recovery_wait+undersized+degraded+remapped 12s 732'114 965:3862 
> [1,2,6]p1 [1,2]p1 2025-01-21T06:57:22.832565+0000 
> 2025-01-17T14:16:57.820141+0000 0 periodic scrub scheduled @ 
> 2025-01-22T09:03:26.636583+0000
> 6.f 17 60 0 0 5832704 0 0 117 0 
> active+recovery_wait+undersized+degraded+remapped 12s 732'117 965:1439 
> [4,3,2]p4 [4,3]p4 2025-01-21T07:59:55.049488+0000 
> 2025-01-17T14:17:05.581667+0000 0 periodic scrub scheduled @ 
> 2025-01-22T14:09:31.906176+0000
> 6.11 20 59 0 0 7888896 0 0 123 0 
> active+recovery_wait+undersized+degraded+remapped 12s 732'123 965:3331 
> [11,0,8]p11 [11,8]p11 2025-01-21T01:32:23.458956+0000 
> 2025-01-17T14:17:09.774195+0000 0 periodic scrub scheduled @ 
> 2025-01-22T06:25:32.462414+0000
> 6.13 22 138 0 0 8093696 0 0 156 0 
> active+recovery_wait+undersized+degraded+remapped 11s 732'156 963:1312 
> [3,6,9]p3 [3,6]p3 2025-01-21T04:47:23.091543+0000 
> 2025-01-18T19:41:37.702881+0000 0 periodic scrub scheduled @ 
> 2025-01-22T13:44:27.566620+0000
> 6.14 14 57 0 0 5525504 0 0 116 0 
> active+recovery_wait+undersized+degraded+remapped 12s 732'116 965:806 
> [11,9,8]p11 [11,9]p11 2025-01-21T04:58:22.800659+0000 
> 2025-01-18T18:51:30.797784+0000 0 periodic scrub scheduled @ 
> 2025-01-22T14:10:05.285157+0000
> 6.17 15 58 0 0 3284992 0 0 117 0 
> active+recovery_wait+undersized+degraded+remapped 12s 732'117 965:1760 
> [11,8,5]p11 [11,8]p11 2025-01-21T17:02:51.283098+0000 
> 2025-01-17T14:17:24.300985+0000 3 periodic scrub scheduled @ 
> 2025-01-23T02:23:35.432833+0000
> 6.18 10 41 0 0 5115904 0 0 95 0 active+recovery_wait+degraded 12s 732'95 
> 965:888 [4,10,5]p4 [4,10,5]p4 2025-01-21T07:22:43.040812+0000 
> 2025-01-17T14:17:26.898595+0000 0 periodic scrub scheduled @ 
> 2025-01-22T17:24:55.003668+0000
> 6.1c 9 86 0 0 5013504 0 0 88 0 active+recovery_wait+degraded 12s 732'88 
> 963:24 [3,0,8]p3 [3,0,8]p3 2025-01-21T09:08:33.536101+0000 
> 2025-01-17T14:17:37.489220+0000 1 periodic scrub scheduled @ 
> 2025-01-22T11:35:58.254737+0000
> 6.1d 18 118 0 0 11776000 0 0 111 0 
> active+recovery_wait+undersized+degraded+remapped 11s 732'111 963:1239 
> [3,5,10]p3 [3,5]p3 2025-01-21T06:57:54.358465+0000 
> 2025-01-17T14:17:34.523095+0000 0 periodic scrub scheduled @ 
> 2025-01-22T10:05:29.250471+0000
> 
> * NOTE: Omap statistics are gathered during deep scrub and may be inaccurate 
> soon afterwards depending on utilization. See 
> http://docs.ceph.com/en/latest/dev/placement-group/#omap-statistics for 
> further details.
> 
> 
> From: Dan van der Ster <dan.vanders...@clyso.com>
> Sent: Tuesday, January 21, 2025 16:51
> To: Anthony D'Atri <a...@dreamsnake.net>; Kasper Rasmussen 
> <kasper_steenga...@hotmail.com>
> Cc: ceph-users <ceph-users@ceph.io>
> Subject: Re: [ceph-users] Re: Changing crush map result in > 100% objects 
> degraded
>  
> On Tue, Jan 21, 2025 at 7:12 AM Anthony D'Atri <a...@dreamsnake.net> wrote:
> > > On Jan 21, 2025, at 7:59 AM, Kasper Rasmussen 
> > > <kasper_steenga...@hotmail.com> wrote:
> > >
> > > 1 - Why does this result in such a high - objects degraded - percentage?
> >
> > I suspect that’s a function of the new topology having changed the mappings 
> > of multiple OSDs for given PGs.  It’s subtle, but when you move hosts into 
> > rack CRUSH buckets, that’s a different set of inputs into the CRUSH hash 
> > function, so the mappings that come out are different, even though you 
> > haven’t changed the rules and would think that hosts are hosts.
> 
> Also, in the process of moving the hosts one by one, each step creates
> a new topology which can change the ordering of hosts, incrementally
> putting things out of whack.
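> 
> One way to see that effect offline, without touching the cluster, is to dump
> the mappings computed from the old and new CRUSH maps and diff them. A rough
> sketch (the rule id and replica count are illustrative):
> 
>   crushtool -i crushmap.old --test --show-mappings --rule 0 --num-rep 3 > before.txt
>   crushtool -i crushmap.new --test --show-mappings --rule 0 --num-rep 3 > after.txt
>   diff before.txt after.txt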
> 
> > > 2 - Why do PGs get undersized?
> >
> > That often means that CRUSH can’t find a complete set of placements.  In 
> > your situation maybe those would resolve themselves when you unleash the 
> > recovery hounds.
> 
> We started noticing this kind of issue around pacific, but haven't
> fully tracked down what broke yet.
> See https://tracker.ceph.com/issues/56046 for similar.
> 
> Undersized or degraded should only happen -- by design -- if objects
> were modified while the PG did not have 3 OSDs up and acting.
> Kaspar: I assume the cluster was idle during your tests?
> Also -- can you reproduce it without norecover/nobackfill set ?
> 
> Could you simplify your reproducer down to:
> 
> > HEALTH_OK
> > ceph osd crush move ksr-ceph-osd1 rack=rack1
> > ceph pg ls undersized / degraded # get a pgid of a degraded PG
> > ceph pg $pgid query
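> 
> If the full pg query output is too long to post, something like this should
> capture the interesting part (a sketch; it assumes jq is available and the
> usual top-level keys in the query JSON):
> 
>   ceph pg "$pgid" query | jq '.state, .up, .acting, .recovery_state'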
> 
> Cheers, dan
> 
> 
> --
> Dan van der Ster
> CTO @ CLYSO
> Try our Ceph Analyzer -- https://analyzer.clyso.com/
> https://clyso.com | dan.vanders...@clyso.com
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
