This is one reason to set nobackfill/norebalance first, so that the cluster doesn’t needlessly react to an intermediate state.
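For what it's worth, here is a rough sketch of the sequence I have in mind. The rack and host names are simply the ones from your test below, so adjust for your own cluster; this assumes your CRUSH root is the default "default" bucket:

    ceph osd set norebalance
    ceph osd set nobackfill

    # make all topology changes via the CLI rather than hand-editing a decompiled map
    ceph osd crush add-bucket rack1 rack
    ceph osd crush move rack1 root=default
    ceph osd crush move ksr-ceph-osd1 rack=rack1
    # ...repeat for the remaining racks and hosts...

    # once ceph status / ceph pg ls shows what you expect, let recovery proceed
    ceph osd unset nobackfill
    ceph osd unset norebalance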
Having managed clusters before we had the ability to manipulate the CRUSH topology via the CLI, I would suggest using the CLI whenever possible. It's all too easy to fat-finger a decompiled CRUSH map when editing it in a text editor.

> On Jan 21, 2025, at 12:38 PM, Kasper Rasmussen <kasper_steenga...@hotmail.com> wrote:
>
> Oh, but of course everything smooths out after a while.
>
> My main concern is just that, if I do this on a large cluster, it will send it spinning...
>
> From: Kasper Rasmussen <kasper_steenga...@hotmail.com>
> Sent: Tuesday, January 21, 2025 18:35
> To: Dan van der Ster <dan.vanders...@clyso.com>; Anthony D'Atri <a...@dreamsnake.net>
> Cc: ceph-users <ceph-users@ceph.io>
> Subject: Re: [ceph-users] Re: Changing crush map result in > 100% objects degraded
>
> Hi Dan
>
> "Also, in the process of moving the hosts one by one, each step creates a new topology which can change the ordering of hosts, incrementally putting things out of whack."
>
> RESPONSE: Will it be better to edit the crushmap as a file, and load the new one with ceph osd setcrushmap -i <file>?
>
> > Kaspar: I assume the cluster was idle during your tests?
> RESPONSE: Yes, the cluster was indeed idle
>
> > Also -- can you reproduce it without norecover/nobackfill set ?
> RESPONSE: Yes, I reproduced it with no flags set - starting point HEALTH_OK.
>
> Result is as seen here below. (The output from the ceph pg "$pgid" query is 8548 lines - I don't know how to send that - just as an attachment?)
>
> ubuntu@ksr-ceph-deploy:~$ ceph osd crush move ksr-ceph-osd1 rack=rack1;
> moved item id -7 name 'ksr-ceph-osd1' to location {rack=rack1} in crush map
> ubuntu@ksr-ceph-deploy:~$ sleep 10
> ubuntu@ksr-ceph-deploy:~$ ceph pg ls undersized;
> PG OBJECTS DEGRADED MISPLACED UNFOUND BYTES OMAP_BYTES* OMAP_KEYS* LOG LOG_DUPS STATE SINCE VERSION REPORTED UP ACTING SCRUB_STAMP DEEP_SCRUB_STAMP LAST_SCRUB_DURATION SCRUB_SCHEDULING
> 1.0 2 2 0 0 459280 0 0 117 0 active+recovery_wait+undersized+degraded+remapped 7s 732'117 963:994 [11,2,8]p11 [11,2]p11 2025-01-21T05:25:36.075052+0000 2025-01-17T14:16:37.883834+0000 1 periodic scrub scheduled @ 2025-01-22T15:03:39.660268+0000
> 6.0 22 60 0 0 2252800 0 0 124 0 active+recovery_wait+undersized+degraded+remapped 7s 732'124 962:2333 [7,6,1]p7 [7,1]p7 2025-01-21T09:06:38.302061+0000 2025-01-20T03:35:48.722520+0000 1 periodic scrub scheduled @ 2025-01-22T18:15:49.710411+0000
> 6.1 12 50 0 0 1228800 0 0 110 0 active+recovery_wait+undersized+degraded+remapped 6s 732'110 963:2298 [6,3,1]p6 [6,3]p6 2025-01-21T09:04:29.912825+0000 2025-01-20T03:11:56.962281+0000 1 periodic scrub scheduled @ 2025-01-22T14:09:59.472013+0000
> 6.2 13 52 52 0 5423104 0 0 107 0 active+recovery_wait+undersized+degraded+remapped 6s 732'107 963:2273 [10,11,4]p10 [0,4]p0 2025-01-21T06:46:11.657543+0000 2025-01-17T14:16:41.932263+0000 0 periodic scrub scheduled @ 2025-01-22T13:52:17.796513+0000
> 6.5 18 100 0 0 10027008 0 0 113 0 active+recovery_wait+undersized+degraded+remapped 6s 732'113 963:726 [0,9,3]p0 [0,3]p0 2025-01-21T05:24:25.006369+0000 2025-01-21T05:24:25.006369+0000 0 periodic scrub scheduled @ 2025-01-22T12:29:24.576297+0000
> 6.9 16 55 0 0 5730304 0 0 104 0 active+recovery_wait+undersized+degraded+remapped 6s 732'104 963:996 [10,9,1]p10 [10,1]p10 2025-01-21T09:02:25.957504+0000 2025-01-17T14:16:50.479422+0000 0 periodic scrub scheduled @ 2025-01-22T19:49:15.092705+0000
> 6.b 18 60 0 0 1843200 0 0 114 0 active+recovery_wait+undersized+degraded+remapped 6s 732'114 962:3860 [1,2,6]p1 [1,2]p1 2025-01-21T06:57:22.832565+0000 2025-01-17T14:16:57.820141+0000 0 periodic scrub scheduled @ 2025-01-22T09:03:26.636583+0000
> 6.f 17 60 0 0 5832704 0 0 117 0 active+recovery_wait+undersized+degraded+remapped 6s 732'117 963:1437 [4,3,2]p4 [4,3]p4 2025-01-21T07:59:55.049488+0000 2025-01-17T14:17:05.581667+0000 0 periodic scrub scheduled @ 2025-01-22T14:09:31.906176+0000
> 6.11 20 59 0 0 7888896 0 0 123 0 active+recovery_wait+undersized+degraded+remapped 7s 732'123 963:3329 [11,0,8]p11 [11,8]p11 2025-01-21T01:32:23.458956+0000 2025-01-17T14:17:09.774195+0000 0 periodic scrub scheduled @ 2025-01-22T06:25:32.462414+0000
> 6.12 21 4 42 0 10334208 0 0 141 0 active+recovering+undersized+degraded+remapped 1.11259s 732'141 964:613 [9,8,11]p9 [2,11]p2 2025-01-21T05:01:52.629884+0000 2025-01-21T05:01:52.629884+0000 0 periodic scrub scheduled @ 2025-01-22T07:57:03.899309+0000
> 6.13 22 138 0 0 8093696 0 0 156 0 active+recovery_wait+undersized+degraded+remapped 6s 732'156 963:1312 [3,6,9]p3 [3,6]p3 2025-01-21T04:47:23.091543+0000 2025-01-18T19:41:37.702881+0000 0 periodic scrub scheduled @ 2025-01-22T13:44:27.566620+0000
> 6.14 14 57 0 0 5525504 0 0 116 0 active+recovery_wait+undersized+degraded+remapped 6s 732'116 963:804 [11,9,8]p11 [11,9]p11 2025-01-21T04:58:22.800659+0000 2025-01-18T18:51:30.797784+0000 0 periodic scrub scheduled @ 2025-01-22T14:10:05.285157+0000
> 6.17 15 58 0 0 3284992 0 0 117 0 active+recovery_wait+undersized+degraded+remapped 7s 732'117 963:1758 [11,8,5]p11 [11,8]p11 2025-01-21T17:02:51.283098+0000 2025-01-17T14:17:24.300985+0000 3 periodic scrub scheduled @ 2025-01-23T02:23:35.432833+0000
> 6.1a 16 2 0 0 3387392 0 0 118 0 active+recovering+undersized+remapped 0.802018s 732'118 964:1263 [6,7,1]p6 [1,6]p1 2025-01-21T09:04:47.728667+0000 2025-01-18T21:05:47.129277+0000 1 periodic scrub scheduled @ 2025-01-22T16:41:28.828878+0000
> 6.1d 18 118 0 0 11776000 0 0 111 0 active+recovery_wait+undersized+degraded+remapped 6s 732'111 963:1239 [3,5,10]p3 [3,5]p3 2025-01-21T06:57:54.358465+0000 2025-01-17T14:17:34.523095+0000 0 periodic scrub scheduled @ 2025-01-22T10:05:29.250471+0000
>
> * NOTE: Omap statistics are gathered during deep scrub and may be inaccurate soon afterwards depending on utilization. See http://docs.ceph.com/en/latest/dev/placement-group/#omap-statistics for further details.
> ubuntu@ksr-ceph-deploy:~$ sleep 5;
> ubuntu@ksr-ceph-deploy:~$ ceph pg ls degraded;
> PG OBJECTS DEGRADED MISPLACED UNFOUND BYTES OMAP_BYTES* OMAP_KEYS* LOG LOG_DUPS STATE SINCE VERSION REPORTED UP ACTING SCRUB_STAMP DEEP_SCRUB_STAMP LAST_SCRUB_DURATION SCRUB_SCHEDULING
> 1.0 2 2 0 0 459280 0 0 117 0 active+recovery_wait+undersized+degraded+remapped 12s 732'117 965:996 [11,2,8]p11 [11,2]p11 2025-01-21T05:25:36.075052+0000 2025-01-17T14:16:37.883834+0000 1 periodic scrub scheduled @ 2025-01-22T15:03:39.660268+0000
> 6.2 13 52 52 0 5423104 0 0 107 0 active+recovery_wait+undersized+degraded+remapped 12s 732'107 965:2275 [10,11,4]p10 [0,4]p0 2025-01-21T06:46:11.657543+0000 2025-01-17T14:16:41.932263+0000 0 periodic scrub scheduled @ 2025-01-22T13:52:17.796513+0000
> 6.5 18 100 0 0 10027008 0 0 113 0 active+recovery_wait+undersized+degraded+remapped 11s 732'113 965:728 [0,9,3]p0 [0,3]p0 2025-01-21T05:24:25.006369+0000 2025-01-21T05:24:25.006369+0000 0 periodic scrub scheduled @ 2025-01-22T12:29:24.576297+0000
> 6.7 19 2 0 0 9535488 0 0 100 0 active+recovering+degraded 2s 732'100 966:1371 [11,10,1]p11 [11,10,1]p11 2025-01-21T11:56:25.453326+0000 2025-01-17T14:16:46.382792+0000 0 periodic scrub scheduled @ 2025-01-22T21:24:43.245561+0000
> 6.9 16 55 0 0 5730304 0 0 104 0 active+recovery_wait+undersized+degraded+remapped 11s 732'104 965:998 [10,9,1]p10 [10,1]p10 2025-01-21T09:02:25.957504+0000 2025-01-17T14:16:50.479422+0000 0 periodic scrub scheduled @ 2025-01-22T19:49:15.092705+0000
> 6.b 18 60 0 0 1843200 0 0 114 0 active+recovery_wait+undersized+degraded+remapped 12s 732'114 965:3862 [1,2,6]p1 [1,2]p1 2025-01-21T06:57:22.832565+0000 2025-01-17T14:16:57.820141+0000 0 periodic scrub scheduled @ 2025-01-22T09:03:26.636583+0000
> 6.f 17 60 0 0 5832704 0 0 117 0 active+recovery_wait+undersized+degraded+remapped 12s 732'117 965:1439 [4,3,2]p4 [4,3]p4 2025-01-21T07:59:55.049488+0000 2025-01-17T14:17:05.581667+0000 0 periodic scrub scheduled @ 2025-01-22T14:09:31.906176+0000
> 6.11 20 59 0 0 7888896 0 0 123 0 active+recovery_wait+undersized+degraded+remapped 12s 732'123 965:3331 [11,0,8]p11 [11,8]p11 2025-01-21T01:32:23.458956+0000 2025-01-17T14:17:09.774195+0000 0 periodic scrub scheduled @ 2025-01-22T06:25:32.462414+0000
> 6.13 22 138 0 0 8093696 0 0 156 0 active+recovery_wait+undersized+degraded+remapped 11s 732'156 963:1312 [3,6,9]p3 [3,6]p3 2025-01-21T04:47:23.091543+0000 2025-01-18T19:41:37.702881+0000 0 periodic scrub scheduled @ 2025-01-22T13:44:27.566620+0000
> 6.14 14 57 0 0 5525504 0 0 116 0 active+recovery_wait+undersized+degraded+remapped 12s 732'116 965:806 [11,9,8]p11 [11,9]p11 2025-01-21T04:58:22.800659+0000 2025-01-18T18:51:30.797784+0000 0 periodic scrub scheduled @ 2025-01-22T14:10:05.285157+0000
> 6.17 15 58 0 0 3284992 0 0 117 0 active+recovery_wait+undersized+degraded+remapped 12s 732'117 965:1760 [11,8,5]p11 [11,8]p11 2025-01-21T17:02:51.283098+0000 2025-01-17T14:17:24.300985+0000 3 periodic scrub scheduled @ 2025-01-23T02:23:35.432833+0000
> 6.18 10 41 0 0 5115904 0 0 95 0 active+recovery_wait+degraded 12s 732'95 965:888 [4,10,5]p4 [4,10,5]p4 2025-01-21T07:22:43.040812+0000 2025-01-17T14:17:26.898595+0000 0 periodic scrub scheduled @ 2025-01-22T17:24:55.003668+0000
> 6.1c 9 86 0 0 5013504 0 0 88 0 active+recovery_wait+degraded 12s 732'88 963:24 [3,0,8]p3 [3,0,8]p3 2025-01-21T09:08:33.536101+0000 2025-01-17T14:17:37.489220+0000 1 periodic scrub scheduled @ 2025-01-22T11:35:58.254737+0000
> 6.1d 18 118 0 0 11776000 0 0 111 0 active+recovery_wait+undersized+degraded+remapped 11s 732'111 963:1239 [3,5,10]p3 [3,5]p3 2025-01-21T06:57:54.358465+0000 2025-01-17T14:17:34.523095+0000 0 periodic scrub scheduled @ 2025-01-22T10:05:29.250471+0000
>
> * NOTE: Omap statistics are gathered during deep scrub and may be inaccurate soon afterwards depending on utilization. See http://docs.ceph.com/en/latest/dev/placement-group/#omap-statistics for further details.
>
>
> From: Dan van der Ster <dan.vanders...@clyso.com>
> Sent: Tuesday, January 21, 2025 16:51
> To: Anthony D'Atri <a...@dreamsnake.net>; Kasper Rasmussen <kasper_steenga...@hotmail.com>
> Cc: ceph-users <ceph-users@ceph.io>
> Subject: Re: [ceph-users] Re: Changing crush map result in > 100% objects degraded
>
> On Tue, Jan 21, 2025 at 7:12 AM Anthony D'Atri <a...@dreamsnake.net> wrote:
> > > On Jan 21, 2025, at 7:59 AM, Kasper Rasmussen <kasper_steenga...@hotmail.com> wrote:
> > >
> > > 1 - Why do this result in such a high - objects degraded - percentage?
> >
> > I suspect that's a function of the new topology having changed the mappings of multiple OSDs for given PGs. It's subtle, but when you move hosts into rack CRUSH buckets, that's a different set of inputs into the CRUSH hash function, so the mappings that come out are different, even though you haven't changed the rules and would think that hosts are hosts.
>
> Also, in the process of moving the hosts one by one, each step creates a new topology which can change the ordering of hosts, incrementally putting things out of whack.
>
> > > 2 - Why do PGs get undersized?
> >
> > That often means that CRUSH can't find a complete set of placements. In your situation maybe those would resolve themselves when you unleash the recovery hounds.
>
> We started noticing this kind of issue around pacific, but haven't fully tracked down what broke yet.
> See https://tracker.ceph.com/issues/56046 for similar.
>
> Undersized or degraded should only happen -- by design -- if objects were modified while the PG did not have 3 OSDs up and acting.
> Kaspar: I assume the cluster was idle during your tests?
> Also -- can you reproduce it without norecover/nobackfill set ?
>
> Could you simplify your reproducer down to:
>
> > HEALTH_OK
> > ceph osd crush move ksr-ceph-osd1 rack=rack1
> > ceph pg ls undersized / degraded # get a pgid of a degraded PG
> > ceph pg $pgid query
>
> Cheers, dan
>
> --
> Dan van der Ster
> CTO @ CLYSO
> Try our Ceph Analyzer -- https://analyzer.clyso.com/
> https://clyso.com | dan.vanders...@clyso.com

_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io