Hi Dan,

> Also, in the process of moving the hosts one by one, each step creates a new topology which can change the ordering of hosts, incrementally putting things out of whack.

RESPONSE: Would it be better to edit the crush map as a file instead, and load the new map with "ceph osd setcrushmap -i <file>", so the whole topology change lands in one step?
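Something like this is what I have in mind (just a sketch - the rule id and the x range in the crushtool dry run are example values):

  ceph osd getcrushmap -o crush.bin        # dump the current compiled crush map
  crushtool -d crush.bin -o crush.txt      # decompile to editable text
  # edit crush.txt: add the rack buckets and move all hosts under them in one edit
  crushtool -c crush.txt -o crush.new.bin  # recompile
  crushtool -i crush.new.bin --test --show-mappings --rule 0 --num-rep 3 --min-x 0 --max-x 10   # optional offline check of the resulting mappings
  ceph osd setcrushmap -i crush.new.bin    # inject the whole new topology at once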
> Kaspar: I assume the cluster was idle during your tests?

RESPONSE: Yes, the cluster was indeed idle.

> Also -- can you reproduce it without norecover/nobackfill set ?

RESPONSE: Yes, I reproduced it with no flags set:
- Starting point: HEALTH_OK
- The result is shown below.
- The output from ceph pg "$pgid" query is 8548 lines - I don't know how to send that. Just as an attachment?
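For now I could simply redirect it to a file and compress it before attaching (a sketch, using 6.12 as an example of one of the degraded PGs listed below):

  ceph pg 6.12 query > pg-6.12-query.json   # full query output for one degraded PG
  gzip pg-6.12-query.json                   # compress before attaching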
ubuntu@ksr-ceph-deploy:~$ ceph osd crush move ksr-ceph-osd1 rack=rack1
moved item id -7 name 'ksr-ceph-osd1' to location {rack=rack1} in crush map
ubuntu@ksr-ceph-deploy:~$ sleep 10
ubuntu@ksr-ceph-deploy:~$ ceph pg ls undersized
PG OBJECTS DEGRADED MISPLACED UNFOUND BYTES OMAP_BYTES* OMAP_KEYS* LOG LOG_DUPS STATE SINCE VERSION REPORTED UP ACTING SCRUB_STAMP DEEP_SCRUB_STAMP LAST_SCRUB_DURATION SCRUB_SCHEDULING
1.0 2 2 0 0 459280 0 0 117 0 active+recovery_wait+undersized+degraded+remapped 7s 732'117 963:994 [11,2,8]p11 [11,2]p11 2025-01-21T05:25:36.075052+0000 2025-01-17T14:16:37.883834+0000 1 periodic scrub scheduled @ 2025-01-22T15:03:39.660268+0000
6.0 22 60 0 0 2252800 0 0 124 0 active+recovery_wait+undersized+degraded+remapped 7s 732'124 962:2333 [7,6,1]p7 [7,1]p7 2025-01-21T09:06:38.302061+0000 2025-01-20T03:35:48.722520+0000 1 periodic scrub scheduled @ 2025-01-22T18:15:49.710411+0000
6.1 12 50 0 0 1228800 0 0 110 0 active+recovery_wait+undersized+degraded+remapped 6s 732'110 963:2298 [6,3,1]p6 [6,3]p6 2025-01-21T09:04:29.912825+0000 2025-01-20T03:11:56.962281+0000 1 periodic scrub scheduled @ 2025-01-22T14:09:59.472013+0000
6.2 13 52 52 0 5423104 0 0 107 0 active+recovery_wait+undersized+degraded+remapped 6s 732'107 963:2273 [10,11,4]p10 [0,4]p0 2025-01-21T06:46:11.657543+0000 2025-01-17T14:16:41.932263+0000 0 periodic scrub scheduled @ 2025-01-22T13:52:17.796513+0000
6.5 18 100 0 0 10027008 0 0 113 0 active+recovery_wait+undersized+degraded+remapped 6s 732'113 963:726 [0,9,3]p0 [0,3]p0 2025-01-21T05:24:25.006369+0000 2025-01-21T05:24:25.006369+0000 0 periodic scrub scheduled @ 2025-01-22T12:29:24.576297+0000
6.9 16 55 0 0 5730304 0 0 104 0 active+recovery_wait+undersized+degraded+remapped 6s 732'104 963:996 [10,9,1]p10 [10,1]p10 2025-01-21T09:02:25.957504+0000 2025-01-17T14:16:50.479422+0000 0 periodic scrub scheduled @ 2025-01-22T19:49:15.092705+0000
6.b 18 60 0 0 1843200 0 0 114 0 active+recovery_wait+undersized+degraded+remapped 6s 732'114 962:3860 [1,2,6]p1 [1,2]p1 2025-01-21T06:57:22.832565+0000 2025-01-17T14:16:57.820141+0000 0 periodic scrub scheduled @ 2025-01-22T09:03:26.636583+0000
6.f 17 60 0 0 5832704 0 0 117 0 active+recovery_wait+undersized+degraded+remapped 6s 732'117 963:1437 [4,3,2]p4 [4,3]p4 2025-01-21T07:59:55.049488+0000 2025-01-17T14:17:05.581667+0000 0 periodic scrub scheduled @ 2025-01-22T14:09:31.906176+0000
6.11 20 59 0 0 7888896 0 0 123 0 active+recovery_wait+undersized+degraded+remapped 7s 732'123 963:3329 [11,0,8]p11 [11,8]p11 2025-01-21T01:32:23.458956+0000 2025-01-17T14:17:09.774195+0000 0 periodic scrub scheduled @ 2025-01-22T06:25:32.462414+0000
6.12 21 4 42 0 10334208 0 0 141 0 active+recovering+undersized+degraded+remapped 1.11259s 732'141 964:613 [9,8,11]p9 [2,11]p2 2025-01-21T05:01:52.629884+0000 2025-01-21T05:01:52.629884+0000 0 periodic scrub scheduled @ 2025-01-22T07:57:03.899309+0000
6.13 22 138 0 0 8093696 0 0 156 0 active+recovery_wait+undersized+degraded+remapped 6s 732'156 963:1312 [3,6,9]p3 [3,6]p3 2025-01-21T04:47:23.091543+0000 2025-01-18T19:41:37.702881+0000 0 periodic scrub scheduled @ 2025-01-22T13:44:27.566620+0000
6.14 14 57 0 0 5525504 0 0 116 0 active+recovery_wait+undersized+degraded+remapped 6s 732'116 963:804 [11,9,8]p11 [11,9]p11 2025-01-21T04:58:22.800659+0000 2025-01-18T18:51:30.797784+0000 0 periodic scrub scheduled @ 2025-01-22T14:10:05.285157+0000
6.17 15 58 0 0 3284992 0 0 117 0 active+recovery_wait+undersized+degraded+remapped 7s 732'117 963:1758 [11,8,5]p11 [11,8]p11 2025-01-21T17:02:51.283098+0000 2025-01-17T14:17:24.300985+0000 3 periodic scrub scheduled @ 2025-01-23T02:23:35.432833+0000
6.1a 16 2 0 0 3387392 0 0 118 0 active+recovering+undersized+remapped 0.802018s 732'118 964:1263 [6,7,1]p6 [1,6]p1 2025-01-21T09:04:47.728667+0000 2025-01-18T21:05:47.129277+0000 1 periodic scrub scheduled @ 2025-01-22T16:41:28.828878+0000
6.1d 18 118 0 0 11776000 0 0 111 0 active+recovery_wait+undersized+degraded+remapped 6s 732'111 963:1239 [3,5,10]p3 [3,5]p3 2025-01-21T06:57:54.358465+0000 2025-01-17T14:17:34.523095+0000 0 periodic scrub scheduled @ 2025-01-22T10:05:29.250471+0000
* NOTE: Omap statistics are gathered during deep scrub and may be inaccurate soon afterwards depending on utilization. See http://docs.ceph.com/en/latest/dev/placement-group/#omap-statistics for further details.
ubuntu@ksr-ceph-deploy:~$ sleep 5
ubuntu@ksr-ceph-deploy:~$ ceph pg ls degraded
PG OBJECTS DEGRADED MISPLACED UNFOUND BYTES OMAP_BYTES* OMAP_KEYS* LOG LOG_DUPS STATE SINCE VERSION REPORTED UP ACTING SCRUB_STAMP DEEP_SCRUB_STAMP LAST_SCRUB_DURATION SCRUB_SCHEDULING
1.0 2 2 0 0 459280 0 0 117 0 active+recovery_wait+undersized+degraded+remapped 12s 732'117 965:996 [11,2,8]p11 [11,2]p11 2025-01-21T05:25:36.075052+0000 2025-01-17T14:16:37.883834+0000 1 periodic scrub scheduled @ 2025-01-22T15:03:39.660268+0000
6.2 13 52 52 0 5423104 0 0 107 0 active+recovery_wait+undersized+degraded+remapped 12s 732'107 965:2275 [10,11,4]p10 [0,4]p0 2025-01-21T06:46:11.657543+0000 2025-01-17T14:16:41.932263+0000 0 periodic scrub scheduled @ 2025-01-22T13:52:17.796513+0000
6.5 18 100 0 0 10027008 0 0 113 0 active+recovery_wait+undersized+degraded+remapped 11s 732'113 965:728 [0,9,3]p0 [0,3]p0 2025-01-21T05:24:25.006369+0000 2025-01-21T05:24:25.006369+0000 0 periodic scrub scheduled @ 2025-01-22T12:29:24.576297+0000
6.7 19 2 0 0 9535488 0 0 100 0 active+recovering+degraded 2s 732'100 966:1371 [11,10,1]p11 [11,10,1]p11 2025-01-21T11:56:25.453326+0000 2025-01-17T14:16:46.382792+0000 0 periodic scrub scheduled @ 2025-01-22T21:24:43.245561+0000
6.9 16 55 0 0 5730304 0 0 104 0 active+recovery_wait+undersized+degraded+remapped 11s 732'104 965:998 [10,9,1]p10 [10,1]p10 2025-01-21T09:02:25.957504+0000 2025-01-17T14:16:50.479422+0000 0 periodic scrub scheduled @ 2025-01-22T19:49:15.092705+0000
6.b 18 60 0 0 1843200 0 0 114 0 active+recovery_wait+undersized+degraded+remapped 12s 732'114 965:3862 [1,2,6]p1 [1,2]p1 2025-01-21T06:57:22.832565+0000 2025-01-17T14:16:57.820141+0000 0 periodic scrub scheduled @ 2025-01-22T09:03:26.636583+0000
6.f 17 60 0 0 5832704 0 0 117 0 active+recovery_wait+undersized+degraded+remapped 12s 732'117 965:1439 [4,3,2]p4 [4,3]p4 2025-01-21T07:59:55.049488+0000 2025-01-17T14:17:05.581667+0000 0 periodic scrub scheduled @ 2025-01-22T14:09:31.906176+0000
6.11 20 59 0 0 7888896 0 0 123 0 active+recovery_wait+undersized+degraded+remapped 12s 732'123 965:3331 [11,0,8]p11 [11,8]p11 2025-01-21T01:32:23.458956+0000 2025-01-17T14:17:09.774195+0000 0 periodic scrub scheduled @ 2025-01-22T06:25:32.462414+0000
6.13 22 138 0 0 8093696 0 0 156 0 active+recovery_wait+undersized+degraded+remapped 11s 732'156 963:1312 [3,6,9]p3 [3,6]p3 2025-01-21T04:47:23.091543+0000 2025-01-18T19:41:37.702881+0000 0 periodic scrub scheduled @ 2025-01-22T13:44:27.566620+0000
6.14 14 57 0 0 5525504 0 0 116 0 active+recovery_wait+undersized+degraded+remapped 12s 732'116 965:806 [11,9,8]p11 [11,9]p11 2025-01-21T04:58:22.800659+0000 2025-01-18T18:51:30.797784+0000 0 periodic scrub scheduled @ 2025-01-22T14:10:05.285157+0000
6.17 15 58 0 0 3284992 0 0 117 0 active+recovery_wait+undersized+degraded+remapped 12s 732'117 965:1760 [11,8,5]p11 [11,8]p11 2025-01-21T17:02:51.283098+0000 2025-01-17T14:17:24.300985+0000 3 periodic scrub scheduled @ 2025-01-23T02:23:35.432833+0000
6.18 10 41 0 0 5115904 0 0 95 0 active+recovery_wait+degraded 12s 732'95 965:888 [4,10,5]p4 [4,10,5]p4 2025-01-21T07:22:43.040812+0000 2025-01-17T14:17:26.898595+0000 0 periodic scrub scheduled @ 2025-01-22T17:24:55.003668+0000
6.1c 9 86 0 0 5013504 0 0 88 0 active+recovery_wait+degraded 12s 732'88 963:24 [3,0,8]p3 [3,0,8]p3 2025-01-21T09:08:33.536101+0000 2025-01-17T14:17:37.489220+0000 1 periodic scrub scheduled @ 2025-01-22T11:35:58.254737+0000
6.1d 18 118 0 0 11776000 0 0 111 0 active+recovery_wait+undersized+degraded+remapped 11s 732'111 963:1239 [3,5,10]p3 [3,5]p3 2025-01-21T06:57:54.358465+0000 2025-01-17T14:17:34.523095+0000 0 periodic scrub scheduled @ 2025-01-22T10:05:29.250471+0000
* NOTE: Omap statistics are gathered during deep scrub and may be inaccurate soon afterwards depending on utilization. See http://docs.ceph.com/en/latest/dev/placement-group/#omap-statistics for further details.

________________________________
From: Dan van der Ster <dan.vanders...@clyso.com>
Sent: Tuesday, January 21, 2025 16:51
To: Anthony D'Atri <a...@dreamsnake.net>; Kasper Rasmussen <kasper_steenga...@hotmail.com>
Cc: ceph-users <ceph-users@ceph.io>
Subject: Re: [ceph-users] Re: Changing crush map result in > 100% objects degraded

On Tue, Jan 21, 2025 at 7:12 AM Anthony D'Atri <a...@dreamsnake.net> wrote:
>
> > On Jan 21, 2025, at 7:59 AM, Kasper Rasmussen <kasper_steenga...@hotmail.com> wrote:
> >
> > 1 - Why does this result in such a high "objects degraded" percentage?
>
> I suspect that’s a function of the new topology having changed the mappings of multiple OSDs for given PGs. It’s subtle, but when you move hosts into rack CRUSH buckets, that’s a different set of inputs into the CRUSH hash function, so the mappings that come out are different, even though you haven’t changed the rules and would think that hosts are hosts. Also, in the process of moving the hosts one by one, each step creates a new topology which can change the ordering of hosts, incrementally putting things out of whack.
>
> > 2 - Why do PGs get undersized?
>
> That often means that CRUSH can’t find a complete set of placements. In your situation maybe those would resolve themselves when you unleash the recovery hounds.

We started noticing this kind of issue around pacific, but haven't fully tracked down what broke yet. See https://tracker.ceph.com/issues/56046 for similar.

Undersized or degraded should only happen -- by design -- if objects were modified while the PG did not have 3 OSDs up and acting.

Kaspar: I assume the cluster was idle during your tests?

Also -- can you reproduce it without norecover/nobackfill set ?
Could you simplify your reproducer down to:

> HEALTH_OK
> ceph osd crush move ksr-ceph-osd1 rack=rack1
> ceph pg ls undersized / degraded # get a pgid of a degraded PG
> ceph pg $pgid query

Cheers, dan

--
Dan van der Ster
CTO @ CLYSO
Try our Ceph Analyzer -- https://analyzer.clyso.com/
https://clyso.com | dan.vanders...@clyso.com
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io