Hi Dan

"Also, in the process of moving the hosts one by one, each step creates
a new topology which can change the ordering of hosts, incrementally
putting things out of whack."

RESPONSE: Would it be better to edit the CRUSH map as a file and then load the new map with ceph osd setcrushmap -i <file>?
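
Something like the usual crushtool round-trip is what I have in mind (just a sketch; the file names are placeholders):

ceph osd getcrushmap -o crush.bin          # dump the current compiled CRUSH map
crushtool -d crush.bin -o crush.txt        # decompile it to editable text
# edit crush.txt: add the rack buckets and move all the hosts in one pass
crushtool -c crush.txt -o crush-new.bin    # recompile the edited map
ceph osd setcrushmap -i crush-new.bin      # inject the whole new topology in one step

That way the topology change lands as a single new map, rather than one remapping step per host move.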



"Kaspar: I assume the cluster was idle during your tests?"
RESPONSE: Yes, the cluster was indeed idle.

"Also -- can you reproduce it without norecover/nobackfill set ?"
RESPONSE: Yes, I reproduced it with no flags set, starting from HEALTH_OK.

The result is shown below. (The output of the ceph pg "$pgid" query is 8548 lines; I'm not sure how best to send that. Just as an attachment?)
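
If an attachment is fine, I can simply capture the query to a file, e.g. (using PG 6.2 from the listing below as an example; the file name is only a placeholder):

ceph pg 6.2 query > pg-6.2-query.json   # full JSON query output for one degraded PG, ready to attach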

ubuntu@ksr-ceph-deploy:~$ ceph osd crush move ksr-ceph-osd1 rack=rack1;
moved item id -7 name 'ksr-ceph-osd1' to location {rack=rack1} in crush map
ubuntu@ksr-ceph-deploy:~$ sleep 10
ubuntu@ksr-ceph-deploy:~$ ceph pg ls undersized;
PG OBJECTS DEGRADED MISPLACED UNFOUND BYTES OMAP_BYTES* OMAP_KEYS* LOG LOG_DUPS STATE SINCE VERSION REPORTED UP ACTING SCRUB_STAMP DEEP_SCRUB_STAMP LAST_SCRUB_DURATION SCRUB_SCHEDULING
1.0 2 2 0 0 459280 0 0 117 0 active+recovery_wait+undersized+degraded+remapped 7s 732'117 963:994 [11,2,8]p11 [11,2]p11 2025-01-21T05:25:36.075052+0000 2025-01-17T14:16:37.883834+0000 1 periodic scrub scheduled @ 2025-01-22T15:03:39.660268+0000
6.0 22 60 0 0 2252800 0 0 124 0 active+recovery_wait+undersized+degraded+remapped 7s 732'124 962:2333 [7,6,1]p7 [7,1]p7 2025-01-21T09:06:38.302061+0000 2025-01-20T03:35:48.722520+0000 1 periodic scrub scheduled @ 2025-01-22T18:15:49.710411+0000
6.1 12 50 0 0 1228800 0 0 110 0 active+recovery_wait+undersized+degraded+remapped 6s 732'110 963:2298 [6,3,1]p6 [6,3]p6 2025-01-21T09:04:29.912825+0000 2025-01-20T03:11:56.962281+0000 1 periodic scrub scheduled @ 2025-01-22T14:09:59.472013+0000
6.2 13 52 52 0 5423104 0 0 107 0 active+recovery_wait+undersized+degraded+remapped 6s 732'107 963:2273 [10,11,4]p10 [0,4]p0 2025-01-21T06:46:11.657543+0000 2025-01-17T14:16:41.932263+0000 0 periodic scrub scheduled @ 2025-01-22T13:52:17.796513+0000
6.5 18 100 0 0 10027008 0 0 113 0 active+recovery_wait+undersized+degraded+remapped 6s 732'113 963:726 [0,9,3]p0 [0,3]p0 2025-01-21T05:24:25.006369+0000 2025-01-21T05:24:25.006369+0000 0 periodic scrub scheduled @ 2025-01-22T12:29:24.576297+0000
6.9 16 55 0 0 5730304 0 0 104 0 active+recovery_wait+undersized+degraded+remapped 6s 732'104 963:996 [10,9,1]p10 [10,1]p10 2025-01-21T09:02:25.957504+0000 2025-01-17T14:16:50.479422+0000 0 periodic scrub scheduled @ 2025-01-22T19:49:15.092705+0000
6.b 18 60 0 0 1843200 0 0 114 0 active+recovery_wait+undersized+degraded+remapped 6s 732'114 962:3860 [1,2,6]p1 [1,2]p1 2025-01-21T06:57:22.832565+0000 2025-01-17T14:16:57.820141+0000 0 periodic scrub scheduled @ 2025-01-22T09:03:26.636583+0000
6.f 17 60 0 0 5832704 0 0 117 0 active+recovery_wait+undersized+degraded+remapped 6s 732'117 963:1437 [4,3,2]p4 [4,3]p4 2025-01-21T07:59:55.049488+0000 2025-01-17T14:17:05.581667+0000 0 periodic scrub scheduled @ 2025-01-22T14:09:31.906176+0000
6.11 20 59 0 0 7888896 0 0 123 0 active+recovery_wait+undersized+degraded+remapped 7s 732'123 963:3329 [11,0,8]p11 [11,8]p11 2025-01-21T01:32:23.458956+0000 2025-01-17T14:17:09.774195+0000 0 periodic scrub scheduled @ 2025-01-22T06:25:32.462414+0000
6.12 21 4 42 0 10334208 0 0 141 0 active+recovering+undersized+degraded+remapped 1.11259s 732'141 964:613 [9,8,11]p9 [2,11]p2 2025-01-21T05:01:52.629884+0000 2025-01-21T05:01:52.629884+0000 0 periodic scrub scheduled @ 2025-01-22T07:57:03.899309+0000
6.13 22 138 0 0 8093696 0 0 156 0 active+recovery_wait+undersized+degraded+remapped 6s 732'156 963:1312 [3,6,9]p3 [3,6]p3 2025-01-21T04:47:23.091543+0000 2025-01-18T19:41:37.702881+0000 0 periodic scrub scheduled @ 2025-01-22T13:44:27.566620+0000
6.14 14 57 0 0 5525504 0 0 116 0 active+recovery_wait+undersized+degraded+remapped 6s 732'116 963:804 [11,9,8]p11 [11,9]p11 2025-01-21T04:58:22.800659+0000 2025-01-18T18:51:30.797784+0000 0 periodic scrub scheduled @ 2025-01-22T14:10:05.285157+0000
6.17 15 58 0 0 3284992 0 0 117 0 active+recovery_wait+undersized+degraded+remapped 7s 732'117 963:1758 [11,8,5]p11 [11,8]p11 2025-01-21T17:02:51.283098+0000 2025-01-17T14:17:24.300985+0000 3 periodic scrub scheduled @ 2025-01-23T02:23:35.432833+0000
6.1a 16 2 0 0 3387392 0 0 118 0 active+recovering+undersized+remapped 0.802018s 732'118 964:1263 [6,7,1]p6 [1,6]p1 2025-01-21T09:04:47.728667+0000 2025-01-18T21:05:47.129277+0000 1 periodic scrub scheduled @ 2025-01-22T16:41:28.828878+0000
6.1d 18 118 0 0 11776000 0 0 111 0 active+recovery_wait+undersized+degraded+remapped 6s 732'111 963:1239 [3,5,10]p3 [3,5]p3 2025-01-21T06:57:54.358465+0000 2025-01-17T14:17:34.523095+0000 0 periodic scrub scheduled @ 2025-01-22T10:05:29.250471+0000

* NOTE: Omap statistics are gathered during deep scrub and may be inaccurate soon afterwards depending on utilization. See http://docs.ceph.com/en/latest/dev/placement-group/#omap-statistics for further details.
ubuntu@ksr-ceph-deploy:~$ sleep 5;
ubuntu@ksr-ceph-deploy:~$ ceph pg ls degraded;
PG OBJECTS DEGRADED MISPLACED UNFOUND BYTES OMAP_BYTES* OMAP_KEYS* LOG LOG_DUPS STATE SINCE VERSION REPORTED UP ACTING SCRUB_STAMP DEEP_SCRUB_STAMP LAST_SCRUB_DURATION SCRUB_SCHEDULING
1.0 2 2 0 0 459280 0 0 117 0 active+recovery_wait+undersized+degraded+remapped 12s 732'117 965:996 [11,2,8]p11 [11,2]p11 2025-01-21T05:25:36.075052+0000 2025-01-17T14:16:37.883834+0000 1 periodic scrub scheduled @ 2025-01-22T15:03:39.660268+0000
6.2 13 52 52 0 5423104 0 0 107 0 active+recovery_wait+undersized+degraded+remapped 12s 732'107 965:2275 [10,11,4]p10 [0,4]p0 2025-01-21T06:46:11.657543+0000 2025-01-17T14:16:41.932263+0000 0 periodic scrub scheduled @ 2025-01-22T13:52:17.796513+0000
6.5 18 100 0 0 10027008 0 0 113 0 active+recovery_wait+undersized+degraded+remapped 11s 732'113 965:728 [0,9,3]p0 [0,3]p0 2025-01-21T05:24:25.006369+0000 2025-01-21T05:24:25.006369+0000 0 periodic scrub scheduled @ 2025-01-22T12:29:24.576297+0000
6.7 19 2 0 0 9535488 0 0 100 0 active+recovering+degraded 2s 732'100 966:1371 [11,10,1]p11 [11,10,1]p11 2025-01-21T11:56:25.453326+0000 2025-01-17T14:16:46.382792+0000 0 periodic scrub scheduled @ 2025-01-22T21:24:43.245561+0000
6.9 16 55 0 0 5730304 0 0 104 0 active+recovery_wait+undersized+degraded+remapped 11s 732'104 965:998 [10,9,1]p10 [10,1]p10 2025-01-21T09:02:25.957504+0000 2025-01-17T14:16:50.479422+0000 0 periodic scrub scheduled @ 2025-01-22T19:49:15.092705+0000
6.b 18 60 0 0 1843200 0 0 114 0 active+recovery_wait+undersized+degraded+remapped 12s 732'114 965:3862 [1,2,6]p1 [1,2]p1 2025-01-21T06:57:22.832565+0000 2025-01-17T14:16:57.820141+0000 0 periodic scrub scheduled @ 2025-01-22T09:03:26.636583+0000
6.f 17 60 0 0 5832704 0 0 117 0 active+recovery_wait+undersized+degraded+remapped 12s 732'117 965:1439 [4,3,2]p4 [4,3]p4 2025-01-21T07:59:55.049488+0000 2025-01-17T14:17:05.581667+0000 0 periodic scrub scheduled @ 2025-01-22T14:09:31.906176+0000
6.11 20 59 0 0 7888896 0 0 123 0 active+recovery_wait+undersized+degraded+remapped 12s 732'123 965:3331 [11,0,8]p11 [11,8]p11 2025-01-21T01:32:23.458956+0000 2025-01-17T14:17:09.774195+0000 0 periodic scrub scheduled @ 2025-01-22T06:25:32.462414+0000
6.13 22 138 0 0 8093696 0 0 156 0 active+recovery_wait+undersized+degraded+remapped 11s 732'156 963:1312 [3,6,9]p3 [3,6]p3 2025-01-21T04:47:23.091543+0000 2025-01-18T19:41:37.702881+0000 0 periodic scrub scheduled @ 2025-01-22T13:44:27.566620+0000
6.14 14 57 0 0 5525504 0 0 116 0 active+recovery_wait+undersized+degraded+remapped 12s 732'116 965:806 [11,9,8]p11 [11,9]p11 2025-01-21T04:58:22.800659+0000 2025-01-18T18:51:30.797784+0000 0 periodic scrub scheduled @ 2025-01-22T14:10:05.285157+0000
6.17 15 58 0 0 3284992 0 0 117 0 active+recovery_wait+undersized+degraded+remapped 12s 732'117 965:1760 [11,8,5]p11 [11,8]p11 2025-01-21T17:02:51.283098+0000 2025-01-17T14:17:24.300985+0000 3 periodic scrub scheduled @ 2025-01-23T02:23:35.432833+0000
6.18 10 41 0 0 5115904 0 0 95 0 active+recovery_wait+degraded 12s 732'95 965:888 [4,10,5]p4 [4,10,5]p4 2025-01-21T07:22:43.040812+0000 2025-01-17T14:17:26.898595+0000 0 periodic scrub scheduled @ 2025-01-22T17:24:55.003668+0000
6.1c 9 86 0 0 5013504 0 0 88 0 active+recovery_wait+degraded 12s 732'88 963:24 [3,0,8]p3 [3,0,8]p3 2025-01-21T09:08:33.536101+0000 2025-01-17T14:17:37.489220+0000 1 periodic scrub scheduled @ 2025-01-22T11:35:58.254737+0000
6.1d 18 118 0 0 11776000 0 0 111 0 active+recovery_wait+undersized+degraded+remapped 11s 732'111 963:1239 [3,5,10]p3 [3,5]p3 2025-01-21T06:57:54.358465+0000 2025-01-17T14:17:34.523095+0000 0 periodic scrub scheduled @ 2025-01-22T10:05:29.250471+0000

* NOTE: Omap statistics are gathered during deep scrub and may be inaccurate soon afterwards depending on utilization. See http://docs.ceph.com/en/latest/dev/placement-group/#omap-statistics for further details.


________________________________
From: Dan van der Ster <dan.vanders...@clyso.com>
Sent: Tuesday, January 21, 2025 16:51
To: Anthony D'Atri <a...@dreamsnake.net>; Kasper Rasmussen 
<kasper_steenga...@hotmail.com>
Cc: ceph-users <ceph-users@ceph.io>
Subject: Re: [ceph-users] Re: Changing crush map result in > 100% objects 
degraded

On Tue, Jan 21, 2025 at 7:12 AM Anthony D'Atri <a...@dreamsnake.net> wrote:
> > On Jan 21, 2025, at 7:59 AM, Kasper Rasmussen 
> > <kasper_steenga...@hotmail.com> wrote:
> >
> > 1 - Why do this result in such a high - objects degraded - percentage?
>
> I suspect that’s a function of the new topology having changed the mappings 
> of multiple OSDs for given PGs.  It’s subtle, but when you move hosts into 
> rack CRUSH buckets, that’s a different set of inputs into the CRUSH hash 
> function, so the mappings that come out are different, even though you 
> haven’t changed the rules and would think that hosts are hosts.

Also, in the process of moving the hosts one by one, each step creates
a new topology which can change the ordering of hosts, incrementally
putting things out of whack.

> > 2 - Why do PGs get undersized?
>
> That often means that CRUSH can’t find a complete set of placements.  In your 
> situation maybe those would resolve themselves when you unleash the recovery 
> hounds.

We started noticing this kind of issue around pacific, but haven't
fully tracked down what broke yet.
See https://tracker.ceph.com/issues/56046 for similar.

Undersized or degraded should only happen -- by design -- if objects
were modified while the PG did not have 3 OSDs up and acting.
Kaspar: I assume the cluster was idle during your tests?
Also -- can you reproduce it without norecover/nobackfill set ?

Could you simplify your reproducer down to:

> HEALTH_OK
> ceph osd crush move ksr-ceph-osd1 rack=rack1
> ceph pg ls undersized / degraded # get a pgid of a degraded PG
> ceph pg $pgid query

Cheers, dan


--
Dan van der Ster
CTO @ CLYSO
Try our Ceph Analyzer -- https://analyzer.clyso.com/
https://clyso.com | dan.vanders...@clyso.com
