Also, check your value of mon_max_pg_per_osd. In some cases raising it can resolve blocked activation. I set it to 1000.
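For reference, the commands I used to check and raise it (a minimal sketch, assuming you manage options through the central config database rather than ceph.conf; 1000 is just the value that worked for me):

  ceph config get mon mon_max_pg_per_osd    # show the current limit (default is 250 on recent releases, IIRC)
  ceph osd df                               # compare the PGS column per OSD against that limit
  ceph config set global mon_max_pg_per_osd 1000

After raising it, watch "ceph -s" / "ceph health detail" to see whether the stuck PGs start activating.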
> On May 10, 2025, at 1:34 PM, Eugen Block <ebl...@nde.ag> wrote:
>
> Just so I understand, you restarted (almost) all your OSDs at the same time
> without waiting for them to successfully start? Why?
>
>
> Zitat von Senol COLAK <se...@kubedo.com>:
>
>> Here is the output of “ceph osd df tree”
>>
>> root@cmt6770:~# ceph osd df tree
>> ID CLASS WEIGHT REWEIGHT SIZE RAW USE DATA OMAP META AVAIL %USE VAR PGS STATUS TYPE NAME
>> -1 34.05125 - 0 B 0 B 0 B 0 B 0 B 0 B 0 0 - root default
>> -3 5.38539 - 1.7 TiB 1.6 TiB 1.6 TiB 228 KiB 3.8 GiB 186 GiB 89.61 1.28 - host cmt5923
>> 3 ssd 1.74660 0.79999 1.7 TiB 1.6 TiB 1.6 TiB 228 KiB 3.8 GiB 186 GiB 89.61 1.28 270 up osd.3
>> 8 ssd 1.81940 0 0 B 0 B 0 B 0 B 0 B 0 B 0 0 0 down osd.8
>> 9 ssd 1.81940 0 0 B 0 B 0 B 0 B 0 B 0 B 0 0 5 down osd.9
>> -15 4.40289 - 0 B 0 B 0 B 0 B 0 B 0 B 0 0 - host cmt6461
>> 24 nvme 0.90970 0.79999 0 B 0 B 0 B 0 B 0 B 0 B 0 0 0 down osd.24
>> 2 ssd 1.74660 0 0 B 0 B 0 B 0 B 0 B 0 B 0 0 2 down osd.2
>> 17 ssd 1.74660 1.00000 0 B 0 B 0 B 0 B 0 B 0 B 0 0 0 down osd.17
>> -5 5.35616 - 1.7 TiB 369 GiB 367 GiB 30 KiB 2.0 GiB 1.4 TiB 20.61 0.29 - host cmt6770
>> 4 nvme 1.86299 0 0 B 0 B 0 B 0 B 0 B 0 B 0 0 0 down osd.4
>> 0 ssd 0.87329 0 0 B 0 B 0 B 0 B 0 B 0 B 0 0 1 down osd.0
>> 1 ssd 0.87329 0 0 B 0 B 0 B 0 B 0 B 0 B 0 0 0 down osd.1
>> 14 ssd 0.87329 1.00000 894 GiB 43 GiB 43 GiB 16 KiB 106 MiB 851 GiB 4.84 0.07 42 down osd.14
>> 15 ssd 0.87329 1.00000 894 GiB 325 GiB 323 GiB 14 KiB 1.9 GiB 569 GiB 36.37 0.52 116 up osd.15
>> -9 7.24838 - 1.9 TiB 1.6 TiB 1.6 TiB 342 KiB 4.3 GiB 313 GiB 0 0 - host cmt7773
>> 5 nvme 1.81940 0 0 B 0 B 0 B 0 B 0 B 0 B 0 0 0 down osd.5
>> 19 nvme 1.81940 0.95001 0 B 0 B 0 B 0 B 0 B 0 B 0 0 0 down osd.19
>> 7 ssd 1.74660 0 0 B 0 B 0 B 0 B 0 B 0 B 0 0 0 down osd.7
>> 29 ssd 1.86299 1.00000 1.9 TiB 1.6 TiB 1.6 TiB 342 KiB 4.3 GiB 313 GiB 83.59 1.19 48 down osd.29
>> -13 7.93245 - 6.2 TiB 4.2 TiB 4.2 TiB 763 KiB 14 GiB 1.9 TiB 68.70 0.98 - host dc2943
>> 22 nvme 0.90970 1.00000 932 GiB 689 GiB 688 GiB 181 KiB 1.8 GiB 242 GiB 74.01 1.05 160 up osd.22
>> 23 nvme 0.90970 1.00000 932 GiB 823 GiB 821 GiB 138 KiB 2.4 GiB 108 GiB 88.37 1.26 175 up osd.23
>> 6 ssd 1.74660 0 0 B 0 B 0 B 0 B 0 B 0 B 0 0 0 down osd.6
>> 10 ssd 0.87329 1.00000 894 GiB 258 GiB 257 GiB 1 KiB 1.5 GiB 636 GiB 28.85 0.41 100 up osd.10
>> 11 ssd 0.87329 1.00000 894 GiB 323 GiB 321 GiB 1 KiB 1.9 GiB 571 GiB 36.14 0.51 87 up osd.11
>> 12 ssd 0.87329 1.00000 894 GiB 786 GiB 784 GiB 123 KiB 2.1 GiB 108 GiB 87.89 1.25 131 up osd.12
>> 13 ssd 0.87329 1.00000 894 GiB 678 GiB 676 GiB 170 KiB 2.0 GiB 216 GiB 75.86 1.08 119 up osd.13
>> 16 ssd 0.87329 0.79999 894 GiB 794 GiB 791 GiB 149 KiB 2.1 GiB 101 GiB 88.73 1.26 100 up osd.16
>> -11 3.72598 - 1.9 TiB 1.7 TiB 1.7 TiB 460 KiB 3.3 GiB 191 GiB 89.98 1.28 - host dc3658
>> 20 nvme 1.86299 0.95001 0 B 0 B 0 B 0 B 0 B 0 B 0 0 29 down osd.20
>> 21 nvme 1.86299 0.90002 1.9 TiB 1.7 TiB 1.7 TiB 460 KiB 3.3 GiB 191 GiB 89.98 1.28 338 up osd.21
>> TOTAL 13 TiB 9.4 TiB 9.4 TiB 1.8 MiB 27 GiB 4.0 TiB 70.18
>> MIN/MAX VAR: 0/1.28 STDDEV: 43.15
>>
>>
>>>> On 10. May 2025, at 16:13, Senol Colak - Kubedo <se...@kubedo.com> wrote:
>>>
>>> They provided me the following:
>>>
>>> root@cmt6770:~# ceph osd tree
>>> ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
>>> -1 34.05125 root default
>>> -3 5.38539 host cmt5923
>>> 3 ssd 1.74660 osd.3 up 0.79999 1.00000
>>> 8 ssd 1.81940 osd.8 down 1.00000 1.00000
>>> 9 ssd 1.81940 osd.9 down 1.00000 1.00000
>>> -15 4.40289 host cmt6461
>>> 24 nvme 0.90970 osd.24 down 0.79999 1.00000
>>> 2 ssd 1.74660 osd.2 down 1.00000 1.00000
>>> 17 ssd 1.74660 osd.17 down 1.00000 1.00000
>>> -5 5.35616 host cmt6770
>>> 4 nvme 1.86299 osd.4 down 1.00000 1.00000
>>> 0 ssd 0.87329 osd.0 down 1.00000 1.00000
>>> 1 ssd 0.87329 osd.1 down 1.00000 1.00000
>>> 14 ssd 0.87329 osd.14 down 1.00000 1.00000
>>> 15 ssd 0.87329 osd.15 up 1.00000 1.00000
>>> -9 7.24838 host cmt7773
>>> 5 nvme 1.81940 osd.5 down 1.00000 1.00000
>>> 19 nvme 1.81940 osd.19 down 0.95001 1.00000
>>> 7 ssd 1.74660 osd.7 down 1.00000 1.00000
>>> 29 ssd 1.86299 osd.29 up 1.00000 1.00000
>>> -13 7.93245 host dc2943
>>> 22 nvme 0.90970 osd.22 up 1.00000 1.00000
>>> 23 nvme 0.90970 osd.23 up 1.00000 1.00000
>>> 6 ssd 1.74660 osd.6 down 1.00000 1.00000
>>> 10 ssd 0.87329 osd.10 up 1.00000 1.00000
>>> 11 ssd 0.87329 osd.11 up 1.00000 1.00000
>>> 12 ssd 0.87329 osd.12 up 1.00000 1.00000
>>> 13 ssd 0.87329 osd.13 up 1.00000 1.00000
>>> 16 ssd 0.87329 osd.16 up 0.79999 1.00000
>>> -11 3.72598 host dc3658
>>> 20 nvme 1.86299 osd.20 down 0.95001 1.00000
>>> 21 nvme 1.86299 osd.21 up 0.90002 1.00000
>>> root@cmt6770:~# ^C
>>> root@cmt6770:~#
>>> From: Michel Raabe <ra...@b1-systems.de>
>>> Sent: Saturday, May 10, 2025 3:57 PM
>>> To: Senol COLAK <se...@kubedo.com>
>>> Cc: ceph-users@ceph.io <ceph-users@ceph.io>
>>> Subject: Re: [ceph-users] We lost the stability of the cluster, 18.2.2 -> 18.2.6 -> 19.2.1 Chain of upgrade failure
>>>
>>> Hi,
>>>
>>> Can you provide a „ceph osd df tree“ output?
>>>
>>> Regards
>>> Michel
>>>
>>> Sent from my mobile phone
>>>
>>> > On 10. May 2025, at 14:56, Senol COLAK <se...@kubedo.com> wrote:
>>> >
>>> > Hello,
>>> >
>>> > After upgrading from ceph reef 18.2.6 to ceph squid 19.2.1 I restarted the osds and they remained down. The events contain the following records:
>>> >
>>> > root@cmt6770:~# ceph health detail
>>> > HEALTH_WARN 1 filesystem is degraded; 1 MDSs report slow metadata IOs; mon cmt6770 is low on available space; 9 osds down; 1 host (3 osds) down; 5 nearfull osd(s); Reduced data availability: 866 pgs inactive, 489 pgs down, 5 pgs incomplete, 60 pgs stale; Low space hindering backfill (add storage if this doesn't resolve itself): 9 pgs backfill_toofull; Degraded data redundancy: 432856/3408880 objects degraded (12.698%), 191 pgs degraded, 181 pgs undersized; 12 pool(s) nearfull; 255 slow ops, oldest one blocked for 2417 sec, daemons [osd.10,osd.12,osd.21,osd.22,osd.23] have slow ops.
>>> > [WRN] FS_DEGRADED: 1 filesystem is degraded
>>> > fs cephfs is degraded
>>> > [WRN] MDS_SLOW_METADATA_IO: 1 MDSs report slow metadata IOs
>>> > mds.cmt5923(mds.0): 1 slow metadata IOs are blocked > 30 secs, oldest blocked for 38201 secs
>>> > [WRN] MON_DISK_LOW: mon cmt6770 is low on available space
>>> > mon.cmt6770 has 28% avail
>>> > [WRN] OSD_DOWN: 9 osds down
>>> > osd.0 (root=default,host=cmt6770) is down
>>> > osd.1 (root=default,host=cmt6770) is down
>>> > osd.2 (root=default,host=cmt6461) is down
>>> > osd.7 (root=default,host=cmt7773) is down
>>> > osd.8 (root=default,host=cmt5923) is down
>>> > osd.9 (root=default,host=cmt5923) is down
>>> > osd.14 (root=default,host=cmt6770) is down
>>> > osd.17 (root=default,host=cmt6461) is down
>>> > osd.24 (root=default,host=cmt6461) is down
>>> > [WRN] OSD_HOST_DOWN: 1 host (3 osds) down
>>> > host cmt6461 (root=default) (3 osds) is down
>>> > [WRN] OSD_NEARFULL: 5 nearfull osd(s)
>>> > osd.3 is near full
>>> > osd.12 is near full
>>> > osd.16 is near full
>>> > osd.21 is near full
>>> > osd.23 is near full
>>> > [WRN] PG_AVAILABILITY: Reduced data availability: 866 pgs inactive, 489 pgs down, 5 pgs incomplete, 60 pgs stale
>>> > pg 7.1c5 is down, acting [3]
>>> > pg 7.1c7 is stuck inactive for 3h, current state unknown, last acting []
>>> > pg 7.1c8 is stuck inactive for 3h, current state unknown, last acting []
>>> > pg 7.1cb is stuck inactive for 3h, current state unknown, last acting []
>>> > pg 7.1cd is down, acting [15,3]
>>> > pg 7.1ce is stuck inactive for 3h, current state unknown, last acting []
>>> > pg 7.1cf is stuck inactive for 3h, current state unknown, last acting []
>>> > pg 7.1d0 is stuck inactive for 3h, current state unknown, last acting []
>>> > pg 7.1d1 is down, acting [29,13]
>>> > pg 7.1d2 is stuck inactive for 3h, current state unknown, last acting []
>>> > pg 7.1d3 is down, acting [23]
>>> > pg 7.1d4 is down, acting [16]
>>> > pg 7.1d5 is stuck inactive for 3h, current state unknown, last acting []
>>> > pg 7.1d6 is down, acting [3]
>>> > pg 7.1d9 is stuck inactive for 3h, current state unknown, last acting []
>>> > pg 7.1da is stuck inactive for 3h, current state unknown, last acting []
>>> > pg 7.1e0 is stuck inactive for 3h, current state unknown, last acting []
>>> > pg 7.1e1 is stuck inactive for 3h, current state unknown, last acting []
>>> > pg 7.1e2 is stuck inactive for 3h, current state unknown, last acting []
>>> > pg 7.1e4 is stuck inactive for 3h, current state unknown, last acting []
>>> > pg 7.1e5 is down, acting [12,29]
>>> > pg 7.1e7 is stuck stale for 7m, current state stale+down, last acting [9]
>>> > pg 7.1e8 is down, acting [12]
>>> > pg 7.1e9 is stuck stale for 31m, current state stale, last acting [2]
>>> > pg 7.1eb is stuck inactive for 3h, current state unknown, last acting []
>>> > pg 7.1ed is down, acting [3]
>>> > pg 7.1ee is stuck inactive for 3h, current state unknown, last acting []
>>> > pg 7.1ef is down, acting [12]
>>> > pg 7.1f0 is down, acting [10]
>>> > pg 7.1f1 is down, acting [12,29]
>>> > pg 7.1f2 is down, acting [16]
>>> > pg 7.1f3 is stuck stale for 7m, current state stale, last acting [9]
>>> > pg 7.1f4 is down, acting [22]
>>> > pg 7.1f5 is down, acting [22]
>>> > pg 7.1f8 is down, acting [29]
>>> > pg 7.1f9 is stuck inactive for 3h, current state unknown, last acting []
>>> > pg 7.1fb is stuck inactive for 3h, current state unknown, last acting []
>>> > pg 7.1fc is stuck stale for 15m, current state stale+down, last acting [29]
>>> > pg 7.1fd is down, acting [3]
>>> > pg 7.1fe is down, acting [3,15]
>>> > pg 7.1ff is down, acting [3]
>>> > pg 7.201 is down, acting [12]
>>> > pg 7.204 is down, acting [10]
>>> > pg 7.205 is down, acting [13]
>>> > pg 7.207 is down, acting [11]
>>> > pg 7.20a is down, acting [3]
>>> > pg 7.20b is down, acting [22]
>>> > pg 7.20d is stuck inactive for 3h, current state unknown, last acting []
>>> > pg 7.210 is stuck inactive for 3h, current state unknown, last acting []
>>> > pg 7.211 is stuck inactive for 3h, current state unknown, last acting []
>>> > pg 7.21b is down, acting [16]
>>> > [WRN] PG_BACKFILL_FULL: Low space hindering backfill (add storage if this doesn't resolve itself): 9 pgs backfill_toofull
>>> > pg 7.20 is active+undersized+degraded+remapped+backfill_toofull, acting [3]
>>> > pg 19.2b is active+remapped+backfill_toofull, acting [12,3]
>>> > pg 19.6b is active+remapped+backfill_toofull, acting [12,3]
>>> > pg 20.55 is active+remapped+backfill_toofull, acting [29,3]
>>> > pg 24.6 is active+undersized+degraded+remapped+backfill_toofull, acting [22]
>>> > pg 24.b is active+undersized+degraded+remapped+backfill_toofull, acting [21]
>>> > pg 24.13 is active+undersized+degraded+remapped+backfill_toofull, acting [23]
>>> > pg 24.16 is active+undersized+degraded+remapped+backfill_toofull, acting [21]
>>> > pg 24.1d is active+undersized+degraded+remapped+backfill_toofull, acting [21]
>>> > [WRN] PG_DEGRADED: Degraded data redundancy: 432856/3408880 objects degraded (12.698%), 191 pgs degraded, 181 pgs undersized
>>> > pg 7.3f is stuck undersized for 9m, current state active+undersized+degraded, last acting [16]
>>> > pg 7.56 is stuck undersized for 21m, current state active+undersized+degraded, last acting [13]
>>> > pg 7.61 is stuck undersized for 8m, current state active+undersized+degraded, last acting [29]
>>> > pg 7.66 is stuck undersized for 7m, current state active+undersized+degraded, last acting [29]
>>> > pg 7.6b is stuck undersized for 25m, current state active+undersized+degraded, last acting [21]
>>> > pg 7.102 is active+undersized+degraded, acting [23]
>>> > pg 7.118 is stuck undersized for 14m, current state active+undersized+degraded, last acting [23]
>>> > pg 7.11c is stuck undersized for 9m, current state active+undersized+degraded, last acting [3]
>>> > pg 7.133 is stuck undersized for 7m, current state active+undersized+degraded, last acting [12]
>>> > pg 7.139 is stuck undersized for 7m, current state active+undersized+degraded, last acting [13]
>>> > pg 7.143 is stuck undersized for 8m, current state active+undersized+degraded, last acting [29]
>>> > pg 7.155 is stuck undersized for 25m, current state active+undersized+degraded, last acting [3]
>>> > pg 7.156 is active+undersized+degraded, acting [12]
>>> > pg 7.15e is stuck undersized for 31m, current state active+undersized+degraded, last acting [3]
>>> > pg 7.15f is stuck undersized for 8m, current state active+undersized+degraded, last acting [29]
>>> > pg 7.168 is stuck undersized for 67m, current state active+undersized+degraded, last acting [22]
>>> > pg 7.17f is stuck undersized for 8m, current state active+undersized+degraded, last acting [29]
>>> > pg 7.180 is stuck undersized for 14m, current state active+undersized+degraded, last acting [21]
>>> > pg 7.18e is stuck undersized for 8m, current state active+undersized+degraded, last acting [29]
>>> > pg 7.193 is active+undersized+degraded, acting [16]
>>> > pg 7.197 is stuck undersized for 14m, current state active+undersized+degraded, last acting [21]
>>> > pg 7.1a6 is stuck undersized for 8m, current state active+undersized+degraded, last acting [29]
>>> > pg 7.1b7 is stuck undersized for 8m, current state active+undersized+degraded, last acting [29]
>>> > pg 7.1c6 is stuck undersized for 14m, current state active+undersized+degraded, last acting [22]
>>> > pg 7.1ca is stuck undersized for 14m, current state active+undersized+degraded, last acting [22]
>>> > pg 7.1d7 is stuck undersized for 9m, current state active+undersized+degraded, last acting [22]
>>> > pg 7.1df is active+undersized+degraded, acting [21]
>>> > pg 7.1e6 is stuck undersized for 10h, current state active+undersized+degraded, last acting [23]
>>> > pg 7.200 is active+undersized+degraded, acting [29]
>>> > pg 7.202 is stuck undersized for 7m, current state active+undersized+degraded, last acting [13]
>>> > pg 7.20c is stuck undersized for 10h, current state active+undersized+degraded, last acting [16]
>>> > pg 7.20e is stuck undersized for 47m, current state active+undersized+degraded, last acting [23]
>>> > pg 7.20f is stuck undersized for 47m, current state active+undersized+degraded, last acting [23]
>>> > pg 7.217 is stuck undersized for 7m, current state active+undersized+degraded, last acting [21]
>>> > pg 15.35 is active+undersized+degraded, acting [22]
>>> > pg 16.2a is stuck undersized for 10h, current state active+undersized+degraded, last acting [21]
>>> > pg 19.43 is stuck undersized for 31m, current state active+undersized+degraded, last acting [23]
>>> > pg 19.44 is stuck undersized for 8m, current state active+undersized+degraded, last acting [29]
>>> > pg 19.4e is stuck undersized for 14m, current state active+undersized+degraded, last acting [16]
>>> > pg 19.52 is active+undersized+degraded+wait, acting [3]
>>> > pg 19.55 is stuck undersized for 25m, current state active+undersized+degraded, last acting [23]
>>> > pg 19.61 is stuck undersized for 25m, current state active+undersized+degraded, last acting [21]
>>> > pg 19.72 is stuck undersized for 31m, current state active+undersized+degraded, last acting [3]
>>> > pg 20.42 is stuck undersized for 7m, current state active+undersized+degraded, last acting [23]
>>> > pg 20.48 is stuck undersized for 67m, current state active+undersized+degraded, last acting [16]
>>> > pg 20.5b is stuck undersized for 21m, current state active+undersized+degraded, last acting [12]
>>> > pg 20.5f is stuck undersized for 10h, current state active+undersized+degraded, last acting [12]
>>> > pg 20.65 is stuck undersized for 10m, current state active+undersized+degraded, last acting [23]
>>> > pg 20.6a is active+undersized+degraded, acting [13]
>>> > pg 20.71 is stuck undersized for 31m, current state active+undersized+degraded, last acting [13]
>>> > pg 20.7d is stuck undersized for 7m, current state active+undersized+degraded, last acting [29]
>>> > [WRN] POOL_NEARFULL: 12 pool(s) nearfull
>>> > pool '.mgr' is nearfull
>>> > pool 'DataStore' is nearfull
>>> > pool 'cephfs_data' is nearfull
>>> > pool 'cephfs_metadata' is nearfull
>>> > pool 'OS' is nearfull
>>> > pool 'cloud' is nearfull
>>> > pool 'DataStore_2' is nearfull
>>> > pool 'DataStore_3' is nearfull
>>> > pool 'MGMT' is nearfull
>>> > pool 'DataStore_4' is nearfull
>>> > pool 'DataStore_5' is nearfull
>>> > pool 'fast' is nearfull
>>> > [WRN] SLOW_OPS: 255 slow ops, oldest one blocked for 2417 sec, daemons [osd.10,osd.12,osd.21,osd.22,osd.23] have slow ops.
>>> > root@cmt6770:~# ceph -s
>>> > cluster:
>>> > id: 9319dafb-3408-46cb-9b09-b3d381114545
>>> > health: HEALTH_WARN
>>> > 1 filesystem is degraded
>>> > 1 MDSs report slow metadata IOs
>>> > mon cmt6770 is low on available space
>>> > 9 osds down
>>> > 1 host (3 osds) down
>>> > 5 nearfull osd(s)
>>> > Reduced data availability: 866 pgs inactive, 489 pgs down, 5 pgs incomplete, 60 pgs stale
>>> > Low space hindering backfill (add storage if this doesn't resolve itself): 9 pgs backfill_toofull
>>> > Degraded data redundancy: 432856/3408880 objects degraded (12.698%), 191 pgs degraded, 181 pgs undersized
>>> > 12 pool(s) nearfull
>>> > 255 slow ops, oldest one blocked for 2422 sec, daemons [osd.10,osd.12,osd.21,osd.22,osd.23] have slow ops.
>>> >
>>> > services:
>>> > mon: 2 daemons, quorum cmt6770,cmt5923 (age 70m)
>>> > mgr: cmt6770(active, since 3h)
>>> > mds: 1/1 daemons up, 1 standby
>>> > osd: 25 osds: 11 up (since 14s), 20 in (since 9m); 182 remapped pgs
>>> >
>>> > data:
>>> > volumes: 0/1 healthy, 1 recovering
>>> > pools: 12 pools, 1589 pgs
>>> > objects: 1.70M objects, 6.2 TiB
>>> > usage: 9.1 TiB used, 6.2 TiB / 15 TiB avail
>>> > pgs: 28.760% pgs unknown
>>> > 34.991% pgs not active
>>> > 432856/3408880 objects degraded (12.698%)
>>> > 388136/3408880 objects misplaced (11.386%)
>>> > 466 down
>>> > 457 unknown
>>> > 209 active+clean
>>> > 185 active+undersized+degraded
>>> > 157 active+clean+remapped
>>> > 62 stale
>>> > 20 stale+down
>>> > 9 active+undersized+remapped
>>> > 6 active+undersized+degraded+remapped+backfill_toofull
>>> > 5 incomplete
>>> > 3 active+remapped+backfill_toofull
>>> > 2 active+clean+scrubbing+deep
>>> > 2 active+clean+remapped+scrubbing+deep
>>> > 2 down+remapped
>>> > 1 stale+creating+down
>>> > 1 active+remapped+backfilling
>>> > 1 active+remapped+backfill_wait
>>> > 1 active+undersized+remapped+wait
>>> >
>>> > io:
>>> > recovery: 12 MiB/s, 3 objects/s
>>> >
>>> > root@cmt6770:~# ceph osd tree
>>> > ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
>>> > -1 34.05125 root default
>>> > -3 5.38539 host cmt5923
>>> > 3 ssd 1.74660 osd.3 up 0.79999 1.00000
>>> > 8 ssd 1.81940 osd.8 down 1.00000 1.00000
>>> > 9 ssd 1.81940 osd.9 down 1.00000 1.00000
>>> > -15 4.40289 host cmt6461
>>> > 24 nvme 0.90970 osd.24 down 0.79999 1.00000
>>> > 2 ssd 1.74660 osd.2 down 1.00000 1.00000
>>> > 17 ssd 1.74660 osd.17 down 1.00000 1.00000
>>> > -5 5.35616 host cmt6770
>>> > 0 ssd 0.87329 osd.0 down 1.00000 1.00000
>>> > 1 ssd 0.87329 osd.1 down 1.00000 1.00000
>>> > 4 ssd 1.86299 osd.4 down 0 1.00000
>>> > 14 ssd 0.87329 osd.14 down 1.00000 1.00000
>>> > 15 ssd 0.87329 osd.15 up 1.00000 1.00000
>>> > -9 7.24838 host cmt7773
>>> > 5 nvme 1.81940 osd.5 down 0 1.00000
>>> > 19 nvme 1.81940 osd.19 down 0 1.00000
>>> > 7 ssd 1.74660 osd.7 down 1.00000 1.00000
>>> > 29 ssd 1.86299 osd.29 up 1.00000 1.00000
>>> > -13 7.93245 host dc2943
>>> > 22 nvme 0.90970 osd.22 up 1.00000 1.00000
>>> > 23 nvme 0.90970 osd.23 up 1.00000 1.00000
>>> > 6 ssd 1.74660 osd.6 down 0 1.00000
>>> > 10 ssd 0.87329 osd.10 up 1.00000 1.00000
>>> > 11 ssd 0.87329 osd.11 up 1.00000 1.00000
>>> > 12 ssd 0.87329 osd.12 up 1.00000 1.00000
>>> > 13 ssd 0.87329 osd.13 up 1.00000 1.00000
>>> > 16 ssd 0.87329 osd.16 up 0.79999 1.00000
>>> > -11 3.72598 host dc3658
>>> > 20 nvme 1.86299 osd.20 down 0 1.00000
>>> > 21 nvme 1.86299 osd.21 up 0.90002 1.00000
>>> > root@cmt6770:~# ceph osd df
>>> > ID CLASS WEIGHT REWEIGHT SIZE RAW USE DATA OMAP META AVAIL %USE VAR PGS STATUS
>>> > 3 ssd 1.74660 0.79999 1.7 TiB 1.5 TiB 1.5 TiB 228 KiB 2.9 GiB 251 GiB 85.96 1.44 214 up
>>> > 8 ssd 1.81940 1.00000 0 B 0 B 0 B 0 B 0 B 0 B 0 0 0 down
>>> > 9 ssd 1.81940 1.00000 1.8 TiB 776 MiB 745 MiB 8 KiB 31 MiB 1.8 TiB 0.04 0 46 down
>>> > 24 nvme 0.90970 0.79999 0 B 0 B 0 B 0 B 0 B 0 B 0 0 0 down
>>> > 2 ssd 1.74660 1.00000 0 B 0 B 0 B 0 B 0 B 0 B 0 0 4 down
>>> > 17 ssd 1.74660 1.00000 0 B 0 B 0 B 0 B 0 B 0 B 0 0 0 down
>>> > 0 ssd 0.87329 1.00000 0 B 0 B 0 B 0 B 0 B 0 B 0 0 3 down
>>> > 1 ssd 0.87329 1.00000 0 B 0 B 0 B 0 B 0 B 0 B 0 0 0 down
>>> > 4 ssd 1.86299 0 0 B 0 B 0 B 0 B 0 B 0 B 0 0 0 down
>>> > 14 ssd 0.87329 1.00000 894 GiB 792 MiB 752 MiB 14 KiB 40 MiB 893 GiB 0.09 0.00 25 down
>>> > 15 ssd 0.87329 1.00000 894 GiB 232 GiB 231 GiB 14 KiB 1.4 GiB 662 GiB 25.98 0.44 83 up
>>> > 5 nvme 1.81940 0 0 B 0 B 0 B 0 B 0 B 0 B 0 0 0 down
>>> > 19 nvme 1.81940 0 0 B 0 B 0 B 0 B 0 B 0 B 0 0 0 down
>>> > 7 ssd 1.74660 1.00000 0 B 0 B 0 B 0 B 0 B 0 B 0 0 0 down
>>> > 29 ssd 1.86299 1.00000 1.9 TiB 1.5 TiB 1.5 TiB 323 KiB 2.8 GiB 354 GiB 81.44 1.37 222 up
>>> > 22 nvme 0.90970 1.00000 932 GiB 689 GiB 687 GiB 181 KiB 1.6 GiB 243 GiB 73.96 1.24 139 up
>>> > 23 nvme 0.90970 1.00000 932 GiB 820 GiB 818 GiB 138 KiB 2.0 GiB 112 GiB 87.98 1.48 144 up
>>> > 6 ssd 1.74660 0 0 B 0 B 0 B 0 B 0 B 0 B 0 0 0 down
>>> > 10 ssd 0.87329 1.00000 894 GiB 237 GiB 235 GiB 1 KiB 1.2 GiB 658 GiB 26.46 0.44 82 up
>>> > 11 ssd 0.87329 1.00000 894 GiB 264 GiB 263 GiB 1 KiB 1.4 GiB 630 GiB 29.54 0.50 67 up
>>> > 12 ssd 0.87329 1.00000 894 GiB 780 GiB 778 GiB 123 KiB 1.8 GiB 114 GiB 87.26 1.46 113 up
>>> > 13 ssd 0.87329 1.00000 894 GiB 684 GiB 682 GiB 170 KiB 1.9 GiB 210 GiB 76.48 1.28 98 up
>>> > 16 ssd 0.87329 0.79999 894 GiB 779 GiB 777 GiB 149 KiB 1.8 GiB 116 GiB 87.06 1.46 86 up
>>> > 20 nvme 1.86299 0 0 B 0 B 0 B 0 B 0 B 0 B 0 0 0 down
>>> > 21 nvme 1.86299 0.90002 1.9 TiB 1.7 TiB 1.7 TiB 430 KiB 3.5 GiB 194 GiB 89.84 1.51 314 up
>>> > TOTAL 15 TiB 9.1 TiB 9.1 TiB 1.7 MiB 22 GiB 6.2 TiB 59.60
>>> > MIN/MAX VAR: 0/1.51 STDDEV: 44.89
>>> >
>>> >
>>> > also osd start logs:
>>> > May 10 15:38:32 cmt5923 systemd[1]: ceph-osd@9.service: Failed with result 'signal'.
>>> > May 10 15:38:37 cmt5923 ceph-osd[2383902]: 2025-05-10T15:38:37.579+0300 764caf13f880 -1 osd.8 100504 log_to_monitors true
>>> > May 10 15:38:37 cmt5923 ceph-osd[2383902]: 2025-05-10T15:38:37.791+0300 764c8c64b6c0 -1 log_channel(cluster) log [ERR] : 7.26a past_intervals [96946,100253) start interval does not contain the required bound [93903,100253) start
>>> > May 10 15:38:37 cmt5923 ceph-osd[2383902]: 2025-05-10T15:38:37.791+0300 764c8c64b6c0 -1 osd.8 pg_epoch: 100377 pg[7.26a( empty local-lis/les=0/0 n=0 ec=96946/96946 lis/c=96236/93898 les/c/f=96237/93903/91308 sis=100253) [3,1] r=-1 lpr=100376 pi=[96946,100253)/3 crt=0'0 mlcod 0'0 unknown mbc={}] PeeringState::check_past_interval_bounds 7.26a past_intervals [96946,100253) start interval does not contain the required bound [93903,100253) start
>>> >
>>> > We appreciate any support and guidance,
>>> > Thanks in advance
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io