Hi,

Can you provide a "ceph osd df tree" output?
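For example, run on any node with an admin keyring (the redirect to a file is only a suggestion so the column alignment survives the mail client; the file name is arbitrary):

    # per-OSD utilisation, grouped by the CRUSH tree (root/host)
    ceph osd df tree

    # optionally capture it to a file and attach/paste that
    ceph osd df tree > /tmp/osd-df-tree.txt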
Regards
Michel

Sent from my mobile phone

> On 10. May 2025, at 14:56, Senol COLAK <se...@kubedo.com> wrote:
>
> Hello,
>
> After upgrading from ceph reef 18.2.6 to ceph squid 19.2.1 I restarted the
> osds and they remained down. The events contain the following records:
>
> root@cmt6770:~# ceph health detail
> HEALTH_WARN 1 filesystem is degraded; 1 MDSs report slow metadata IOs; mon cmt6770 is low on available space; 9 osds down; 1 host (3 osds) down; 5 nearfull osd(s); Reduced data availability: 866 pgs inactive, 489 pgs down, 5 pgs incomplete, 60 pgs stale; Low space hindering backfill (add storage if this doesn't resolve itself): 9 pgs backfill_toofull; Degraded data redundancy: 432856/3408880 objects degraded (12.698%), 191 pgs degraded, 181 pgs undersized; 12 pool(s) nearfull; 255 slow ops, oldest one blocked for 2417 sec, daemons [osd.10,osd.12,osd.21,osd.22,osd.23] have slow ops.
> [WRN] FS_DEGRADED: 1 filesystem is degraded
>     fs cephfs is degraded
> [WRN] MDS_SLOW_METADATA_IO: 1 MDSs report slow metadata IOs
>     mds.cmt5923(mds.0): 1 slow metadata IOs are blocked > 30 secs, oldest blocked for 38201 secs
> [WRN] MON_DISK_LOW: mon cmt6770 is low on available space
>     mon.cmt6770 has 28% avail
> [WRN] OSD_DOWN: 9 osds down
>     osd.0 (root=default,host=cmt6770) is down
>     osd.1 (root=default,host=cmt6770) is down
>     osd.2 (root=default,host=cmt6461) is down
>     osd.7 (root=default,host=cmt7773) is down
>     osd.8 (root=default,host=cmt5923) is down
>     osd.9 (root=default,host=cmt5923) is down
>     osd.14 (root=default,host=cmt6770) is down
>     osd.17 (root=default,host=cmt6461) is down
>     osd.24 (root=default,host=cmt6461) is down
> [WRN] OSD_HOST_DOWN: 1 host (3 osds) down
>     host cmt6461 (root=default) (3 osds) is down
> [WRN] OSD_NEARFULL: 5 nearfull osd(s)
>     osd.3 is near full
>     osd.12 is near full
>     osd.16 is near full
>     osd.21 is near full
>     osd.23 is near full
> [WRN] PG_AVAILABILITY: Reduced data availability: 866 pgs inactive, 489 pgs down, 5 pgs incomplete, 60 pgs stale
>     pg 7.1c5 is down, acting [3]
>     pg 7.1c7 is stuck inactive for 3h, current state unknown, last acting []
>     pg 7.1c8 is stuck inactive for 3h, current state unknown, last acting []
>     pg 7.1cb is stuck inactive for 3h, current state unknown, last acting []
>     pg 7.1cd is down, acting [15,3]
>     pg 7.1ce is stuck inactive for 3h, current state unknown, last acting []
>     pg 7.1cf is stuck inactive for 3h, current state unknown, last acting []
>     pg 7.1d0 is stuck inactive for 3h, current state unknown, last acting []
>     pg 7.1d1 is down, acting [29,13]
>     pg 7.1d2 is stuck inactive for 3h, current state unknown, last acting []
>     pg 7.1d3 is down, acting [23]
>     pg 7.1d4 is down, acting [16]
>     pg 7.1d5 is stuck inactive for 3h, current state unknown, last acting []
>     pg 7.1d6 is down, acting [3]
>     pg 7.1d9 is stuck inactive for 3h, current state unknown, last acting []
>     pg 7.1da is stuck inactive for 3h, current state unknown, last acting []
>     pg 7.1e0 is stuck inactive for 3h, current state unknown, last acting []
>     pg 7.1e1 is stuck inactive for 3h, current state unknown, last acting []
>     pg 7.1e2 is stuck inactive for 3h, current state unknown, last acting []
>     pg 7.1e4 is stuck inactive for 3h, current state unknown, last acting []
>     pg 7.1e5 is down, acting [12,29]
>     pg 7.1e7 is stuck stale for 7m, current state stale+down, last acting [9]
>     pg 7.1e8 is down, acting [12]
>     pg 7.1e9 is stuck stale for 31m, current state stale, last acting [2]
>     pg 7.1eb is stuck inactive for 3h, current state unknown, last acting []
>     pg 7.1ed is down, acting [3]
>     pg 7.1ee is stuck inactive for 3h, current state unknown, last acting []
>     pg 7.1ef is down, acting [12]
>     pg 7.1f0 is down, acting [10]
>     pg 7.1f1 is down, acting [12,29]
>     pg 7.1f2 is down, acting [16]
>     pg 7.1f3 is stuck stale for 7m, current state stale, last acting [9]
>     pg 7.1f4 is down, acting [22]
>     pg 7.1f5 is down, acting [22]
>     pg 7.1f8 is down, acting [29]
>     pg 7.1f9 is stuck inactive for 3h, current state unknown, last acting []
>     pg 7.1fb is stuck inactive for 3h, current state unknown, last acting []
>     pg 7.1fc is stuck stale for 15m, current state stale+down, last acting [29]
>     pg 7.1fd is down, acting [3]
>     pg 7.1fe is down, acting [3,15]
>     pg 7.1ff is down, acting [3]
>     pg 7.201 is down, acting [12]
>     pg 7.204 is down, acting [10]
>     pg 7.205 is down, acting [13]
>     pg 7.207 is down, acting [11]
>     pg 7.20a is down, acting [3]
>     pg 7.20b is down, acting [22]
>     pg 7.20d is stuck inactive for 3h, current state unknown, last acting []
>     pg 7.210 is stuck inactive for 3h, current state unknown, last acting []
>     pg 7.211 is stuck inactive for 3h, current state unknown, last acting []
>     pg 7.21b is down, acting [16]
> [WRN] PG_BACKFILL_FULL: Low space hindering backfill (add storage if this doesn't resolve itself): 9 pgs backfill_toofull
>     pg 7.20 is active+undersized+degraded+remapped+backfill_toofull, acting [3]
>     pg 19.2b is active+remapped+backfill_toofull, acting [12,3]
>     pg 19.6b is active+remapped+backfill_toofull, acting [12,3]
>     pg 20.55 is active+remapped+backfill_toofull, acting [29,3]
>     pg 24.6 is active+undersized+degraded+remapped+backfill_toofull, acting [22]
>     pg 24.b is active+undersized+degraded+remapped+backfill_toofull, acting [21]
>     pg 24.13 is active+undersized+degraded+remapped+backfill_toofull, acting [23]
>     pg 24.16 is active+undersized+degraded+remapped+backfill_toofull, acting [21]
>     pg 24.1d is active+undersized+degraded+remapped+backfill_toofull, acting [21]
> [WRN] PG_DEGRADED: Degraded data redundancy: 432856/3408880 objects degraded (12.698%), 191 pgs degraded, 181 pgs undersized
>     pg 7.3f is stuck undersized for 9m, current state active+undersized+degraded, last acting [16]
>     pg 7.56 is stuck undersized for 21m, current state active+undersized+degraded, last acting [13]
>     pg 7.61 is stuck undersized for 8m, current state active+undersized+degraded, last acting [29]
>     pg 7.66 is stuck undersized for 7m, current state active+undersized+degraded, last acting [29]
>     pg 7.6b is stuck undersized for 25m, current state active+undersized+degraded, last acting [21]
>     pg 7.102 is active+undersized+degraded, acting [23]
>     pg 7.118 is stuck undersized for 14m, current state active+undersized+degraded, last acting [23]
>     pg 7.11c is stuck undersized for 9m, current state active+undersized+degraded, last acting [3]
>     pg 7.133 is stuck undersized for 7m, current state active+undersized+degraded, last acting [12]
>     pg 7.139 is stuck undersized for 7m, current state active+undersized+degraded, last acting [13]
>     pg 7.143 is stuck undersized for 8m, current state active+undersized+degraded, last acting [29]
>     pg 7.155 is stuck undersized for 25m, current state active+undersized+degraded, last acting [3]
>     pg 7.156 is active+undersized+degraded, acting [12]
>     pg 7.15e is stuck undersized for 31m, current state active+undersized+degraded, last acting [3]
>     pg 7.15f is stuck undersized for 8m, current state active+undersized+degraded, last acting [29]
>     pg 7.168 is stuck undersized for 67m, current state active+undersized+degraded, last acting [22]
>     pg 7.17f is stuck undersized for 8m, current state active+undersized+degraded, last acting [29]
>     pg 7.180 is stuck undersized for 14m, current state active+undersized+degraded, last acting [21]
>     pg 7.18e is stuck undersized for 8m, current state active+undersized+degraded, last acting [29]
>     pg 7.193 is active+undersized+degraded, acting [16]
>     pg 7.197 is stuck undersized for 14m, current state active+undersized+degraded, last acting [21]
>     pg 7.1a6 is stuck undersized for 8m, current state active+undersized+degraded, last acting [29]
>     pg 7.1b7 is stuck undersized for 8m, current state active+undersized+degraded, last acting [29]
>     pg 7.1c6 is stuck undersized for 14m, current state active+undersized+degraded, last acting [22]
>     pg 7.1ca is stuck undersized for 14m, current state active+undersized+degraded, last acting [22]
>     pg 7.1d7 is stuck undersized for 9m, current state active+undersized+degraded, last acting [22]
>     pg 7.1df is active+undersized+degraded, acting [21]
>     pg 7.1e6 is stuck undersized for 10h, current state active+undersized+degraded, last acting [23]
>     pg 7.200 is active+undersized+degraded, acting [29]
>     pg 7.202 is stuck undersized for 7m, current state active+undersized+degraded, last acting [13]
>     pg 7.20c is stuck undersized for 10h, current state active+undersized+degraded, last acting [16]
>     pg 7.20e is stuck undersized for 47m, current state active+undersized+degraded, last acting [23]
>     pg 7.20f is stuck undersized for 47m, current state active+undersized+degraded, last acting [23]
>     pg 7.217 is stuck undersized for 7m, current state active+undersized+degraded, last acting [21]
>     pg 15.35 is active+undersized+degraded, acting [22]
>     pg 16.2a is stuck undersized for 10h, current state active+undersized+degraded, last acting [21]
>     pg 19.43 is stuck undersized for 31m, current state active+undersized+degraded, last acting [23]
>     pg 19.44 is stuck undersized for 8m, current state active+undersized+degraded, last acting [29]
>     pg 19.4e is stuck undersized for 14m, current state active+undersized+degraded, last acting [16]
>     pg 19.52 is active+undersized+degraded+wait, acting [3]
>     pg 19.55 is stuck undersized for 25m, current state active+undersized+degraded, last acting [23]
>     pg 19.61 is stuck undersized for 25m, current state active+undersized+degraded, last acting [21]
>     pg 19.72 is stuck undersized for 31m, current state active+undersized+degraded, last acting [3]
>     pg 20.42 is stuck undersized for 7m, current state active+undersized+degraded, last acting [23]
>     pg 20.48 is stuck undersized for 67m, current state active+undersized+degraded, last acting [16]
>     pg 20.5b is stuck undersized for 21m, current state active+undersized+degraded, last acting [12]
>     pg 20.5f is stuck undersized for 10h, current state active+undersized+degraded, last acting [12]
>     pg 20.65 is stuck undersized for 10m, current state active+undersized+degraded, last acting [23]
>     pg 20.6a is active+undersized+degraded, acting [13]
>     pg 20.71 is stuck undersized for 31m, current state active+undersized+degraded, last acting [13]
>     pg 20.7d is stuck undersized for 7m, current state active+undersized+degraded, last acting [29]
> [WRN] POOL_NEARFULL: 12 pool(s) nearfull
>     pool '.mgr' is nearfull
>     pool 'DataStore' is nearfull
>     pool 'cephfs_data' is nearfull
>     pool 'cephfs_metadata' is nearfull
>     pool 'OS' is nearfull
>     pool 'cloud' is nearfull
>     pool 'DataStore_2' is nearfull
>     pool 'DataStore_3' is nearfull
>     pool 'MGMT' is nearfull
>     pool 'DataStore_4' is nearfull
>     pool 'DataStore_5' is nearfull
>     pool 'fast' is nearfull
> [WRN] SLOW_OPS: 255 slow ops, oldest one blocked for 2417 sec, daemons [osd.10,osd.12,osd.21,osd.22,osd.23] have slow ops.
>
> root@cmt6770:~# ceph -s
>   cluster:
>     id:     9319dafb-3408-46cb-9b09-b3d381114545
>     health: HEALTH_WARN
>             1 filesystem is degraded
>             1 MDSs report slow metadata IOs
>             mon cmt6770 is low on available space
>             9 osds down
>             1 host (3 osds) down
>             5 nearfull osd(s)
>             Reduced data availability: 866 pgs inactive, 489 pgs down, 5 pgs incomplete, 60 pgs stale
>             Low space hindering backfill (add storage if this doesn't resolve itself): 9 pgs backfill_toofull
>             Degraded data redundancy: 432856/3408880 objects degraded (12.698%), 191 pgs degraded, 181 pgs undersized
>             12 pool(s) nearfull
>             255 slow ops, oldest one blocked for 2422 sec, daemons [osd.10,osd.12,osd.21,osd.22,osd.23] have slow ops.
>
>   services:
>     mon: 2 daemons, quorum cmt6770,cmt5923 (age 70m)
>     mgr: cmt6770(active, since 3h)
>     mds: 1/1 daemons up, 1 standby
>     osd: 25 osds: 11 up (since 14s), 20 in (since 9m); 182 remapped pgs
>
>   data:
>     volumes: 0/1 healthy, 1 recovering
>     pools:   12 pools, 1589 pgs
>     objects: 1.70M objects, 6.2 TiB
>     usage:   9.1 TiB used, 6.2 TiB / 15 TiB avail
>     pgs:     28.760% pgs unknown
>              34.991% pgs not active
>              432856/3408880 objects degraded (12.698%)
>              388136/3408880 objects misplaced (11.386%)
>              466 down
>              457 unknown
>              209 active+clean
>              185 active+undersized+degraded
>              157 active+clean+remapped
>              62  stale
>              20  stale+down
>              9   active+undersized+remapped
>              6   active+undersized+degraded+remapped+backfill_toofull
>              5   incomplete
>              3   active+remapped+backfill_toofull
>              2   active+clean+scrubbing+deep
>              2   active+clean+remapped+scrubbing+deep
>              2   down+remapped
>              1   stale+creating+down
>              1   active+remapped+backfilling
>              1   active+remapped+backfill_wait
>              1   active+undersized+remapped+wait
>
>   io:
>     recovery: 12 MiB/s, 3 objects/s
>
> root@cmt6770:~# ceph osd tree
> ID   CLASS  WEIGHT    TYPE NAME         STATUS  REWEIGHT  PRI-AFF
>  -1         34.05125  root default
>  -3          5.38539      host cmt5923
>   3    ssd   1.74660          osd.3         up   0.79999  1.00000
>   8    ssd   1.81940          osd.8       down   1.00000  1.00000
>   9    ssd   1.81940          osd.9       down   1.00000  1.00000
> -15          4.40289      host cmt6461
>  24   nvme   0.90970          osd.24      down   0.79999  1.00000
>   2    ssd   1.74660          osd.2       down   1.00000  1.00000
>  17    ssd   1.74660          osd.17      down   1.00000  1.00000
>  -5          5.35616      host cmt6770
>   0    ssd   0.87329          osd.0       down   1.00000  1.00000
>   1    ssd   0.87329          osd.1       down   1.00000  1.00000
>   4    ssd   1.86299          osd.4       down         0  1.00000
>  14    ssd   0.87329          osd.14      down   1.00000  1.00000
>  15    ssd   0.87329          osd.15        up   1.00000  1.00000
>  -9          7.24838      host cmt7773
>   5   nvme   1.81940          osd.5       down         0  1.00000
>  19   nvme   1.81940          osd.19      down         0  1.00000
>   7    ssd   1.74660          osd.7       down   1.00000  1.00000
>  29    ssd   1.86299          osd.29        up   1.00000  1.00000
> -13          7.93245      host dc2943
>  22   nvme   0.90970          osd.22        up   1.00000  1.00000
>  23   nvme   0.90970          osd.23        up   1.00000  1.00000
>   6    ssd   1.74660          osd.6       down         0  1.00000
>  10    ssd   0.87329          osd.10        up   1.00000  1.00000
>  11    ssd   0.87329          osd.11        up   1.00000  1.00000
>  12    ssd   0.87329          osd.12        up   1.00000  1.00000
>  13    ssd   0.87329          osd.13        up   1.00000  1.00000
>  16    ssd   0.87329          osd.16        up   0.79999  1.00000
> -11          3.72598      host dc3658
>  20   nvme   1.86299          osd.20      down         0  1.00000
>  21   nvme   1.86299          osd.21        up   0.90002  1.00000
>
> root@cmt6770:~# ceph osd df
> ID  CLASS  WEIGHT   REWEIGHT  SIZE     RAW USE  DATA     OMAP     META     AVAIL    %USE   VAR   PGS  STATUS
>  3  ssd    1.74660   0.79999  1.7 TiB  1.5 TiB  1.5 TiB  228 KiB  2.9 GiB  251 GiB  85.96  1.44  214    up
>  8  ssd    1.81940   1.00000      0 B      0 B      0 B      0 B      0 B      0 B      0     0     0  down
>  9  ssd    1.81940   1.00000  1.8 TiB  776 MiB  745 MiB    8 KiB   31 MiB  1.8 TiB   0.04     0    46  down
> 24  nvme   0.90970   0.79999      0 B      0 B      0 B      0 B      0 B      0 B      0     0     0  down
>  2  ssd    1.74660   1.00000      0 B      0 B      0 B      0 B      0 B      0 B      0     0     4  down
> 17  ssd    1.74660   1.00000      0 B      0 B      0 B      0 B      0 B      0 B      0     0     0  down
>  0  ssd    0.87329   1.00000      0 B      0 B      0 B      0 B      0 B      0 B      0     0     3  down
>  1  ssd    0.87329   1.00000      0 B      0 B      0 B      0 B      0 B      0 B      0     0     0  down
>  4  ssd    1.86299         0      0 B      0 B      0 B      0 B      0 B      0 B      0     0     0  down
> 14  ssd    0.87329   1.00000  894 GiB  792 MiB  752 MiB   14 KiB   40 MiB  893 GiB   0.09  0.00    25  down
> 15  ssd    0.87329   1.00000  894 GiB  232 GiB  231 GiB   14 KiB  1.4 GiB  662 GiB  25.98  0.44    83    up
>  5  nvme   1.81940         0      0 B      0 B      0 B      0 B      0 B      0 B      0     0     0  down
> 19  nvme   1.81940         0      0 B      0 B      0 B      0 B      0 B      0 B      0     0     0  down
>  7  ssd    1.74660   1.00000      0 B      0 B      0 B      0 B      0 B      0 B      0     0     0  down
> 29  ssd    1.86299   1.00000  1.9 TiB  1.5 TiB  1.5 TiB  323 KiB  2.8 GiB  354 GiB  81.44  1.37   222    up
> 22  nvme   0.90970   1.00000  932 GiB  689 GiB  687 GiB  181 KiB  1.6 GiB  243 GiB  73.96  1.24   139    up
> 23  nvme   0.90970   1.00000  932 GiB  820 GiB  818 GiB  138 KiB  2.0 GiB  112 GiB  87.98  1.48   144    up
>  6  ssd    1.74660         0      0 B      0 B      0 B      0 B      0 B      0 B      0     0     0  down
> 10  ssd    0.87329   1.00000  894 GiB  237 GiB  235 GiB    1 KiB  1.2 GiB  658 GiB  26.46  0.44    82    up
> 11  ssd    0.87329   1.00000  894 GiB  264 GiB  263 GiB    1 KiB  1.4 GiB  630 GiB  29.54  0.50    67    up
> 12  ssd    0.87329   1.00000  894 GiB  780 GiB  778 GiB  123 KiB  1.8 GiB  114 GiB  87.26  1.46   113    up
> 13  ssd    0.87329   1.00000  894 GiB  684 GiB  682 GiB  170 KiB  1.9 GiB  210 GiB  76.48  1.28    98    up
> 16  ssd    0.87329   0.79999  894 GiB  779 GiB  777 GiB  149 KiB  1.8 GiB  116 GiB  87.06  1.46    86    up
> 20  nvme   1.86299         0      0 B      0 B      0 B      0 B      0 B      0 B      0     0     0  down
> 21  nvme   1.86299   0.90002  1.9 TiB  1.7 TiB  1.7 TiB  430 KiB  3.5 GiB  194 GiB  89.84  1.51   314    up
>                        TOTAL   15 TiB  9.1 TiB  9.1 TiB  1.7 MiB   22 GiB  6.2 TiB  59.60
> MIN/MAX VAR: 0/1.51  STDDEV: 44.89
>
> also osd start logs:
> May 10 15:38:32 cmt5923 systemd[1]: ceph-osd@9.service: Failed with result 'signal'.
> May 10 15:38:37 cmt5923 ceph-osd[2383902]: 2025-05-10T15:38:37.579+0300 764caf13f880 -1 osd.8 100504 log_to_monitors true
> May 10 15:38:37 cmt5923 ceph-osd[2383902]: 2025-05-10T15:38:37.791+0300 764c8c64b6c0 -1 log_channel(cluster) log [ERR] : 7.26a past_intervals [96946,100253) start interval does not contain the required bound [93903,100253) start
> May 10 15:38:37 cmt5923 ceph-osd[2383902]: 2025-05-10T15:38:37.791+0300 764c8c64b6c0 -1 osd.8 pg_epoch: 100377 pg[7.26a( empty local-lis/les=0/0 n=0 ec=96946/96946 lis/c=96236/93898 les/c/f=96237/93903/91308 sis=100253) [3,1] r=-1 lpr=100376 pi=[96946,100253)/3 crt=0'0 mlcod 0'0 unknown mbc={}] PeeringState::check_past_interval_bounds 7.26a past_intervals [96946,100253) start interval does not contain the required bound [93903,100253) start
>
> We appreciate any support and guidance,
> Thanks in advance
> _______________________________________________
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io

_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io