Hello,

After upgrading from Ceph Reef 18.2.6 to Ceph Squid 19.2.1, I restarted the OSDs and they remained down. The cluster status and events contain the following records:
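If it is useful, we can also share the output of the commands below; this is just a sketch of how we would confirm the post-upgrade state (whether every daemon is really running 19.2.1 and which release the OSD map requires):

# confirm which release each daemon is actually running after the upgrade
ceph versions
# check which release the OSD map currently requires
ceph osd dump | grep require_osd_release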
root@cmt6770:~# ceph health detail
HEALTH_WARN 1 filesystem is degraded; 1 MDSs report slow metadata IOs; mon cmt6770 is low on available space; 9 osds down; 1 host (3 osds) down; 5 nearfull osd(s); Reduced data availability: 866 pgs inactive, 489 pgs down, 5 pgs incomplete, 60 pgs stale; Low space hindering backfill (add storage if this doesn't resolve itself): 9 pgs backfill_toofull; Degraded data redundancy: 432856/3408880 objects degraded (12.698%), 191 pgs degraded, 181 pgs undersized; 12 pool(s) nearfull; 255 slow ops, oldest one blocked for 2417 sec, daemons [osd.10,osd.12,osd.21,osd.22,osd.23] have slow ops.
[WRN] FS_DEGRADED: 1 filesystem is degraded
    fs cephfs is degraded
[WRN] MDS_SLOW_METADATA_IO: 1 MDSs report slow metadata IOs
    mds.cmt5923(mds.0): 1 slow metadata IOs are blocked > 30 secs, oldest blocked for 38201 secs
[WRN] MON_DISK_LOW: mon cmt6770 is low on available space
    mon.cmt6770 has 28% avail
[WRN] OSD_DOWN: 9 osds down
    osd.0 (root=default,host=cmt6770) is down
    osd.1 (root=default,host=cmt6770) is down
    osd.2 (root=default,host=cmt6461) is down
    osd.7 (root=default,host=cmt7773) is down
    osd.8 (root=default,host=cmt5923) is down
    osd.9 (root=default,host=cmt5923) is down
    osd.14 (root=default,host=cmt6770) is down
    osd.17 (root=default,host=cmt6461) is down
    osd.24 (root=default,host=cmt6461) is down
[WRN] OSD_HOST_DOWN: 1 host (3 osds) down
    host cmt6461 (root=default) (3 osds) is down
[WRN] OSD_NEARFULL: 5 nearfull osd(s)
    osd.3 is near full
    osd.12 is near full
    osd.16 is near full
    osd.21 is near full
    osd.23 is near full
[WRN] PG_AVAILABILITY: Reduced data availability: 866 pgs inactive, 489 pgs down, 5 pgs incomplete, 60 pgs stale
    pg 7.1c5 is down, acting [3]
    pg 7.1c7 is stuck inactive for 3h, current state unknown, last acting []
    pg 7.1c8 is stuck inactive for 3h, current state unknown, last acting []
    pg 7.1cb is stuck inactive for 3h, current state unknown, last acting []
    pg 7.1cd is down, acting [15,3]
    pg 7.1ce is stuck inactive for 3h, current state unknown, last acting []
    pg 7.1cf is stuck inactive for 3h, current state unknown, last acting []
    pg 7.1d0 is stuck inactive for 3h, current state unknown, last acting []
    pg 7.1d1 is down, acting [29,13]
    pg 7.1d2 is stuck inactive for 3h, current state unknown, last acting []
    pg 7.1d3 is down, acting [23]
    pg 7.1d4 is down, acting [16]
    pg 7.1d5 is stuck inactive for 3h, current state unknown, last acting []
    pg 7.1d6 is down, acting [3]
    pg 7.1d9 is stuck inactive for 3h, current state unknown, last acting []
    pg 7.1da is stuck inactive for 3h, current state unknown, last acting []
    pg 7.1e0 is stuck inactive for 3h, current state unknown, last acting []
    pg 7.1e1 is stuck inactive for 3h, current state unknown, last acting []
    pg 7.1e2 is stuck inactive for 3h, current state unknown, last acting []
    pg 7.1e4 is stuck inactive for 3h, current state unknown, last acting []
    pg 7.1e5 is down, acting [12,29]
    pg 7.1e7 is stuck stale for 7m, current state stale+down, last acting [9]
    pg 7.1e8 is down, acting [12]
    pg 7.1e9 is stuck stale for 31m, current state stale, last acting [2]
    pg 7.1eb is stuck inactive for 3h, current state unknown, last acting []
    pg 7.1ed is down, acting [3]
    pg 7.1ee is stuck inactive for 3h, current state unknown, last acting []
    pg 7.1ef is down, acting [12]
    pg 7.1f0 is down, acting [10]
    pg 7.1f1 is down, acting [12,29]
    pg 7.1f2 is down, acting [16]
    pg 7.1f3 is stuck stale for 7m, current state stale, last acting [9]
    pg 7.1f4 is down, acting [22]
    pg 7.1f5 is down, acting [22]
    pg 7.1f8 is down, acting [29]
    pg 7.1f9 is stuck inactive for 3h, current state unknown, last acting []
    pg 7.1fb is stuck inactive for 3h, current state unknown, last acting []
    pg 7.1fc is stuck stale for 15m, current state stale+down, last acting [29]
    pg 7.1fd is down, acting [3]
    pg 7.1fe is down, acting [3,15]
    pg 7.1ff is down, acting [3]
    pg 7.201 is down, acting [12]
    pg 7.204 is down, acting [10]
    pg 7.205 is down, acting [13]
    pg 7.207 is down, acting [11]
    pg 7.20a is down, acting [3]
    pg 7.20b is down, acting [22]
    pg 7.20d is stuck inactive for 3h, current state unknown, last acting []
    pg 7.210 is stuck inactive for 3h, current state unknown, last acting []
    pg 7.211 is stuck inactive for 3h, current state unknown, last acting []
    pg 7.21b is down, acting [16]
[WRN] PG_BACKFILL_FULL: Low space hindering backfill (add storage if this doesn't resolve itself): 9 pgs backfill_toofull
    pg 7.20 is active+undersized+degraded+remapped+backfill_toofull, acting [3]
    pg 19.2b is active+remapped+backfill_toofull, acting [12,3]
    pg 19.6b is active+remapped+backfill_toofull, acting [12,3]
    pg 20.55 is active+remapped+backfill_toofull, acting [29,3]
    pg 24.6 is active+undersized+degraded+remapped+backfill_toofull, acting [22]
    pg 24.b is active+undersized+degraded+remapped+backfill_toofull, acting [21]
    pg 24.13 is active+undersized+degraded+remapped+backfill_toofull, acting [23]
    pg 24.16 is active+undersized+degraded+remapped+backfill_toofull, acting [21]
    pg 24.1d is active+undersized+degraded+remapped+backfill_toofull, acting [21]
[WRN] PG_DEGRADED: Degraded data redundancy: 432856/3408880 objects degraded (12.698%), 191 pgs degraded, 181 pgs undersized
    pg 7.3f is stuck undersized for 9m, current state active+undersized+degraded, last acting [16]
    pg 7.56 is stuck undersized for 21m, current state active+undersized+degraded, last acting [13]
    pg 7.61 is stuck undersized for 8m, current state active+undersized+degraded, last acting [29]
    pg 7.66 is stuck undersized for 7m, current state active+undersized+degraded, last acting [29]
    pg 7.6b is stuck undersized for 25m, current state active+undersized+degraded, last acting [21]
    pg 7.102 is active+undersized+degraded, acting [23]
    pg 7.118 is stuck undersized for 14m, current state active+undersized+degraded, last acting [23]
    pg 7.11c is stuck undersized for 9m, current state active+undersized+degraded, last acting [3]
    pg 7.133 is stuck undersized for 7m, current state active+undersized+degraded, last acting [12]
    pg 7.139 is stuck undersized for 7m, current state active+undersized+degraded, last acting [13]
    pg 7.143 is stuck undersized for 8m, current state active+undersized+degraded, last acting [29]
    pg 7.155 is stuck undersized for 25m, current state active+undersized+degraded, last acting [3]
    pg 7.156 is active+undersized+degraded, acting [12]
    pg 7.15e is stuck undersized for 31m, current state active+undersized+degraded, last acting [3]
    pg 7.15f is stuck undersized for 8m, current state active+undersized+degraded, last acting [29]
    pg 7.168 is stuck undersized for 67m, current state active+undersized+degraded, last acting [22]
    pg 7.17f is stuck undersized for 8m, current state active+undersized+degraded, last acting [29]
    pg 7.180 is stuck undersized for 14m, current state active+undersized+degraded, last acting [21]
    pg 7.18e is stuck undersized for 8m, current state active+undersized+degraded, last acting [29]
    pg 7.193 is active+undersized+degraded, acting [16]
    pg 7.197 is stuck undersized for 14m, current state active+undersized+degraded, last acting [21]
    pg 7.1a6 is stuck undersized for 8m, current state active+undersized+degraded, last acting [29]
    pg 7.1b7 is stuck undersized for 8m, current state active+undersized+degraded, last acting [29]
    pg 7.1c6 is stuck undersized for 14m, current state active+undersized+degraded, last acting [22]
    pg 7.1ca is stuck undersized for 14m, current state active+undersized+degraded, last acting [22]
    pg 7.1d7 is stuck undersized for 9m, current state active+undersized+degraded, last acting [22]
    pg 7.1df is active+undersized+degraded, acting [21]
    pg 7.1e6 is stuck undersized for 10h, current state active+undersized+degraded, last acting [23]
    pg 7.200 is active+undersized+degraded, acting [29]
    pg 7.202 is stuck undersized for 7m, current state active+undersized+degraded, last acting [13]
    pg 7.20c is stuck undersized for 10h, current state active+undersized+degraded, last acting [16]
    pg 7.20e is stuck undersized for 47m, current state active+undersized+degraded, last acting [23]
    pg 7.20f is stuck undersized for 47m, current state active+undersized+degraded, last acting [23]
    pg 7.217 is stuck undersized for 7m, current state active+undersized+degraded, last acting [21]
    pg 15.35 is active+undersized+degraded, acting [22]
    pg 16.2a is stuck undersized for 10h, current state active+undersized+degraded, last acting [21]
    pg 19.43 is stuck undersized for 31m, current state active+undersized+degraded, last acting [23]
    pg 19.44 is stuck undersized for 8m, current state active+undersized+degraded, last acting [29]
    pg 19.4e is stuck undersized for 14m, current state active+undersized+degraded, last acting [16]
    pg 19.52 is active+undersized+degraded+wait, acting [3]
    pg 19.55 is stuck undersized for 25m, current state active+undersized+degraded, last acting [23]
    pg 19.61 is stuck undersized for 25m, current state active+undersized+degraded, last acting [21]
    pg 19.72 is stuck undersized for 31m, current state active+undersized+degraded, last acting [3]
    pg 20.42 is stuck undersized for 7m, current state active+undersized+degraded, last acting [23]
    pg 20.48 is stuck undersized for 67m, current state active+undersized+degraded, last acting [16]
    pg 20.5b is stuck undersized for 21m, current state active+undersized+degraded, last acting [12]
    pg 20.5f is stuck undersized for 10h, current state active+undersized+degraded, last acting [12]
    pg 20.65 is stuck undersized for 10m, current state active+undersized+degraded, last acting [23]
    pg 20.6a is active+undersized+degraded, acting [13]
    pg 20.71 is stuck undersized for 31m, current state active+undersized+degraded, last acting [13]
    pg 20.7d is stuck undersized for 7m, current state active+undersized+degraded, last acting [29]
[WRN] POOL_NEARFULL: 12 pool(s) nearfull
    pool '.mgr' is nearfull
    pool 'DataStore' is nearfull
    pool 'cephfs_data' is nearfull
    pool 'cephfs_metadata' is nearfull
    pool 'OS' is nearfull
    pool 'cloud' is nearfull
    pool 'DataStore_2' is nearfull
    pool 'DataStore_3' is nearfull
    pool 'MGMT' is nearfull
    pool 'DataStore_4' is nearfull
    pool 'DataStore_5' is nearfull
    pool 'fast' is nearfull
[WRN] SLOW_OPS: 255 slow ops, oldest one blocked for 2417 sec, daemons [osd.10,osd.12,osd.21,osd.22,osd.23] have slow ops.
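If a per-PG view helps, this is roughly what we would run next (pg 7.1c5 is just the first down PG from the list above, taken as an example):

# list all PGs stuck in a non-active state
ceph pg dump_stuck inactive
# peering history and recovery state for one of the down PGs
ceph pg 7.1c5 query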
root@cmt6770:~# ceph -s
  cluster:
    id:     9319dafb-3408-46cb-9b09-b3d381114545
    health: HEALTH_WARN
            1 filesystem is degraded
            1 MDSs report slow metadata IOs
            mon cmt6770 is low on available space
            9 osds down
            1 host (3 osds) down
            5 nearfull osd(s)
            Reduced data availability: 866 pgs inactive, 489 pgs down, 5 pgs incomplete, 60 pgs stale
            Low space hindering backfill (add storage if this doesn't resolve itself): 9 pgs backfill_toofull
            Degraded data redundancy: 432856/3408880 objects degraded (12.698%), 191 pgs degraded, 181 pgs undersized
            12 pool(s) nearfull
            255 slow ops, oldest one blocked for 2422 sec, daemons [osd.10,osd.12,osd.21,osd.22,osd.23] have slow ops.

  services:
    mon: 2 daemons, quorum cmt6770,cmt5923 (age 70m)
    mgr: cmt6770(active, since 3h)
    mds: 1/1 daemons up, 1 standby
    osd: 25 osds: 11 up (since 14s), 20 in (since 9m); 182 remapped pgs

  data:
    volumes: 0/1 healthy, 1 recovering
    pools:   12 pools, 1589 pgs
    objects: 1.70M objects, 6.2 TiB
    usage:   9.1 TiB used, 6.2 TiB / 15 TiB avail
    pgs:     28.760% pgs unknown
             34.991% pgs not active
             432856/3408880 objects degraded (12.698%)
             388136/3408880 objects misplaced (11.386%)
             466 down
             457 unknown
             209 active+clean
             185 active+undersized+degraded
             157 active+clean+remapped
             62  stale
             20  stale+down
             9   active+undersized+remapped
             6   active+undersized+degraded+remapped+backfill_toofull
             5   incomplete
             3   active+remapped+backfill_toofull
             2   active+clean+scrubbing+deep
             2   active+clean+remapped+scrubbing+deep
             2   down+remapped
             1   stale+creating+down
             1   active+remapped+backfilling
             1   active+remapped+backfill_wait
             1   active+undersized+remapped+wait

  io:
    recovery: 12 MiB/s, 3 objects/s

root@cmt6770:~# ceph osd tree
ID   CLASS  WEIGHT    TYPE NAME         STATUS  REWEIGHT  PRI-AFF
 -1         34.05125  root default
 -3          5.38539      host cmt5923
  3    ssd   1.74660          osd.3         up   0.79999  1.00000
  8    ssd   1.81940          osd.8       down   1.00000  1.00000
  9    ssd   1.81940          osd.9       down   1.00000  1.00000
-15          4.40289      host cmt6461
 24   nvme   0.90970          osd.24      down   0.79999  1.00000
  2    ssd   1.74660          osd.2       down   1.00000  1.00000
 17    ssd   1.74660          osd.17      down   1.00000  1.00000
 -5          5.35616      host cmt6770
  0    ssd   0.87329          osd.0       down   1.00000  1.00000
  1    ssd   0.87329          osd.1       down   1.00000  1.00000
  4    ssd   1.86299          osd.4       down         0  1.00000
 14    ssd   0.87329          osd.14      down   1.00000  1.00000
 15    ssd   0.87329          osd.15        up   1.00000  1.00000
 -9          7.24838      host cmt7773
  5   nvme   1.81940          osd.5       down         0  1.00000
 19   nvme   1.81940          osd.19      down         0  1.00000
  7    ssd   1.74660          osd.7       down   1.00000  1.00000
 29    ssd   1.86299          osd.29        up   1.00000  1.00000
-13          7.93245      host dc2943
 22   nvme   0.90970          osd.22        up   1.00000  1.00000
 23   nvme   0.90970          osd.23        up   1.00000  1.00000
  6    ssd   1.74660          osd.6       down         0  1.00000
 10    ssd   0.87329          osd.10        up   1.00000  1.00000
 11    ssd   0.87329          osd.11        up   1.00000  1.00000
 12    ssd   0.87329          osd.12        up   1.00000  1.00000
 13    ssd   0.87329          osd.13        up   1.00000  1.00000
 16    ssd   0.87329          osd.16        up   0.79999  1.00000
-11          3.72598      host dc3658
 20   nvme   1.86299          osd.20      down         0  1.00000
 21   nvme   1.86299          osd.21        up   0.90002  1.00000

root@cmt6770:~# ceph osd df
ID  CLASS  WEIGHT   REWEIGHT  SIZE     RAW USE  DATA     OMAP     META     AVAIL    %USE   VAR   PGS  STATUS
 3    ssd  1.74660   0.79999  1.7 TiB  1.5 TiB  1.5 TiB  228 KiB  2.9 GiB  251 GiB  85.96  1.44  214    up
 8    ssd  1.81940   1.00000      0 B      0 B      0 B      0 B      0 B      0 B      0     0     0  down
 9    ssd  1.81940   1.00000  1.8 TiB  776 MiB  745 MiB    8 KiB   31 MiB  1.8 TiB   0.04     0    46  down
24   nvme  0.90970   0.79999      0 B      0 B      0 B      0 B      0 B      0 B      0     0     0  down
 2    ssd  1.74660   1.00000      0 B      0 B      0 B      0 B      0 B      0 B      0     0     4  down
17    ssd  1.74660   1.00000      0 B      0 B      0 B      0 B      0 B      0 B      0     0     0  down
 0    ssd  0.87329   1.00000      0 B      0 B      0 B      0 B      0 B      0 B      0     0     3  down
 1    ssd  0.87329   1.00000      0 B      0 B      0 B      0 B      0 B      0 B      0     0     0  down
 4    ssd  1.86299         0      0 B      0 B      0 B      0 B      0 B      0 B      0     0     0  down
14    ssd  0.87329   1.00000  894 GiB  792 MiB  752 MiB   14 KiB   40 MiB  893 GiB   0.09  0.00    25  down
15    ssd  0.87329   1.00000  894 GiB  232 GiB  231 GiB   14 KiB  1.4 GiB  662 GiB  25.98  0.44    83    up
 5   nvme  1.81940         0      0 B      0 B      0 B      0 B      0 B      0 B      0     0     0  down
19   nvme  1.81940         0      0 B      0 B      0 B      0 B      0 B      0 B      0     0     0  down
 7    ssd  1.74660   1.00000      0 B      0 B      0 B      0 B      0 B      0 B      0     0     0  down
29    ssd  1.86299   1.00000  1.9 TiB  1.5 TiB  1.5 TiB  323 KiB  2.8 GiB  354 GiB  81.44  1.37   222    up
22   nvme  0.90970   1.00000  932 GiB  689 GiB  687 GiB  181 KiB  1.6 GiB  243 GiB  73.96  1.24   139    up
23   nvme  0.90970   1.00000  932 GiB  820 GiB  818 GiB  138 KiB  2.0 GiB  112 GiB  87.98  1.48   144    up
 6    ssd  1.74660         0      0 B      0 B      0 B      0 B      0 B      0 B      0     0     0  down
10    ssd  0.87329   1.00000  894 GiB  237 GiB  235 GiB    1 KiB  1.2 GiB  658 GiB  26.46  0.44    82    up
11    ssd  0.87329   1.00000  894 GiB  264 GiB  263 GiB    1 KiB  1.4 GiB  630 GiB  29.54  0.50    67    up
12    ssd  0.87329   1.00000  894 GiB  780 GiB  778 GiB  123 KiB  1.8 GiB  114 GiB  87.26  1.46   113    up
13    ssd  0.87329   1.00000  894 GiB  684 GiB  682 GiB  170 KiB  1.9 GiB  210 GiB  76.48  1.28    98    up
16    ssd  0.87329   0.79999  894 GiB  779 GiB  777 GiB  149 KiB  1.8 GiB  116 GiB  87.06  1.46    86    up
20   nvme  1.86299         0      0 B      0 B      0 B      0 B      0 B      0 B      0     0     0  down
21   nvme  1.86299   0.90002  1.9 TiB  1.7 TiB  1.7 TiB  430 KiB  3.5 GiB  194 GiB  89.84  1.51   314    up
                      TOTAL    15 TiB  9.1 TiB  9.1 TiB  1.7 MiB   22 GiB  6.2 TiB  59.60
MIN/MAX VAR: 0/1.51  STDDEV: 44.89

Also, the OSD start logs:

May 10 15:38:32 cmt5923 systemd[1]: ceph-osd@9.service: Failed with result 'signal'.
May 10 15:38:37 cmt5923 ceph-osd[2383902]: 2025-05-10T15:38:37.579+0300 764caf13f880 -1 osd.8 100504 log_to_monitors true
May 10 15:38:37 cmt5923 ceph-osd[2383902]: 2025-05-10T15:38:37.791+0300 764c8c64b6c0 -1 log_channel(cluster) log [ERR] : 7.26a past_intervals [96946,100253) start interval does not contain the required bound [93903,100253) start
May 10 15:38:37 cmt5923 ceph-osd[2383902]: 2025-05-10T15:38:37.791+0300 764c8c64b6c0 -1 osd.8 pg_epoch: 100377 pg[7.26a( empty local-lis/les=0/0 n=0 ec=96946/96946 lis/c=96236/93898 les/c/f=96237/93903/91308 sis=100253) [3,1] r=-1 lpr=100376 pi=[96946,100253)/3 crt=0'0 mlcod 0'0 unknown mbc={}] PeeringState::check_past_interval_bounds 7.26a past_intervals [96946,100253) start interval does not contain the required bound [93903,100253) start
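If more verbose logs from one of the crashing OSDs would help, this is a sketch of how we would capture them before the next start attempt (osd.8 is taken as an example; the debug levels are just an assumption, not something we have applied yet):

# full recent journal for one of the failing OSDs
journalctl -u ceph-osd@8 --no-pager -n 200
# crash reports collected by the crash module, if any
ceph crash ls-new
# temporarily raise OSD debug logging for the next restart
ceph config set osd.8 debug_osd 20
ceph config set osd.8 debug_ms 1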
We appreciate any support and guidance,
Thanks in advance
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io