Hi Dev,

I am not sure I understand why these service deployment timeouts occurred. The log says that one OSD failed, which may explain why the upgrade is no longer progressing. The BlueStore slow ops warnings (a new warning, so not necessarily a new problem) on so many OSDs suggest that something is not optimal. As suggested in another thread recently, this may be an indication that you need to compact the OSDs.
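
In case it helps, here is a rough sketch of how the affected OSDs could be pulled out of the `ceph health detail` text and turned into compaction commands. The function name `slow_osd_compact_cmds` is just something I made up for illustration; it only prints `ceph tell osd.N compact` lines and does not run anything, so you can review the list first. Compaction is likely to add some load while it runs, so doing one OSD at a time is probably the safer choice.

```shell
#!/bin/sh
# Hypothetical helper: reads `ceph health detail` text on stdin and prints
# one "ceph tell osd.N compact" command per OSD listed under
# BLUESTORE_SLOW_OP_ALERT. It only prints; nothing is executed.
slow_osd_compact_cmds() {
  awk '/^osd\.[0-9]+ observed slow operation indications in BlueStore/ {
    print "ceph tell " $1 " compact"
  }'
}

# Example using two lines from the health output quoted in this thread.
printf '%s\n' \
  'osd.5 observed slow operation indications in BlueStore' \
  'osd.9 observed slow operation indications in BlueStore' |
  slow_osd_compact_cmds
```

On a live cluster the input would come from `ceph health detail | slow_osd_compact_cmds` instead of the sample lines.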

I am not sure what you adjusted, but as long as the cluster works, I would not have changed any parameters; I would instead try to fix the problems mentioned.

Good luck,

Michel
On 20 June 2025 05:13:38, Devender Singh <deven...@netskrt.io> wrote:

Here is the status

# ceph orch upgrade status
{
   "target_image": "quay.io/ceph/ceph@sha256:8214ebff6133ac27d20659038df6962dbf9d77da21c9438a296b2e2059a56af6",
   "in_progress": true,
   "which": "Upgrading all daemon types on all hosts",
   "services_complete": [
       "crash",
       "mgr",
       "mon"
   ],
   "progress": "74/113 daemons upgraded",
   "message": "",
   "is_paused": false
}

Regards
Dev

On Jun 19, 2025, at 8:06 PM, Devender Singh <deven...@netskrt.io> wrote:


Hello all

I have a cluster that is in a hung state. Some backfills are running; I reduced backfills to 1, but the upgrade is still not progressing…
Please help…

```
# ceph health detail
HEALTH_WARN 8 OSD(s) experiencing slow operations in BlueStore; Failed to apply 2 service(s): osd.all-available-devices,osd.iops_optimized; 1 failed cephadm daemon(s); failed to probe daemons or devices; noscrub,nodeep-scrub flag(s) set; Degraded data redundancy: 1600150/39365198 objects degraded (4.065%), 93 pgs degraded, 103 pgs undersized; 127 pgs not deep-scrubbed in time
[WRN] BLUESTORE_SLOW_OP_ALERT: 8 OSD(s) experiencing slow operations in BlueStore
osd.5 observed slow operation indications in BlueStore
osd.9 observed slow operation indications in BlueStore
osd.18 observed slow operation indications in BlueStore
osd.36 observed slow operation indications in BlueStore
osd.59 observed slow operation indications in BlueStore
osd.66 observed slow operation indications in BlueStore
osd.106 observed slow operation indications in BlueStore
osd.110 observed slow operation indications in BlueStore
[WRN] CEPHADM_APPLY_SPEC_FAIL: Failed to apply 2 service(s): osd.all-available-devices,osd.iops_optimized
osd.all-available-devices: Command timed out on host cephadm deploy (osd daemon) (default 900 second timeout)
osd.iops_optimized: Command timed out on host cephadm deploy (osd daemon) (default 900 second timeout)
[WRN] CEPHADM_FAILED_DAEMON: 1 failed cephadm daemon(s)
daemon osd.19 on phl-prod-host04n.example.com is in error state
[WRN] CEPHADM_REFRESH_FAILED: failed to probe daemons or devices
Command "cephadm ceph-volume -- inventory" timed out on host phl-prod-converged03n.phl.netskrt.org (default 900 second timeout)
[WRN] OSDMAP_FLAGS: noscrub,nodeep-scrub flag(s) set
[WRN] PG_DEGRADED: Degraded data redundancy: 1600150/39365198 objects degraded (4.065%), 93 pgs degraded, 103 pgs undersized
pg 25.0 is stuck undersized for 21h, current state active+undersized+degraded+remapped+backfill_wait, last acting [90,77,NONE,38,3]
pg 25.1 is stuck undersized for 23h, current state active+undersized+degraded+remapped+backfill_wait, last acting [131,1,66,NONE,72]
pg 25.c is stuck undersized for 23h, current state active+undersized+degraded+remapped+backfill_wait, last acting [1,135,110,28,NONE]
pg 25.e is active+undersized+degraded+remapped+backfill_wait, acting [18,20,108,101,NONE]
pg 25.14 is stuck undersized for 19h, current state active+undersized+degraded+remapped+backfill_wait, last acting [32,13,NONE,65,97]
pg 25.17 is stuck undersized for 19h, current state active+undersized+degraded+remapped+backfill_wait, last acting [32,65,110,23,NONE]
pg 25.1a is stuck undersized for 23h, current state active+undersized+degraded+remapped+backfill_wait, last acting [97,66,NONE,112,139]
pg 25.1c is stuck undersized for 2h, current state active+undersized+degraded+remapped+backfill_wait, last acting [102,136,97,NONE,45]
pg 25.1f is stuck undersized for 8h, current state active+undersized+degraded+remapped+backfill_wait, last acting [NONE,66,7,39,109]
pg 26.45 is stuck undersized for 23h, current state active+undersized+degraded+remapped+backfill_wait, last acting [32,47]
pg 26.48 is stuck undersized for 23h, current state active+undersized+degraded+remapped+backfill_wait, last acting [113,55]
pg 26.4b is stuck undersized for 8h, current state active+undersized+degraded+remapped+backfill_wait, last acting [105,7]
pg 26.59 is stuck undersized for 23h, current state active+undersized+degraded+remapped+backfill_wait, last acting [35,40]
pg 26.6a is stuck undersized for 7h, current state active+undersized+degraded+remapped+backfill_wait, last acting [25,45]
pg 26.74 is stuck undersized for 23h, current state active+undersized+degraded+remapped+backfill_wait, last acting [112,131]
pg 26.77 is stuck undersized for 2h, current state active+undersized+degraded+remapped+backfill_wait, last acting [17,105]
pg 26.9c is stuck undersized for 19h, current state active+undersized+degraded+remapped+backfill_wait, last acting [47,76]
pg 26.bf is stuck undersized for 4m, current state active+undersized+degraded+remapped+backfill_wait, last acting [32,26]
pg 26.c2 is stuck undersized for 2h, current state active+undersized+degraded+remapped+backfill_wait, last acting [97,135]
pg 26.ec is stuck undersized for 4m, current state active+undersized+degraded+remapped+backfill_wait, last acting [26,80]
pg 26.f7 is stuck undersized for 4m, current state active+undersized+degraded+remapped+backfill_wait, last acting [49,105]
pg 31.12 is stuck undersized for 4m, current state active+undersized+degraded+remapped+backfill_wait, last acting [47,110,131,37,NONE]
pg 31.19 is stuck undersized for 2h, current state active+undersized+degraded+remapped+backfill_wait, last acting [NONE,87,3,137,39]
pg 31.1b is stuck undersized for 2h, current state active+undersized+degraded+remapped+backfill_wait, last acting [108,NONE,42,102,97]
pg 31.1c is stuck undersized for 19h, current state active+undersized+degraded+remapped+backfill_wait, last acting [113,66,NONE,45,72]
pg 31.41 is stuck undersized for 2h, current state active+undersized+remapped+backfill_wait, last acting [14,101,NONE,67,18]
pg 31.42 is stuck undersized for 7h, current state active+undersized+degraded+remapped+backfill_wait, last acting [37,134,NONE,82,8]
pg 31.44 is stuck undersized for 4m, current state active+undersized+degraded+remapped+backfill_wait, last acting [49,NONE,101,3,62]
pg 31.46 is stuck undersized for 4m, current state active+undersized+degraded+remapped+backfilling, last acting [95,38,NONE,25,102]
pg 31.47 is stuck undersized for 18h, current state active+undersized+degraded+remapped+backfill_wait, last acting [60,NONE,130,72,110]
pg 31.4a is stuck undersized for 4h, current state active+undersized+degraded+remapped+backfilling, last acting [66,NONE,135,82,14]
pg 31.4c is stuck undersized for 4m, current state active+undersized+degraded+remapped+backfill_wait, last acting [34,NONE,1,82,18]
pg 31.4d is stuck undersized for 2h, current state active+undersized+degraded+remapped+backfill_wait, last acting [101,13,65,30,NONE]
pg 31.52 is stuck undersized for 4m, current state active+undersized+remapped+backfill_wait, last acting [26,112,66,NONE,135]
pg 31.53 is stuck undersized for 17h, current state active+undersized+degraded+remapped+backfill_wait, last acting [NONE,24,11,42,110]
pg 31.55 is stuck undersized for 4m, current state active+undersized+degraded+remapped+backfill_wait, last acting [116,NONE,27,4,117]
pg 31.57 is stuck undersized for 4m, current state active+undersized+degraded+remapped+backfill_wait, last acting [17,15,110,NONE,1]
pg 31.5a is stuck undersized for 19h, current state active+undersized+degraded+remapped+backfill_wait, last acting [13,24,124,26,NONE]
pg 31.5b is stuck undersized for 4m, current state active+undersized+degraded+remapped+backfill_wait, last acting [NONE,31,8,27,42]
pg 31.5c is stuck undersized for 2h, current state active+undersized+degraded+remapped+backfill_wait, last acting [NONE,60,28,18,119]
pg 31.5d is stuck undersized for 17h, current state active+undersized+degraded+remapped+backfill_wait, last acting [124,NONE,85,6,11]
pg 31.5e is stuck undersized for 2h, current state active+undersized+degraded+remapped+backfill_wait, last acting [NONE,4,1,23,138]
pg 31.60 is stuck undersized for 18h, current state active+undersized+degraded+remapped+backfilling, last acting [37,11,102,NONE,133]
pg 31.64 is stuck undersized for 4m, current state active+undersized+remapped+backfill_wait, last acting [26,106,45,34,NONE]
pg 31.65 is stuck undersized for 4h, current state active+undersized+degraded+remapped+backfill_wait, last acting [NONE,14,127,62,3]
pg 31.66 is stuck undersized for 4m, current state active+undersized+degraded+remapped+backfill_wait, last acting [34,NONE,35,23,59]
pg 31.67 is stuck undersized for 4m, current state active+undersized+degraded+remapped+backfill_wait, last acting [67,24,127,8,NONE]
pg 31.68 is stuck undersized for 4m, current state active+undersized+degraded+remapped+backfill_wait, last acting [32,41,18,17,NONE]
pg 31.70 is stuck undersized for 19h, current state active+undersized+degraded+remapped+backfilling, last acting [106,38,1,NONE,97]
pg 31.75 is stuck undersized for 2h, current state active+undersized+degraded+remapped+backfill_wait, last acting [26,31,34,8,NONE]
pg 31.7f is stuck undersized for 2h, current state active+undersized+remapped+backfilling, last acting [NONE,101,110,10,45]
[WRN] PG_NOT_DEEP_SCRUBBED: 127 pgs not deep-scrubbed in time
pg 26.e5 not deep-scrubbed since 2025-06-07T05:59:22.612553+0000
pg 26.ca not deep-scrubbed since 2025-06-06T18:21:07.435823+0000
pg 26.bf not deep-scrubbed since 2025-06-03T07:29:02.791095+0000
pg 26.be not deep-scrubbed since 2025-06-04T07:55:54.389714+0000
pg 26.a5 not deep-scrubbed since 2025-06-07T07:28:56.878096+0000
pg 26.85 not deep-scrubbed since 2025-06-03T15:18:59.395929+0000
pg 26.7f not deep-scrubbed since 2025-06-04T21:29:28.412637+0000
pg 26.7e not deep-scrubbed since 2025-06-03T09:14:19.585388+0000
pg 26.7d not deep-scrubbed since 2025-06-03T18:37:30.931020+0000
pg 26.7c not deep-scrubbed since 2025-06-04T13:38:00.061488+0000
pg 26.73 not deep-scrubbed since 2025-06-03T06:20:15.111819+0000
pg 26.6f not deep-scrubbed since 2025-06-03T13:45:24.880397+0000
pg 26.6e not deep-scrubbed since 2025-05-26T23:15:32.099862+0000
pg 26.6d not deep-scrubbed since 2025-06-04T14:04:10.449101+0000
pg 31.62 not deep-scrubbed since 2025-06-03T13:34:49.518456+0000
pg 26.65 not deep-scrubbed since 2025-06-04T07:56:25.353411+0000
pg 31.66 not deep-scrubbed since 2025-06-03T10:32:05.364424+0000
pg 26.62 not deep-scrubbed since 2025-06-04T09:35:58.267976+0000
pg 31.65 not deep-scrubbed since 2025-06-03T16:04:40.003140+0000
pg 31.5b not deep-scrubbed since 2025-06-03T14:18:18.835477+0000
pg 26.5d not deep-scrubbed since 2025-06-04T15:14:30.870252+0000
pg 31.58 not deep-scrubbed since 2025-06-03T03:09:27.568605+0000
pg 26.5c not deep-scrubbed since 2025-06-03T01:57:27.644129+0000
pg 31.5f not deep-scrubbed since 2025-06-03T05:53:20.860393+0000
pg 31.52 not deep-scrubbed since 2025-05-27T00:01:27.040861+0000
pg 31.53 not deep-scrubbed since 2025-05-24T09:37:58.964829+0000
pg 26.55 not deep-scrubbed since 2025-06-04T21:25:34.135356+0000
pg 26.54 not deep-scrubbed since 2025-06-04T06:07:12.978734+0000
pg 31.56 not deep-scrubbed since 2025-06-04T12:58:17.599712+0000
pg 31.57 not deep-scrubbed since 2025-06-03T07:02:16.859990+0000
pg 26.51 not deep-scrubbed since 2025-06-03T05:42:22.435483+0000
pg 26.4f not deep-scrubbed since 2025-06-03T09:10:22.617328+0000
pg 31.4a not deep-scrubbed since 2025-05-28T00:54:55.246532+0000
pg 26.4e not deep-scrubbed since 2025-06-03T11:16:49.278513+0000
pg 31.4b not deep-scrubbed since 2025-06-03T10:24:24.123351+0000
pg 26.4d not deep-scrubbed since 2025-06-04T19:01:44.614410+0000
pg 31.49 not deep-scrubbed since 2025-05-28T04:56:29.368285+0000
pg 31.42 not deep-scrubbed since 2025-05-28T08:38:57.151865+0000
pg 26.41 not deep-scrubbed since 2025-06-03T05:47:35.443867+0000
pg 26.40 not deep-scrubbed since 2025-06-03T05:43:13.283668+0000
pg 25.17 not deep-scrubbed since 2025-06-03T09:26:52.253625+0000
pg 32.2e not deep-scrubbed since 2025-06-05T13:01:06.175389+0000
pg 22.1a not deep-scrubbed since 2025-06-07T01:40:45.063268+0000
pg 31.13 not deep-scrubbed since 2025-06-03T12:21:17.965218+0000
pg 22.1b not deep-scrubbed since 2025-06-04T12:22:44.947751+0000
pg 25.14 not deep-scrubbed since 2025-05-28T06:26:32.552200+0000
pg 26.17 not deep-scrubbed since 2025-06-03T18:37:26.617483+0000
pg 31.12 not deep-scrubbed since 2025-05-28T08:39:23.271194+0000
pg 31.1c not deep-scrubbed since 2025-06-03T08:17:51.230187+0000
pg 25.1f not deep-scrubbed since 2025-05-28T00:19:23.653883+0000
77 more pgs…
```

Regards
Dev

_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
