Dev,
To add a few details to my previous answer, you may want to try the
following to see what is causing the upgrade to stall (stop
progressing):
- ceph -s --watch-debug: may give you a hint.
- ceph log last 1000 debug cephadm: shows everything reported by
cephadm. You may see something related to the upgrade, although in my
experience cephadm is not very verbose in this kind of situation. You
may also want to stop the upgrade (ceph orch upgrade stop) and start it
again (should be harmless).
- Search the logs for what happened to osd.19, the OSD in error state. If
nothing points to a hardware problem, try restarting it. The down OSD is
probably the blocking factor, so check in the logs whether it was an OSD
being upgraded (i.e. whether it failed during the restart done by the
upgrade process).
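
The checks in the list above can be collected in a small helper so they are easy to rerun. This is only a sketch: the function names are made up for illustration, it only builds and prints the command lines (nothing is executed against the cluster), and the use of ceph orch ps and ceph orch daemon restart to inspect and restart the daemon is my assumption about how one would act on the last item.

```python
import shlex
from typing import List


def diagnostic_commands(log_lines: int = 1000) -> List[List[str]]:
    """Build the read-only checks as argv lists (hypothetical helper)."""
    return [
        # Cluster status with debug-level watch; may keep streaming until interrupted.
        ["ceph", "-s", "--watch-debug"],
        # Last N debug-level entries from the cephadm log channel.
        ["ceph", "log", "last", str(log_lines), "debug", "cephadm"],
        # Daemon states as seen by the orchestrator (look for the one in error).
        ["ceph", "orch", "ps"],
        # Where the upgrade currently stands.
        ["ceph", "orch", "upgrade", "status"],
    ]


def restart_command(osd_id: int) -> List[str]:
    """Build the orchestrator restart command for one OSD (hypothetical helper)."""
    return ["ceph", "orch", "daemon", "restart", f"osd.{osd_id}"]


# Dry run: print the commands instead of executing them.
for argv in diagnostic_commands():
    print(shlex.join(argv))
print(shlex.join(restart_command(19)))
```

Printing first and running by hand keeps the restart of osd.19 a deliberate step, taken only once the logs show no hardware problem.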
Michel
On 20/06/2025 at 07:57, Michel Jouvin wrote:
Hi Dev,
I am not sure why these service deployments timed out. The log says
that one OSD failed, which may explain why the upgrade is no longer
progressing. The BlueStore slow ops warning (a new warning, so not
necessarily a new problem) on so many OSDs suggests that something is
not optimal. As suggested in another thread recently, it may be an
indication that you need to compact the OSDs.
I am not sure what you adjusted, but as long as the cluster works I
would not change parameters; I would rather try to fix the problems
mentioned above.
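
If compaction is the route taken, the affected OSDs can be read from the BLUESTORE_SLOW_OP_ALERT lines of the ceph health detail output quoted further down in this thread. The sketch below assumes online compaction through ceph tell osd.N compact; the function names are hypothetical and the script only prints the commands rather than running them.

```python
import re
from typing import List

# Lines copied from the BLUESTORE_SLOW_OP_ALERT section of the health output.
SAMPLE = """
osd.5 observed slow operation indications in BlueStore
osd.9 observed slow operation indications in BlueStore
osd.18 observed slow operation indications in BlueStore
osd.36 observed slow operation indications in BlueStore
osd.59 observed slow operation indications in BlueStore
osd.66 observed slow operation indications in BlueStore
osd.106 observed slow operation indications in BlueStore
osd.110 observed slow operation indications in BlueStore
"""

SLOW_OP_LINE = re.compile(
    r"osd\.(\d+) observed slow operation indications in BlueStore"
)


def slow_op_osds(health_detail: str) -> List[int]:
    """Return the sorted, de-duplicated OSD ids flagged for slow operations."""
    return sorted({int(osd_id) for osd_id in SLOW_OP_LINE.findall(health_detail)})


def compact_commands(osd_ids: List[int]) -> List[str]:
    """Build one online-compaction command per OSD (assumed approach)."""
    return [f"ceph tell osd.{osd_id} compact" for osd_id in osd_ids]


for command in compact_commands(slow_op_osds(SAMPLE)):
    print(command)
```

Compaction adds I/O load on the OSD while it runs, so it is probably safer to run these one at a time on a cluster that is already degraded and backfilling.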
Good luck,
Michel
On 20 June 2025 at 05:13:38, Devender Singh <deven...@netskrt.io> wrote:
Here is the status
# ceph orch upgrade status
{
"target_image":
"quay.io/ceph/ceph@sha256:8214ebff6133ac27d20659038df6962dbf9d77da21c9438a296b2e2059a56af6",
"in_progress": true,
"which": "Upgrading all daemon types on all hosts",
"services_complete": [
"crash",
"mgr",
"mon"
],
"progress": "74/113 daemons upgraded",
"message": "",
"is_paused": false
}
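
To make the state above easier to read at a glance, the JSON can be summarized with a few lines of Python. This is a sketch only: the summarize function is hypothetical, and the embedded sample keeps just the fields it uses (target_image is omitted for brevity).

```python
import json
import re

# Subset of the `ceph orch upgrade status` output shown above.
STATUS_JSON = """
{
  "in_progress": true,
  "which": "Upgrading all daemon types on all hosts",
  "services_complete": ["crash", "mgr", "mon"],
  "progress": "74/113 daemons upgraded",
  "message": "",
  "is_paused": false
}
"""


def summarize(status_text: str) -> dict:
    """Extract done/total/remaining daemon counts and the run state."""
    status = json.loads(status_text)
    match = re.match(r"(\d+)/(\d+)", status.get("progress") or "")
    done, total = (int(match.group(1)), int(match.group(2))) if match else (0, 0)
    return {
        "done": done,
        "total": total,
        "remaining": total - done,
        # "Running" here means started and not explicitly paused.
        "running": bool(status.get("in_progress")) and not status.get("is_paused"),
        "services_complete": status.get("services_complete", []),
    }


summary = summarize(STATUS_JSON)
print(summary)
```

For this output, 39 of 113 daemons remain, the upgrade is neither paused nor reporting a message, and only crash, mgr and mon are listed as complete, so the stall presumably sits in the OSD phase.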
Regards
Dev
On Jun 19, 2025, at 8:06 PM, Devender Singh <deven...@netskrt.io>
wrote:
Hello all
I have a cluster that is in a hung state. Some backfills are running; I
reduced backfills to 1, but the upgrade is still not progressing…
Please help…
# ceph health detail
HEALTH_WARN 8 OSD(s) experiencing slow operations in BlueStore;
Failed to apply 2 service(s):
osd.all-available-devices,osd.iops_optimized; 1 failed cephadm
daemon(s); failed to probe daemons or devices; noscrub,nodeep-scrub
flag(s) set; Degraded data redundancy: 1600150/39365198 objects
degraded (4.065%), 93 pgs degraded, 103 pgs undersized; 127 pgs not
deep-scrubbed in time
[WRN] BLUESTORE_SLOW_OP_ALERT: 8 OSD(s) experiencing slow operations
in BlueStore
osd.5 observed slow operation indications in BlueStore
osd.9 observed slow operation indications in BlueStore
osd.18 observed slow operation indications in BlueStore
osd.36 observed slow operation indications in BlueStore
osd.59 observed slow operation indications in BlueStore
osd.66 observed slow operation indications in BlueStore
osd.106 observed slow operation indications in BlueStore
osd.110 observed slow operation indications in BlueStore
[WRN] CEPHADM_APPLY_SPEC_FAIL: Failed to apply 2 service(s):
osd.all-available-devices,osd.iops_optimized
osd.all-available-devices: Command timed out on host cephadm deploy
(osd daemon) (default 900 second timeout)
osd.iops_optimized: Command timed out on host cephadm deploy (osd
daemon) (default 900 second timeout)
[WRN] CEPHADM_FAILED_DAEMON: 1 failed cephadm daemon(s)
daemon osd.19 on phl-prod-host04n.example.com is in error state
[WRN] CEPHADM_REFRESH_FAILED: failed to probe daemons or devices
Command "cephadm ceph-volume -- inventory" timed out on host
phl-prod-converged03n.phl.netskrt.org (default 900 second timeout)
[WRN] OSDMAP_FLAGS: noscrub,nodeep-scrub flag(s) set
[WRN] PG_DEGRADED: Degraded data redundancy: 1600150/39365198
objects degraded (4.065%), 93 pgs degraded, 103 pgs undersized
pg 25.0 is stuck undersized for 21h, current state
active+undersized+degraded+remapped+backfill_wait, last acting
[90,77,NONE,38,3]
pg 25.1 is stuck undersized for 23h, current state
active+undersized+degraded+remapped+backfill_wait, last acting
[131,1,66,NONE,72]
pg 25.c is stuck undersized for 23h, current state
active+undersized+degraded+remapped+backfill_wait, last acting
[1,135,110,28,NONE]
pg 25.e is active+undersized+degraded+remapped+backfill_wait, acting
[18,20,108,101,NONE]
pg 25.14 is stuck undersized for 19h, current state
active+undersized+degraded+remapped+backfill_wait, last acting
[32,13,NONE,65,97]
pg 25.17 is stuck undersized for 19h, current state
active+undersized+degraded+remapped+backfill_wait, last acting
[32,65,110,23,NONE]
pg 25.1a is stuck undersized for 23h, current state
active+undersized+degraded+remapped+backfill_wait, last acting
[97,66,NONE,112,139]
pg 25.1c is stuck undersized for 2h, current state
active+undersized+degraded+remapped+backfill_wait, last acting
[102,136,97,NONE,45]
pg 25.1f is stuck undersized for 8h, current state
active+undersized+degraded+remapped+backfill_wait, last acting
[NONE,66,7,39,109]
pg 26.45 is stuck undersized for 23h, current state
active+undersized+degraded+remapped+backfill_wait, last acting [32,47]
pg 26.48 is stuck undersized for 23h, current state
active+undersized+degraded+remapped+backfill_wait, last acting [113,55]
pg 26.4b is stuck undersized for 8h, current state
active+undersized+degraded+remapped+backfill_wait, last acting [105,7]
pg 26.59 is stuck undersized for 23h, current state
active+undersized+degraded+remapped+backfill_wait, last acting [35,40]
pg 26.6a is stuck undersized for 7h, current state
active+undersized+degraded+remapped+backfill_wait, last acting [25,45]
pg 26.74 is stuck undersized for 23h, current state
active+undersized+degraded+remapped+backfill_wait, last acting
[112,131]
pg 26.77 is stuck undersized for 2h, current state
active+undersized+degraded+remapped+backfill_wait, last acting [17,105]
pg 26.9c is stuck undersized for 19h, current state
active+undersized+degraded+remapped+backfill_wait, last acting [47,76]
pg 26.bf is stuck undersized for 4m, current state
active+undersized+degraded+remapped+backfill_wait, last acting [32,26]
pg 26.c2 is stuck undersized for 2h, current state
active+undersized+degraded+remapped+backfill_wait, last acting [97,135]
pg 26.ec is stuck undersized for 4m, current state
active+undersized+degraded+remapped+backfill_wait, last acting [26,80]
pg 26.f7 is stuck undersized for 4m, current state
active+undersized+degraded+remapped+backfill_wait, last acting [49,105]
pg 31.12 is stuck undersized for 4m, current state
active+undersized+degraded+remapped+backfill_wait, last acting
[47,110,131,37,NONE]
pg 31.19 is stuck undersized for 2h, current state
active+undersized+degraded+remapped+backfill_wait, last acting
[NONE,87,3,137,39]
pg 31.1b is stuck undersized for 2h, current state
active+undersized+degraded+remapped+backfill_wait, last acting
[108,NONE,42,102,97]
pg 31.1c is stuck undersized for 19h, current state
active+undersized+degraded+remapped+backfill_wait, last acting
[113,66,NONE,45,72]
pg 31.41 is stuck undersized for 2h, current state
active+undersized+remapped+backfill_wait, last acting
[14,101,NONE,67,18]
pg 31.42 is stuck undersized for 7h, current state
active+undersized+degraded+remapped+backfill_wait, last acting
[37,134,NONE,82,8]
pg 31.44 is stuck undersized for 4m, current state
active+undersized+degraded+remapped+backfill_wait, last acting
[49,NONE,101,3,62]
pg 31.46 is stuck undersized for 4m, current state
active+undersized+degraded+remapped+backfilling, last acting
[95,38,NONE,25,102]
pg 31.47 is stuck undersized for 18h, current state
active+undersized+degraded+remapped+backfill_wait, last acting
[60,NONE,130,72,110]
pg 31.4a is stuck undersized for 4h, current state
active+undersized+degraded+remapped+backfilling, last acting
[66,NONE,135,82,14]
pg 31.4c is stuck undersized for 4m, current state
active+undersized+degraded+remapped+backfill_wait, last acting
[34,NONE,1,82,18]
pg 31.4d is stuck undersized for 2h, current state
active+undersized+degraded+remapped+backfill_wait, last acting
[101,13,65,30,NONE]
pg 31.52 is stuck undersized for 4m, current state
active+undersized+remapped+backfill_wait, last acting
[26,112,66,NONE,135]
pg 31.53 is stuck undersized for 17h, current state
active+undersized+degraded+remapped+backfill_wait, last acting
[NONE,24,11,42,110]
pg 31.55 is stuck undersized for 4m, current state
active+undersized+degraded+remapped+backfill_wait, last acting
[116,NONE,27,4,117]
pg 31.57 is stuck undersized for 4m, current state
active+undersized+degraded+remapped+backfill_wait, last acting
[17,15,110,NONE,1]
pg 31.5a is stuck undersized for 19h, current state
active+undersized+degraded+remapped+backfill_wait, last acting
[13,24,124,26,NONE]
pg 31.5b is stuck undersized for 4m, current state
active+undersized+degraded+remapped+backfill_wait, last acting
[NONE,31,8,27,42]
pg 31.5c is stuck undersized for 2h, current state
active+undersized+degraded+remapped+backfill_wait, last acting
[NONE,60,28,18,119]
pg 31.5d is stuck undersized for 17h, current state
active+undersized+degraded+remapped+backfill_wait, last acting
[124,NONE,85,6,11]
pg 31.5e is stuck undersized for 2h, current state
active+undersized+degraded+remapped+backfill_wait, last acting
[NONE,4,1,23,138]
pg 31.60 is stuck undersized for 18h, current state
active+undersized+degraded+remapped+backfilling, last acting
[37,11,102,NONE,133]
pg 31.64 is stuck undersized for 4m, current state
active+undersized+remapped+backfill_wait, last acting
[26,106,45,34,NONE]
pg 31.65 is stuck undersized for 4h, current state
active+undersized+degraded+remapped+backfill_wait, last acting
[NONE,14,127,62,3]
pg 31.66 is stuck undersized for 4m, current state
active+undersized+degraded+remapped+backfill_wait, last acting
[34,NONE,35,23,59]
pg 31.67 is stuck undersized for 4m, current state
active+undersized+degraded+remapped+backfill_wait, last acting
[67,24,127,8,NONE]
pg 31.68 is stuck undersized for 4m, current state
active+undersized+degraded+remapped+backfill_wait, last acting
[32,41,18,17,NONE]
pg 31.70 is stuck undersized for 19h, current state
active+undersized+degraded+remapped+backfilling, last acting
[106,38,1,NONE,97]
pg 31.75 is stuck undersized for 2h, current state
active+undersized+degraded+remapped+backfill_wait, last acting
[26,31,34,8,NONE]
pg 31.7f is stuck undersized for 2h, current state
active+undersized+remapped+backfilling, last acting
[NONE,101,110,10,45]
[WRN] PG_NOT_DEEP_SCRUBBED: 127 pgs not deep-scrubbed in time
pg 26.e5 not deep-scrubbed since 2025-06-07T05:59:22.612553+0000
pg 26.ca not deep-scrubbed since 2025-06-06T18:21:07.435823+0000
pg 26.bf not deep-scrubbed since 2025-06-03T07:29:02.791095+0000
pg 26.be not deep-scrubbed since 2025-06-04T07:55:54.389714+0000
pg 26.a5 not deep-scrubbed since 2025-06-07T07:28:56.878096+0000
pg 26.85 not deep-scrubbed since 2025-06-03T15:18:59.395929+0000
pg 26.7f not deep-scrubbed since 2025-06-04T21:29:28.412637+0000
pg 26.7e not deep-scrubbed since 2025-06-03T09:14:19.585388+0000
pg 26.7d not deep-scrubbed since 2025-06-03T18:37:30.931020+0000
pg 26.7c not deep-scrubbed since 2025-06-04T13:38:00.061488+0000
pg 26.73 not deep-scrubbed since 2025-06-03T06:20:15.111819+0000
pg 26.6f not deep-scrubbed since 2025-06-03T13:45:24.880397+0000
pg 26.6e not deep-scrubbed since 2025-05-26T23:15:32.099862+0000
pg 26.6d not deep-scrubbed since 2025-06-04T14:04:10.449101+0000
pg 31.62 not deep-scrubbed since 2025-06-03T13:34:49.518456+0000
pg 26.65 not deep-scrubbed since 2025-06-04T07:56:25.353411+0000
pg 31.66 not deep-scrubbed since 2025-06-03T10:32:05.364424+0000
pg 26.62 not deep-scrubbed since 2025-06-04T09:35:58.267976+0000
pg 31.65 not deep-scrubbed since 2025-06-03T16:04:40.003140+0000
pg 31.5b not deep-scrubbed since 2025-06-03T14:18:18.835477+0000
pg 26.5d not deep-scrubbed since 2025-06-04T15:14:30.870252+0000
pg 31.58 not deep-scrubbed since 2025-06-03T03:09:27.568605+0000
pg 26.5c not deep-scrubbed since 2025-06-03T01:57:27.644129+0000
pg 31.5f not deep-scrubbed since 2025-06-03T05:53:20.860393+0000
pg 31.52 not deep-scrubbed since 2025-05-27T00:01:27.040861+0000
pg 31.53 not deep-scrubbed since 2025-05-24T09:37:58.964829+0000
pg 26.55 not deep-scrubbed since 2025-06-04T21:25:34.135356+0000
pg 26.54 not deep-scrubbed since 2025-06-04T06:07:12.978734+0000
pg 31.56 not deep-scrubbed since 2025-06-04T12:58:17.599712+0000
pg 31.57 not deep-scrubbed since 2025-06-03T07:02:16.859990+0000
pg 26.51 not deep-scrubbed since 2025-06-03T05:42:22.435483+0000
pg 26.4f not deep-scrubbed since 2025-06-03T09:10:22.617328+0000
pg 31.4a not deep-scrubbed since 2025-05-28T00:54:55.246532+0000
pg 26.4e not deep-scrubbed since 2025-06-03T11:16:49.278513+0000
pg 31.4b not deep-scrubbed since 2025-06-03T10:24:24.123351+0000
pg 26.4d not deep-scrubbed since 2025-06-04T19:01:44.614410+0000
pg 31.49 not deep-scrubbed since 2025-05-28T04:56:29.368285+0000
pg 31.42 not deep-scrubbed since 2025-05-28T08:38:57.151865+0000
pg 26.41 not deep-scrubbed since 2025-06-03T05:47:35.443867+0000
pg 26.40 not deep-scrubbed since 2025-06-03T05:43:13.283668+0000
pg 25.17 not deep-scrubbed since 2025-06-03T09:26:52.253625+0000
pg 32.2e not deep-scrubbed since 2025-06-05T13:01:06.175389+0000
pg 22.1a not deep-scrubbed since 2025-06-07T01:40:45.063268+0000
pg 31.13 not deep-scrubbed since 2025-06-03T12:21:17.965218+0000
pg 22.1b not deep-scrubbed since 2025-06-04T12:22:44.947751+0000
pg 25.14 not deep-scrubbed since 2025-05-28T06:26:32.552200+0000
pg 26.17 not deep-scrubbed since 2025-06-03T18:37:26.617483+0000
pg 31.12 not deep-scrubbed since 2025-05-28T08:39:23.271194+0000
pg 31.1c not deep-scrubbed since 2025-06-03T08:17:51.230187+0000
pg 25.1f not deep-scrubbed since 2025-05-28T00:19:23.653883+0000
77 more pgs…
Regards
Dev
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io