Or, depending on which release was running when the OSDs were created,
perhaps reshard the RocksDB column families?



https://www.ibm.com/docs/en/storage-ceph/8.0.0?topic=bluestore-resharding-rocksdb-database
(Playbook from cephadm-ansible)
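
To make the compaction suggestion quoted below concrete, here is a minimal Python sketch that pulls the OSD IDs flagged by BLUESTORE_SLOW_OP_ALERT out of pasted `ceph health detail` text and prints one `ceph tell osd.N compact` line per OSD. The helper names (`slow_op_osds`, `compact_commands`) are made up for illustration, and the sample text is copied from the report further down. It only prints commands; it does not run anything against the cluster.

```python
import re

# Matches lines such as "osd.5 observed slow operation indications in BlueStore",
# tolerating leading mail-quote markers like ">>> ".
SLOW_OP_RE = re.compile(
    r"^[>\s]*osd\.(\d+) observed slow operation indications in BlueStore"
)


def slow_op_osds(health_detail):
    """Return the sorted, de-duplicated OSD IDs flagged for slow BlueStore ops."""
    ids = set()
    for line in health_detail.splitlines():
        match = SLOW_OP_RE.match(line)
        if match:
            ids.add(int(match.group(1)))
    return sorted(ids)


def compact_commands(osd_ids):
    """Build one online-compaction command per OSD (hypothetical helper)."""
    return [f"ceph tell osd.{osd_id} compact" for osd_id in osd_ids]


# Sample copied from the health output quoted later in this thread.
SAMPLE = """\
[WRN] BLUESTORE_SLOW_OP_ALERT: 8 OSD(s) experiencing slow operations in BlueStore
osd.5 observed slow operation indications in BlueStore
osd.9 observed slow operation indications in BlueStore
osd.18 observed slow operation indications in BlueStore
osd.36 observed slow operation indications in BlueStore
osd.59 observed slow operation indications in BlueStore
osd.66 observed slow operation indications in BlueStore
osd.106 observed slow operation indications in BlueStore
osd.110 observed slow operation indications in BlueStore
"""

if __name__ == "__main__":
    for command in compact_commands(slow_op_osds(SAMPLE)):
        print(command)
```

I would run the printed commands one OSD at a time rather than all at once, since compaction adds I/O load on a cluster that is already backfilling. Whether resharding is also needed can, as far as I know, be checked per OSD with `ceph-bluestore-tool show-sharding`; the IBM page above describes the procedure.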

> On Jun 20, 2025, at 1:57 AM, Michel Jouvin <michel.jou...@ijclab.in2p3.fr> 
> wrote:
> 
> Hi Dev,
> 
> I am not sure I understand why these service deployments timed out. The log
> says that one OSD failed, which may explain why the upgrade is no longer
> progressing. The BlueStore slow ops warning (a new warning, so not
> necessarily a new condition) on so many OSDs suggests that something is not
> optimal. As suggested in another thread recently, it may be an indication
> that you need to compact the OSDs.
> 
> I am not sure what you adjusted, but as long as the cluster works, I would
> not have changed parameters; I would try to fix the problems mentioned above
> instead.
> 
> Good luck,
> 
> Michel
> On 20 June 2025 05:13:38, Devender Singh <deven...@netskrt.io> wrote:
> 
>> Here is the status
>> 
>> # ceph orch upgrade status
>> {
>>   "target_image": "quay.io/ceph/ceph@sha256:8214ebff6133ac27d20659038df6962dbf9d77da21c9438a296b2e2059a56af6",
>>   "in_progress": true,
>>   "which": "Upgrading all daemon types on all hosts",
>>   "services_complete": [
>>       "crash",
>>       "mgr",
>>       "mon"
>>   ],
>>   "progress": "74/113 daemons upgraded",
>>   "message": "",
>>   "is_paused": false
>> }
>> 
>> Regards
>> Dev
>> 
>>> On Jun 19, 2025, at 8:06 PM, Devender Singh <deven...@netskrt.io> wrote:
>>> 
>>> 
>>> Hello all
>>> 
>>> I have a cluster whose upgrade is hung. Some backfills are running; I
>>> reduced them to 1, but the upgrade is still not progressing.
>>> Please help.
>>> 
>>> # ceph health detail
>>> HEALTH_WARN 8 OSD(s) experiencing slow operations in BlueStore; Failed to 
>>> apply 2 service(s): osd.all-available-devices,osd.iops_optimized; 1 failed 
>>> cephadm daemon(s); failed to probe daemons or devices; noscrub,nodeep-scrub 
>>> flag(s) set; Degraded data redundancy: 1600150/39365198 objects degraded 
>>> (4.065%), 93 pgs degraded, 103 pgs undersized; 127 pgs not deep-scrubbed in 
>>> time
>>> [WRN] BLUESTORE_SLOW_OP_ALERT: 8 OSD(s) experiencing slow operations in 
>>> BlueStore
>>> osd.5 observed slow operation indications in BlueStore
>>> osd.9 observed slow operation indications in BlueStore
>>> osd.18 observed slow operation indications in BlueStore
>>> osd.36 observed slow operation indications in BlueStore
>>> osd.59 observed slow operation indications in BlueStore
>>> osd.66 observed slow operation indications in BlueStore
>>> osd.106 observed slow operation indications in BlueStore
>>> osd.110 observed slow operation indications in BlueStore
>>> [WRN] CEPHADM_APPLY_SPEC_FAIL: Failed to apply 2 service(s): 
>>> osd.all-available-devices,osd.iops_optimized
>>> osd.all-available-devices: Command timed out on host cephadm deploy (osd 
>>> daemon) (default 900 second timeout)
>>> osd.iops_optimized: Command timed out on host cephadm deploy (osd daemon) 
>>> (default 900 second timeout)
>>> [WRN] CEPHADM_FAILED_DAEMON: 1 failed cephadm daemon(s)
>>> daemon osd.19 on phl-prod-host04n.example.com is in error state
>>> [WRN] CEPHADM_REFRESH_FAILED: failed to probe daemons or devices
>>> Command "cephadm ceph-volume -- inventory" timed out on host 
>>> phl-prod-converged03n.phl.netskrt.org (default 900 second timeout)
>>> [WRN] OSDMAP_FLAGS: noscrub,nodeep-scrub flag(s) set
>>> [WRN] PG_DEGRADED: Degraded data redundancy: 1600150/39365198 objects 
>>> degraded (4.065%), 93 pgs degraded, 103 pgs undersized
>>> pg 25.0 is stuck undersized for 21h, current state 
>>> active+undersized+degraded+remapped+backfill_wait, last acting 
>>> [90,77,NONE,38,3]
>>> pg 25.1 is stuck undersized for 23h, current state 
>>> active+undersized+degraded+remapped+backfill_wait, last acting 
>>> [131,1,66,NONE,72]
>>> pg 25.c is stuck undersized for 23h, current state 
>>> active+undersized+degraded+remapped+backfill_wait, last acting 
>>> [1,135,110,28,NONE]
>>> pg 25.e is active+undersized+degraded+remapped+backfill_wait, acting 
>>> [18,20,108,101,NONE]
>>> pg 25.14 is stuck undersized for 19h, current state 
>>> active+undersized+degraded+remapped+backfill_wait, last acting 
>>> [32,13,NONE,65,97]
>>> pg 25.17 is stuck undersized for 19h, current state 
>>> active+undersized+degraded+remapped+backfill_wait, last acting 
>>> [32,65,110,23,NONE]
>>> pg 25.1a is stuck undersized for 23h, current state 
>>> active+undersized+degraded+remapped+backfill_wait, last acting 
>>> [97,66,NONE,112,139]
>>> pg 25.1c is stuck undersized for 2h, current state 
>>> active+undersized+degraded+remapped+backfill_wait, last acting 
>>> [102,136,97,NONE,45]
>>> pg 25.1f is stuck undersized for 8h, current state 
>>> active+undersized+degraded+remapped+backfill_wait, last acting 
>>> [NONE,66,7,39,109]
>>> pg 26.45 is stuck undersized for 23h, current state 
>>> active+undersized+degraded+remapped+backfill_wait, last acting [32,47]
>>> pg 26.48 is stuck undersized for 23h, current state 
>>> active+undersized+degraded+remapped+backfill_wait, last acting [113,55]
>>> pg 26.4b is stuck undersized for 8h, current state 
>>> active+undersized+degraded+remapped+backfill_wait, last acting [105,7]
>>> pg 26.59 is stuck undersized for 23h, current state 
>>> active+undersized+degraded+remapped+backfill_wait, last acting [35,40]
>>> pg 26.6a is stuck undersized for 7h, current state 
>>> active+undersized+degraded+remapped+backfill_wait, last acting [25,45]
>>> pg 26.74 is stuck undersized for 23h, current state 
>>> active+undersized+degraded+remapped+backfill_wait, last acting [112,131]
>>> pg 26.77 is stuck undersized for 2h, current state 
>>> active+undersized+degraded+remapped+backfill_wait, last acting [17,105]
>>> pg 26.9c is stuck undersized for 19h, current state 
>>> active+undersized+degraded+remapped+backfill_wait, last acting [47,76]
>>> pg 26.bf is stuck undersized for 4m, current state 
>>> active+undersized+degraded+remapped+backfill_wait, last acting [32,26]
>>> pg 26.c2 is stuck undersized for 2h, current state 
>>> active+undersized+degraded+remapped+backfill_wait, last acting [97,135]
>>> pg 26.ec is stuck undersized for 4m, current state 
>>> active+undersized+degraded+remapped+backfill_wait, last acting [26,80]
>>> pg 26.f7 is stuck undersized for 4m, current state 
>>> active+undersized+degraded+remapped+backfill_wait, last acting [49,105]
>>> pg 31.12 is stuck undersized for 4m, current state 
>>> active+undersized+degraded+remapped+backfill_wait, last acting 
>>> [47,110,131,37,NONE]
>>> pg 31.19 is stuck undersized for 2h, current state 
>>> active+undersized+degraded+remapped+backfill_wait, last acting 
>>> [NONE,87,3,137,39]
>>> pg 31.1b is stuck undersized for 2h, current state 
>>> active+undersized+degraded+remapped+backfill_wait, last acting 
>>> [108,NONE,42,102,97]
>>> pg 31.1c is stuck undersized for 19h, current state 
>>> active+undersized+degraded+remapped+backfill_wait, last acting 
>>> [113,66,NONE,45,72]
>>> pg 31.41 is stuck undersized for 2h, current state 
>>> active+undersized+remapped+backfill_wait, last acting [14,101,NONE,67,18]
>>> pg 31.42 is stuck undersized for 7h, current state 
>>> active+undersized+degraded+remapped+backfill_wait, last acting 
>>> [37,134,NONE,82,8]
>>> pg 31.44 is stuck undersized for 4m, current state 
>>> active+undersized+degraded+remapped+backfill_wait, last acting 
>>> [49,NONE,101,3,62]
>>> pg 31.46 is stuck undersized for 4m, current state 
>>> active+undersized+degraded+remapped+backfilling, last acting 
>>> [95,38,NONE,25,102]
>>> pg 31.47 is stuck undersized for 18h, current state 
>>> active+undersized+degraded+remapped+backfill_wait, last acting 
>>> [60,NONE,130,72,110]
>>> pg 31.4a is stuck undersized for 4h, current state 
>>> active+undersized+degraded+remapped+backfilling, last acting 
>>> [66,NONE,135,82,14]
>>> pg 31.4c is stuck undersized for 4m, current state 
>>> active+undersized+degraded+remapped+backfill_wait, last acting 
>>> [34,NONE,1,82,18]
>>> pg 31.4d is stuck undersized for 2h, current state 
>>> active+undersized+degraded+remapped+backfill_wait, last acting 
>>> [101,13,65,30,NONE]
>>> pg 31.52 is stuck undersized for 4m, current state 
>>> active+undersized+remapped+backfill_wait, last acting [26,112,66,NONE,135]
>>> pg 31.53 is stuck undersized for 17h, current state 
>>> active+undersized+degraded+remapped+backfill_wait, last acting 
>>> [NONE,24,11,42,110]
>>> pg 31.55 is stuck undersized for 4m, current state 
>>> active+undersized+degraded+remapped+backfill_wait, last acting 
>>> [116,NONE,27,4,117]
>>> pg 31.57 is stuck undersized for 4m, current state 
>>> active+undersized+degraded+remapped+backfill_wait, last acting 
>>> [17,15,110,NONE,1]
>>> pg 31.5a is stuck undersized for 19h, current state 
>>> active+undersized+degraded+remapped+backfill_wait, last acting 
>>> [13,24,124,26,NONE]
>>> pg 31.5b is stuck undersized for 4m, current state 
>>> active+undersized+degraded+remapped+backfill_wait, last acting 
>>> [NONE,31,8,27,42]
>>> pg 31.5c is stuck undersized for 2h, current state 
>>> active+undersized+degraded+remapped+backfill_wait, last acting 
>>> [NONE,60,28,18,119]
>>> pg 31.5d is stuck undersized for 17h, current state 
>>> active+undersized+degraded+remapped+backfill_wait, last acting 
>>> [124,NONE,85,6,11]
>>> pg 31.5e is stuck undersized for 2h, current state 
>>> active+undersized+degraded+remapped+backfill_wait, last acting 
>>> [NONE,4,1,23,138]
>>> pg 31.60 is stuck undersized for 18h, current state 
>>> active+undersized+degraded+remapped+backfilling, last acting 
>>> [37,11,102,NONE,133]
>>> pg 31.64 is stuck undersized for 4m, current state 
>>> active+undersized+remapped+backfill_wait, last acting [26,106,45,34,NONE]
>>> pg 31.65 is stuck undersized for 4h, current state 
>>> active+undersized+degraded+remapped+backfill_wait, last acting 
>>> [NONE,14,127,62,3]
>>> pg 31.66 is stuck undersized for 4m, current state 
>>> active+undersized+degraded+remapped+backfill_wait, last acting 
>>> [34,NONE,35,23,59]
>>> pg 31.67 is stuck undersized for 4m, current state 
>>> active+undersized+degraded+remapped+backfill_wait, last acting 
>>> [67,24,127,8,NONE]
>>> pg 31.68 is stuck undersized for 4m, current state 
>>> active+undersized+degraded+remapped+backfill_wait, last acting 
>>> [32,41,18,17,NONE]
>>> pg 31.70 is stuck undersized for 19h, current state 
>>> active+undersized+degraded+remapped+backfilling, last acting 
>>> [106,38,1,NONE,97]
>>> pg 31.75 is stuck undersized for 2h, current state 
>>> active+undersized+degraded+remapped+backfill_wait, last acting 
>>> [26,31,34,8,NONE]
>>> pg 31.7f is stuck undersized for 2h, current state 
>>> active+undersized+remapped+backfilling, last acting [NONE,101,110,10,45]
>>> [WRN] PG_NOT_DEEP_SCRUBBED: 127 pgs not deep-scrubbed in time
>>> pg 26.e5 not deep-scrubbed since 2025-06-07T05:59:22.612553+0000
>>> pg 26.ca not deep-scrubbed since 2025-06-06T18:21:07.435823+0000
>>> pg 26.bf not deep-scrubbed since 2025-06-03T07:29:02.791095+0000
>>> pg 26.be not deep-scrubbed since 2025-06-04T07:55:54.389714+0000
>>> pg 26.a5 not deep-scrubbed since 2025-06-07T07:28:56.878096+0000
>>> pg 26.85 not deep-scrubbed since 2025-06-03T15:18:59.395929+0000
>>> pg 26.7f not deep-scrubbed since 2025-06-04T21:29:28.412637+0000
>>> pg 26.7e not deep-scrubbed since 2025-06-03T09:14:19.585388+0000
>>> pg 26.7d not deep-scrubbed since 2025-06-03T18:37:30.931020+0000
>>> pg 26.7c not deep-scrubbed since 2025-06-04T13:38:00.061488+0000
>>> pg 26.73 not deep-scrubbed since 2025-06-03T06:20:15.111819+0000
>>> pg 26.6f not deep-scrubbed since 2025-06-03T13:45:24.880397+0000
>>> pg 26.6e not deep-scrubbed since 2025-05-26T23:15:32.099862+0000
>>> pg 26.6d not deep-scrubbed since 2025-06-04T14:04:10.449101+0000
>>> pg 31.62 not deep-scrubbed since 2025-06-03T13:34:49.518456+0000
>>> pg 26.65 not deep-scrubbed since 2025-06-04T07:56:25.353411+0000
>>> pg 31.66 not deep-scrubbed since 2025-06-03T10:32:05.364424+0000
>>> pg 26.62 not deep-scrubbed since 2025-06-04T09:35:58.267976+0000
>>> pg 31.65 not deep-scrubbed since 2025-06-03T16:04:40.003140+0000
>>> pg 31.5b not deep-scrubbed since 2025-06-03T14:18:18.835477+0000
>>> pg 26.5d not deep-scrubbed since 2025-06-04T15:14:30.870252+0000
>>> pg 31.58 not deep-scrubbed since 2025-06-03T03:09:27.568605+0000
>>> pg 26.5c not deep-scrubbed since 2025-06-03T01:57:27.644129+0000
>>> pg 31.5f not deep-scrubbed since 2025-06-03T05:53:20.860393+0000
>>> pg 31.52 not deep-scrubbed since 2025-05-27T00:01:27.040861+0000
>>> pg 31.53 not deep-scrubbed since 2025-05-24T09:37:58.964829+0000
>>> pg 26.55 not deep-scrubbed since 2025-06-04T21:25:34.135356+0000
>>> pg 26.54 not deep-scrubbed since 2025-06-04T06:07:12.978734+0000
>>> pg 31.56 not deep-scrubbed since 2025-06-04T12:58:17.599712+0000
>>> pg 31.57 not deep-scrubbed since 2025-06-03T07:02:16.859990+0000
>>> pg 26.51 not deep-scrubbed since 2025-06-03T05:42:22.435483+0000
>>> pg 26.4f not deep-scrubbed since 2025-06-03T09:10:22.617328+0000
>>> pg 31.4a not deep-scrubbed since 2025-05-28T00:54:55.246532+0000
>>> pg 26.4e not deep-scrubbed since 2025-06-03T11:16:49.278513+0000
>>> pg 31.4b not deep-scrubbed since 2025-06-03T10:24:24.123351+0000
>>> pg 26.4d not deep-scrubbed since 2025-06-04T19:01:44.614410+0000
>>> pg 31.49 not deep-scrubbed since 2025-05-28T04:56:29.368285+0000
>>> pg 31.42 not deep-scrubbed since 2025-05-28T08:38:57.151865+0000
>>> pg 26.41 not deep-scrubbed since 2025-06-03T05:47:35.443867+0000
>>> pg 26.40 not deep-scrubbed since 2025-06-03T05:43:13.283668+0000
>>> pg 25.17 not deep-scrubbed since 2025-06-03T09:26:52.253625+0000
>>> pg 32.2e not deep-scrubbed since 2025-06-05T13:01:06.175389+0000
>>> pg 22.1a not deep-scrubbed since 2025-06-07T01:40:45.063268+0000
>>> pg 31.13 not deep-scrubbed since 2025-06-03T12:21:17.965218+0000
>>> pg 22.1b not deep-scrubbed since 2025-06-04T12:22:44.947751+0000
>>> pg 25.14 not deep-scrubbed since 2025-05-28T06:26:32.552200+0000
>>> pg 26.17 not deep-scrubbed since 2025-06-03T18:37:26.617483+0000
>>> pg 31.12 not deep-scrubbed since 2025-05-28T08:39:23.271194+0000
>>> pg 31.1c not deep-scrubbed since 2025-06-03T08:17:51.230187+0000
>>> pg 25.1f not deep-scrubbed since 2025-05-28T00:19:23.653883+0000
>>> 77 more pgs…
>>> 
>>> Regards
>>> Dev
>> 
>> _______________________________________________
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
