[ceph-users] Re: CEPH upgrade from 18.2.7 to 19.2.2 -- Hung from last 24h at 66%

Anthony Fecarotta Sun, 22 Jun 2025 02:51:11 -0700

Hello, maybe I am misreading or misunderstanding this, but why would 
jumbo frames negatively impact Ceph performance?



Regards,
Anthony Fecarotta
Founder & President
anth...@linehaul.ai
224-339-1182, (855) 625-0300
1 Mid America Plz Flr 3 Oakbrook Terrace, IL 60181
www.linehaul.ai
https://www.linkedin.com/in/anthony-fec/

On Sun Jun 22, 2025, 08:50 AM GMT, Michel Jouvin 
<mailto:michel.jou...@ijclab.in2p3.fr> wrote:
> Hi Dev,
>
> I am not sure why you formatted osd.19; maybe I missed something. Clearly
> Ceph is very sensitive to network issues on any of the networks it uses. Your
> priority should be to ensure that the network configuration between your
> Ceph servers is OK. This is an OS configuration issue that you need to
> troubleshoot with the usual OS tools. The problem may be a change in your
> network infrastructure (switch configuration, for example), a problem with MTU
> size if you're using jumbo frames...
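One quick way to test the MTU hypothesis is to send non-fragmentable pings sized to the configured MTU between the Ceph hosts. What follows is only a minimal sketch, assuming IPv4, Linux iputils `ping`, and an MTU of 9000. The function names are made up for illustration, and the peer address in the usage comment is a placeholder, not a host from this thread.

```shell
#!/bin/sh
# Largest ICMP echo payload that fits in one IPv4 packet of the given MTU:
# MTU minus 20 bytes of IPv4 header minus 8 bytes of ICMP header.
icmp_payload_for_mtu() {
    echo $(( $1 - 28 ))
}

# Print (do not run) the ping command that probes a path at the given MTU.
# "-M do" sets the Don't Fragment bit, so a hop with a smaller MTU makes
# the ping fail instead of silently fragmenting.
mtu_probe_cmd() {
    peer=$1
    mtu=$2
    echo "ping -M do -c 3 -s $(icmp_payload_for_mtu "$mtu") $peer"
}

# Usage (placeholder peer address; substitute another Ceph host):
#   eval "$(mtu_probe_cmd 192.0.2.10 9000)"
#   eval "$(mtu_probe_cmd 192.0.2.10 1500)"
```

If the 1500 probe passes but the 9000 probe fails, some interface or switch port on the path is probably not configured for jumbo frames.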
>
> If your cluster was more or less OK before the upgrade, I see no
> reason to reformat OSDs or change anything in the cluster config. You need
> to find the cause of the problem, which may be something outside Ceph, and fix
> it before trying to restart the upgrade.
>
> Good luck.
>
> Michel
> Sent from my mobile
> On 21 June 2025 23:50:01, Devender Singh <deven...@netskrt.io> wrote:
>> Hello Fred
>>
>> I formatted osd.19 but am facing a similar issue on osd.9. I have paused the
>> upgrade.
>>
>> Below are the logs. Another issue I found is that my OSDs are not using the
>> cluster network. How do I deal with it?
>>
>> root@pl-host04n:/var/lib/ceph/a0bd51e8-4dfc-11ee-b5a9-3b06e501a0ac/osd.9#
>> ceph config get mon public_network
>> 10.104.1.0/24
>> root@pl-host04n:/var/lib/ceph/a0bd51e8-4dfc-11ee-b5a9-3b06e501a0ac/osd.9#
>> ceph config get mon cluster_network
>> 10.104.5.0/24
>> root@pl-host04n:/var/lib/ceph/a0bd51e8-4dfc-11ee-b5a9-3b06e501a0ac/osd.9#
>> ceph osd find 9
>> {
>> "osd": 9,
>> "addrs": {
>> "addrvec": [
>> {
>> "type": "v2",
>> "addr": "10.104.1.124:6802",
>> "nonce": 2257868117
>> },
>> {
>> "type": "v1",
>> "addr": "10.104.1.124:6803",
>> "nonce": 2257868117
>> }
>> ]
>> },
>> "osd_fsid": "4db6e332-9031-4c81-8de0-00fdd6b860f6",
>> "host": "pl-host04n.phl.example.com",
>> "crush_location": {
>> "host": "pl-host04n",
>> "root": "default"
>> }
>> }
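As far as I know, `ceph osd find` lists only the OSD's public addresses, so a 10.104.1.x address there does not by itself show that the cluster network is unused. The `back_addr` field in `ceph osd metadata 9` is where I would look instead. The helper below is a hypothetical sketch (IPv4 only, plain POSIX shell arithmetic, invented function names) for checking any address against the two configured networks.

```shell
#!/bin/sh
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the address on dots into $1..$4
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Succeed if address $1 lies inside CIDR network $2 (e.g. 10.104.5.0/24).
in_cidr() {
    addr=$(ip_to_int "$1")
    base=$(ip_to_int "${2%/*}")
    bits=${2#*/}
    mask=$(( (4294967295 << (32 - $bits)) & 4294967295 ))
    [ $(( $addr & $mask )) -eq $(( $base & $mask )) ]
}

# Usage, with an address copied by hand from `ceph osd metadata 9`:
#   in_cidr 10.104.1.124 "$(ceph config get mon cluster_network)" \
#       && echo "on cluster network" || echo "not on cluster network"
```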
>> root@pl-host04n:/var/lib/ceph/a0bd51e8-4dfc-11ee-b5a9-3b06e501a0ac/osd.9#
>> grep -E '\bosd\.9\b' /var/log/syslog-ceph |tail -20
>> Jun 21 20:10:00 pl-host04n bash[3965687]: debug
>> 2025-06-21T20:09:59.996+0000 7fd05b6b4640 0 log_channel(cluster) log [WRN]
>> : daemon osd.9 on pl-host04n.phl.example.com is in error state
>> Jun 21 20:10:00 pl-host04n bash[3965687]: cluster
>> 2025-06-21T20:10:00.000344+0000 mon.pl-host04n (mon.0) 208269 : cluster
>> [WRN] daemon osd.9 on pl-host04n.phl.example.com is in error state
>> Jun 21 20:20:00 pl-host04n bash[3965687]: debug
>> 2025-06-21T20:19:59.994+0000 7fd05b6b4640 0 log_channel(cluster) log [WRN]
>> : daemon osd.9 on pl-host04n.phl.example.com is in error state
>> Jun 21 20:20:00 pl-host04n bash[3965687]: cluster
>> 2025-06-21T20:20:00.000282+0000 mon.pl-host04n (mon.0) 208543 : cluster
>> [WRN] daemon osd.9 on pl-host04n.phl.example.com is in error state
>> Jun 21 20:30:00 pl-host04n bash[3965687]: debug
>> 2025-06-21T20:29:59.993+0000 7fd05b6b4640 0 log_channel(cluster) log [WRN]
>> : daemon osd.9 on pl-host04n.phl.example.com is in error state
>> Jun 21 20:30:00 pl-host04n bash[3965687]: cluster
>> 2025-06-21T20:30:00.000292+0000 mon.pl-host04n (mon.0) 208802 : cluster
>> [WRN] daemon osd.9 on pl-host04n.phl.example.com is in error state
>> Jun 21 20:40:00 pl-host04n bash[3965687]: debug
>> 2025-06-21T20:39:59.996+0000 7fd05b6b4640 0 log_channel(cluster) log [WRN]
>> : daemon osd.9 on pl-host04n.phl.example.com is in error state
>> Jun 21 20:40:00 pl-host04n bash[3965687]: cluster
>> 2025-06-21T20:40:00.000322+0000 mon.pl-host04n (mon.0) 209104 : cluster
>> [WRN] daemon osd.9 on pl-host04n.phl.example.com is in error state
>> Jun 21 20:50:00 pl-host04n bash[3965687]: debug
>> 2025-06-21T20:49:59.994+0000 7fd05b6b4640 0 log_channel(cluster) log [WRN]
>> : daemon osd.9 on pl-host04n.phl.example.com is in error state
>> Jun 21 20:50:00 pl-host04n bash[3965687]: cluster
>> 2025-06-21T20:50:00.000342+0000 mon.pl-host04n (mon.0) 209394 : cluster
>> [WRN] daemon osd.9 on pl-host04n.phl.example.com is in error state
>> Jun 21 21:00:00 pl-host04n bash[3965687]: debug
>> 2025-06-21T20:59:59.993+0000 7fd05b6b4640 0 log_channel(cluster) log [WRN]
>> : daemon osd.9 on pl-host04n.phl.example.com is in error state
>> Jun 21 21:00:00 pl-host04n bash[3965687]: cluster
>> 2025-06-21T21:00:00.000331+0000 mon.pl-host04n (mon.0) 209681 : cluster
>> [WRN] daemon osd.9 on pl-host04n.phl.example.com is in error state
>> Jun 21 21:10:00 pl-host04n bash[3965687]: debug
>> 2025-06-21T21:09:59.996+0000 7fd05b6b4640 0 log_channel(cluster) log [WRN]
>> : daemon osd.9 on pl-host04n.phl.example.com is in error state
>> Jun 21 21:10:00 pl-host04n bash[3965687]: cluster
>> 2025-06-21T21:10:00.000308+0000 mon.pl-host04n (mon.0) 209967 : cluster
>> [WRN] daemon osd.9 on pl-host04n.phl.example.com is in error state
>> Jun 21 21:20:00 pl-host04n bash[3965687]: debug
>> 2025-06-21T21:19:59.994+0000 7fd05b6b4640 0 log_channel(cluster) log [WRN]
>> : daemon osd.9 on pl-host04n.phl.example.com is in error state
>> Jun 21 21:20:00 pl-host04n bash[3965687]: cluster
>> 2025-06-21T21:20:00.000268+0000 mon.pl-host04n (mon.0) 210298 : cluster
>> [WRN] daemon osd.9 on pl-host04n.phl.example.com is in error state
>> Jun 21 21:30:00 pl-host04n bash[3965687]: debug
>> 2025-06-21T21:29:59.993+0000 7fd05b6b4640 0 log_channel(cluster) log [WRN]
>> : daemon osd.9 on pl-host04n.phl.example.com is in error state
>> Jun 21 21:30:00 pl-host04n bash[3965687]: cluster
>> 2025-06-21T21:30:00.000316+0000 mon.pl-host04n (mon.0) 210560 : cluster
>> [WRN] daemon osd.9 on pl-host04n.phl.example.com is in error state
>> Jun 21 21:40:00 pl-host04n bash[3965687]: debug
>> 2025-06-21T21:39:59.996+0000 7fd05b6b4640 0 log_channel(cluster) log [WRN]
>> : daemon osd.9 on pl-host04n.phl.example.com is in error state
>> Jun 21 21:40:00 pl-host04n bash[3965687]: cluster
>> 2025-06-21T21:40:00.000231+0000 mon.pl-host04n (mon.0) 210817 : cluster
>> [WRN] daemon osd.9 on pl-host04n.phl.example.com is in error state
>>
>>
>> root@pl-host04n:/var/lib/ceph/a0bd51e8-4dfc-11ee-b5a9-3b06e501a0ac/osd.9#
>> systemctl list-units |grep -i osd.9
>> ● ceph-a0bd51e8-4dfc-11ee-b5a9-3b06e501a0ac@osd.9.service
>> loaded failed
>> failed Ceph osd.9 for a0bd51e8-4dfc-11ee-b5a9-3b06e501a0ac
>> ● ceph-osd@9.service
>> loaded failed
>> failed Ceph object storage daemon osd.9
>> root@pl-host04n:/var/lib/ceph/a0bd51e8-4dfc-11ee-b5a9-3b06e501a0ac/osd.9#
>> systemctl status ceph-a0bd51e8-4dfc-11ee-b5a9-3b06e501a0ac@osd.9.service
>> × ceph-a0bd51e8-4dfc-11ee-b5a9-3b06e501a0ac@osd.9.service - Ceph osd.9 for
>> a0bd51e8-4dfc-11ee-b5a9-3b06e501a0ac
>> Loaded: loaded
>> (/etc/systemd/system/ceph-a0bd51e8-4dfc-11ee-b5a9-3b06e501a0ac@.service;
>> enabled; vendor preset: enabled)
>> Active: failed (Result: exit-code) since Sat 2025-06-21 18:46:01 UTC; 3h
>> 2min ago
>> Process: 3156903 ExecStart=/bin/bash
>> /var/lib/ceph/a0bd51e8-4dfc-11ee-b5a9-3b06e501a0ac/osd.9/unit.run
>> (code=exited, status=1/FAI>
>> Process: 3158431 ExecStopPost=/bin/bash
>> /var/lib/ceph/a0bd51e8-4dfc-11ee-b5a9-3b06e501a0ac/osd.9/unit.poststop
>> (code=exited, stat>
>> Main PID: 3156903 (code=exited, status=1/FAILURE)
>> CPU: 571ms
>>
>> Jun 21 18:46:01 pl-host04n.phl.example.com systemd[1]:
>> ceph-a0bd51e8-4dfc-11ee-b5a9-3b06e501a0ac@osd.9.service: Scheduled >
>> Jun 21 18:46:01 pl-host04n.phl.example.com systemd[1]: Stopped Ceph osd.9
>> for a0bd51e8-4dfc-11ee-b5a9-3b06e501a0ac.
>> Jun 21 18:46:01 pl-host04n.phl.example.com systemd[1]:
>> ceph-a0bd51e8-4dfc-11ee-b5a9-3b06e501a0ac@osd.9.service: Start requ>
>> Jun 21 18:46:01 pl-host04n.phl.example.com systemd[1]:
>> ceph-a0bd51e8-4dfc-11ee-b5a9-3b06e501a0ac@osd.9.service: Failed wit>
>> Jun 21 18:46:01 pl-host04n.phl.example.com systemd[1]: Failed to start Ceph
>> osd.9 for a0bd51e8-4dfc-11ee-b5a9-3b06e501a0ac.
>> root@pl-host04n:/var/lib/ceph/a0bd51e8-4dfc-11ee-b5a9-3b06e501a0ac/osd.9#
>> journalctl -u ceph-a0bd51e8-4dfc-11ee-b5a9-3b06e501a0ac@osd.9.service
>> Jun 17 04:44:20 pl-host04n.phl.example.com bash[2152282]: debug
>> 2025-06-17T04:44:20.588+0000 7fded3075640 -1 osd.9 pg_epoc>
>> Jun 17 04:44:24 pl-host04n.phl.example.com bash[2152282]: debug
>> 2025-06-17T04:44:24.076+0000 7fdede08b640 4 rocksdb: [db/>
>> Jun 17 04:45:38 pl-host04n.phl.example.com bash[2152282]: debug
>> 2025-06-17T04:45:38.812+0000 7fdede08b640 4 rocksdb: [db/>
>> Jun 17 04:45:51 pl-host04n.phl.example.com bash[2152282]: debug
>> 2025-06-17T04:45:51.776+0000 7fdedd089640 4 rocksdb: [db/>
>> Jun 17 04:45:51 pl-host04n.phl.example.com bash[2152282]: debug
>> 2025-06-17T04:45:51.776+0000 7fdedd089640 4 rocksdb: [db/>
>> Jun 17 04:45:51 pl-host04n.phl.example.com bash[2152282]: ** DB Stats **
>> Jun 17 04:45:51 pl-host04n.phl.example.com bash[2152282]: Uptime(secs):
>> 88200.5 total, 600.0 interval
>> Jun 17 04:45:51 pl-host04n.phl.example.com bash[2152282]: Cumulative
>> writes: 4721K writes, 20M keys, 4721K commit groups, >
>> Jun 17 04:45:51 pl-host04n.phl.example.com bash[2152282]: Cumulative WAL:
>> 4721K writes, 1745K syncs, 2.71 writes per sync,>
>> Jun 17 04:45:51 pl-host04n.phl.example.com bash[2152282]: Cumulative stall:
>> 00:00:0.000 H:M:S, 0.0 percent
>> Jun 17 04:45:51 pl-host04n.phl.example.com bash[2152282]: Interval writes:
>> 21K writes, 229K keys, 21K commit groups, 1.0 w>
>> Jun 17 04:45:51 pl-host04n.phl.example.com bash[2152282]: Interval WAL: 21K
>> writes, 7176 syncs, 2.98 writes per sync, writ>
>> Jun 17 04:45:51 pl-host04n.phl.example.com bash[2152282]: Interval stall:
>> 00:00:0.000 H:M:S, 0.0 percent
>> Jun 17 04:45:51 pl-host04n.phl.example.com bash[2152282]: ** Compaction
>> Stats [O-1] **
>> Jun 17 04:45:51 pl-host04n.phl.example.com bash[2152282]: Level Files
>> Size Score Read(GB) Rn(GB) Rnp1(GB) Write(>
>> Jun 17 04:45:51 pl-host04n.phl.example.com bash[2152282]:
>> ---------------------------------------------------------------->
>> Jun 17 04:45:51 pl-host04n.phl.example.com bash[2152282]: L0 0/0
>> 0.00 KB 0.0 0.0 0.0 0.0 0>
>> Jun 17 04:45:51 pl-host04n.phl.example.com bash[2152282]: L1 7/0
>> 393.07 MB 0.4 4.3 0.4 3.9 >
>> Jun 17 04:45:51 pl-host04n.phl.example.com bash[2152282]: Sum 7/0
>> 393.07 MB 0.0 4.3 0.4 3.9 >
>> Jun 17 04:45:51 pl-host04n.phl.example.com bash[2152282]: Int 0/0
>> 0.00 KB 0.0 0.0 0.0 0.0 0>
>> Jun 17 04:45:51 pl-host04n.phl.example.com bash[2152282]: ** Compaction
>> Stats [O-1] **
>> Jun 17 04:45:51 pl-host04n.phl.example.com bash[2152282]: Priority Files
>> Size Score Read(GB) Rn(GB) Rnp1(GB) Wri>
>> Jun 17 04:45:51 pl-host04n.phl.example.com bash[2152282]:
>> ---------------------------------------------------------------->
>> Jun 17 04:45:51 pl-host04n.phl.example.com bash[2152282]: Low 0/0
>> 0.00 KB 0.0 4.3 0.4 3.9 4>
>> Jun 17 04:45:51 pl-host04n.phl.example.com bash[2152282]: High 0/0
>> 0.00 KB 0.0 0.0 0.0 0.0 0>
>> Jun 17 04:45:51 pl-host04n.phl.example.com bash[2152282]: Blob file count:
>> 0, total size: 0.0 GB, garbage size: 0.0 GB, sp>
>> Jun 17 04:45:51 pl-host04n.phl.example.com bash[2152282]: Uptime(secs):
>> 88200.5 total, 4800.1 interval
>> Jun 17 04:45:51 pl-host04n.phl.example.com bash[2152282]: Flush(GB):
>> cumulative 0.407, interval 0.000
>> Jun 17 04:45:51 pl-host04n.phl.example.com bash[2152282]: AddFile(GB):
>> cumulative 0.000, interval 0.000
>> Jun 17 04:45:51 pl-host04n.phl.example.com bash[2152282]: AddFile(Total
>> Files): cumulative 0, interval 0
>> Jun 17 04:45:51 pl-host04n.phl.example.com bash[2152282]: AddFile(L0
>> Files): cumulative 0, interval 0
>> Jun 17 04:45:51 pl-host04n.phl.example.com bash[2152282]: AddFile(Keys):
>> cumulative 0, interval 0
>> Jun 17 04:45:51 pl-host04n.phl.example.com bash[2152282]: Cumulative
>> compaction: 4.64 GB write, 0.05 MB/s write, 4.26 GB r>
>>
>> Regards
>> Dev
>>
>>> On Jun 20, 2025, at 9:57 PM, Frédéric Nass <frederic.n...@univ-lorraine.fr>
>>> wrote:
>>>
>>> Hi Dev,
>>>
>>> Since MGRs and MONs were already upgraded successfully, you should be safe
>>> stopping the upgrade and restarting it at a later time.
>>>
>>> But before that, you could investigate why osd.19 is not coming up and why
>>> ceph-volume inventory times out. Can you ssh from MGR host to osd.19 'host
>>> phl-prod-host04n.example.comis'?
>>>
>>> I would look into ceph-osd.19.log and /var/log/messages for any hints on
>>> why osd.19 didn't start, start it manually and see if the upgrade resumes.
>>>
>>>
>>> If the upgrade doesn't resume, I would increase cephadm command timeout to
>>> 1800 (default is 900):
>>>
>>>
>>> $ ceph config set global mgr/cephadm/default_cephadm_command_timeout 1800
>>>
>>>
>>> (This might need a ceph mgr fail but ceph mgr fail will also interrupt the
>>> upgrade, IIRC.)
>>>
>>> then run
>>>
>>> $ ceph orch device ls --hostname=phl-prod-host04n.example.comis --refresh
>>>
>>> and see if the upgrade resumes. If it doesn't, check
>>>
>>> $ ceph log last 1000 debug cephadm
>>>
>>>
>>> and run
>>>
>>> $ ceph orch upgrade pause
>>> $ ceph orch upgrade resume
>>>
>>> again, see if the upgrade resumes. If it still doesn't, then
>>>
>>> $ ceph orch upgrade stop
>>> $ ceph mgr fail
>>> $ ceph orch upgrade start --image quay.io/ceph/ceph:v19.2.2
>>>
>>> All those commands should be safe to run in the current state of your 
>>> cluster.
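The last-resort sequence above (stop, fail the MGR, start again) can be wrapped so that it is printed before it is run. This is only a sketch: the `run` and `restart_upgrade` names and the `DRY_RUN` variable are invented for illustration, and only the three `ceph` commands come from the message above.

```shell
#!/bin/sh
# Echo each command; execute it only when DRY_RUN is unset or 0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-0}" = "0" ]; then
        "$@"
    fi
}

# Stop the upgrade, fail over the active MGR, then start the upgrade again.
# $1 is the target container image. The chain stops at the first failure.
restart_upgrade() {
    run ceph orch upgrade stop &&
    run ceph mgr fail &&
    run ceph orch upgrade start --image "$1"
}

# Usage (prints the three commands without touching the cluster):
#   DRY_RUN=1
#   restart_upgrade quay.io/ceph/ceph:v19.2.2
```

When running for real, it may be necessary to wait a little after `ceph mgr fail` until the new active MGR has loaded the orchestrator module before the final command succeeds.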
>>>
>>> Regards,
>>> Frédéric.
>>>
>>> From: Devender Singh <deven...@netskrt.io>
>>> Sent: Friday, 20 June 2025 23:35
>>> To: Anthony D'Atri
>>> Cc: Michel Jouvin; ceph-users
>>> Subject: [ceph-users] Re: CEPH upgrade from 18.2.7 to 19.2.2 -- Hung from
>>> last 24h at 66%
>>>
>>>
>>>
>>> Thanks all. If I stop the upgrade, what is the worst that can happen?
>>>
>>> Regards
>>> Dev
>>>> On Jun 20, 2025, at 6:41 AM, Anthony D'Atri <a...@dreamsnake.net> wrote:
>>>>
>>>> Or depending on the release in force when the OSDs were created, perhaps
>>>> shard RocksDB column families?
>>>>
>>>>
>>>>
>>>> https://www.ibm.com/docs/en/storage-ceph/8.0.0?topic=bluestore-resharding-rocksdb-database
>>>> (Playbook from cephadm-ansible)
>>>>> On Jun 20, 2025, at 1:57 AM, Michel Jouvin <michel.jou...@ijclab.in2p3.fr>
>>>>> wrote:
>>>>>
>>>>> Hi Dev,
>>>>>
>>>>> I am not sure why these service deployments timed out. The log says that
>>>>> one OSD failed, which may explain why the upgrade is no longer
>>>>> progressing. The BlueStore slow ops (a new warning, so not necessarily a
>>>>> new problem) on so many OSDs seem to suggest that something is not
>>>>> optimal. As suggested in another thread recently, it may be an
>>>>> indication that you need to compact the OSDs.
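For the compaction suggestion, the affected OSDs can be pulled out of the `ceph health detail` text and turned into `ceph tell osd.N compact` commands. A minimal sketch, assuming the warning lines keep the wording shown in the health output quoted further down; the function names are hypothetical, and the commands are only printed, not executed.

```shell
#!/bin/sh
# Read `ceph health detail` text on stdin and print the numeric IDs of OSDs
# flagged with "observed slow operation indications in BlueStore".
slow_osd_ids() {
    sed -n 's/.*osd\.\([0-9][0-9]*\) observed slow operation indications in BlueStore.*/\1/p'
}

# Print (do not run) one online compaction command per flagged OSD.
compact_cmds() {
    slow_osd_ids | while read -r id; do
        echo "ceph tell osd.$id compact"
    done
}

# Usage:
#   ceph health detail | compact_cmds
```

Printing the commands first makes it easy to run them one OSD at a time, which is probably the gentler option on a cluster that is already degraded.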
>>>>>
>>>>> I am not sure what you adjusted, but as long as the cluster works, I would
>>>>> not have changed any parameters and would instead try to fix the problems
>>>>> mentioned above.
>>>>>
>>>>> Good luck,
>>>>>
>>>>> Michel
>>>>> Sent from my mobile
>>>>> On 20 June 2025 05:13:38, Devender Singh <deven...@netskrt.io> wrote:
>>>>>> Here is the status
>>>>>>
>>>>>> # ceph orch upgrade status
>>>>>> {
>>>>>> "target_image":
>>>>>> "quay.io/ceph/ceph@sha256:8214ebff6133ac27d20659038df6962dbf9d77da21c9438a296b2e2059a56af6",
>>>>>> "in_progress": true,
>>>>>> "which": "Upgrading all daemon types on all hosts",
>>>>>> "services_complete": [
>>>>>> "crash",
>>>>>> "mgr",
>>>>>> "mon"
>>>>>> ],
>>>>>> "progress": "74/113 daemons upgraded",
>>>>>> "message": "",
>>>>>> "is_paused": false
>>>>>> }
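To track whether the upgrade is moving at all, the `progress` string from this status output can be reduced to a single number and compared between runs. A small sketch with a hypothetical helper name; extracting the field with `jq` in the usage comment is an assumption about the tools available on the host.

```shell
#!/bin/sh
# Turn a progress string such as "74/113 daemons upgraded" into an integer
# percentage, rounded down.
upgrade_percent() {
    counts=${1%% *}        # "74/113"
    done_n=${counts%/*}    # "74"
    total=${counts#*/}     # "113"
    echo $(( $done_n * 100 / $total ))
}

# Usage (assumes jq is installed):
#   upgrade_percent "$(ceph orch upgrade status | jq -r .progress)"
```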
>>>>>>
>>>>>> Regards
>>>>>> Dev
>>>>>>> On Jun 19, 2025, at 8:06 PM, Devender Singh <deven...@netskrt.io> wrote:
>>>>>>>
>>>>>>>
>>>>>>> Hello all
>>>>>>>
>>>>>>> My cluster upgrade is hung. Some backfills are in progress; I reduced
>>>>>>> them to 1, but the upgrade is still not progressing.
>>>>>>> Please help.
>>>>>>>
>>>>>>> # ceph health detail
>>>>>>> HEALTH_WARN 8 OSD(s) experiencing slow operations in BlueStore; Failed to
>>>>>>> apply 2 service(s): osd.all-available-devices,osd.iops_optimized; 1 failed
>>>>>>> cephadm daemon(s); failed to probe daemons or devices; noscrub,nodeep-scrub
>>>>>>> flag(s) set; Degraded data redundancy: 1600150/39365198 objects degraded
>>>>>>> (4.065%), 93 pgs degraded, 103 pgs undersized; 127 pgs not deep-scrubbed in
>>>>>>> time
>>>>>>> [WRN] BLUESTORE_SLOW_OP_ALERT: 8 OSD(s) experiencing slow operations in
>>>>>>> BlueStore
>>>>>>> osd.5 observed slow operation indications in BlueStore
>>>>>>> osd.9 observed slow operation indications in BlueStore
>>>>>>> osd.18 observed slow operation indications in BlueStore
>>>>>>> osd.36 observed slow operation indications in BlueStore
>>>>>>> osd.59 observed slow operation indications in BlueStore
>>>>>>> osd.66 observed slow operation indications in BlueStore
>>>>>>> osd.106 observed slow operation indications in BlueStore
>>>>>>> osd.110 observed slow operation indications in BlueStore
>>>>>>> [WRN] CEPHADM_APPLY_SPEC_FAIL: Failed to apply 2 service(s):
>>>>>>> osd.all-available-devices,osd.iops_optimized
>>>>>>> osd.all-available-devices: Command timed out on host cephadm deploy (osd
>>>>>>> daemon) (default 900 second timeout)
>>>>>>> osd.iops_optimized: Command timed out on host cephadm deploy (osd 
>>>>>>> daemon)
>>>>>>> (default 900 second timeout)
>>>>>>> [WRN] CEPHADM_FAILED_DAEMON: 1 failed cephadm daemon(s)
>>>>>>> daemon osd.19 on phl-prod-host04n.example.com is in error state
>>>>>>> [WRN] CEPHADM_REFRESH_FAILED: failed to probe daemons or devices
>>>>>>> Command "cephadm ceph-volume -- inventory" timed out on host
>>>>>>> phl-prod-converged03n.phl.netskrt.org (default 900 second timeout)
>>>>>>> [WRN] OSDMAP_FLAGS: noscrub,nodeep-scrub flag(s) set
>>>>>>> [WRN] PG_DEGRADED: Degraded data redundancy: 1600150/39365198 objects
>>>>>>> degraded (4.065%), 93 pgs degraded, 103 pgs undersized
>>>>>>> pg 25.0 is stuck undersized for 21h, current state
>>>>>>> active+undersized+degraded+remapped+backfill_wait, last acting
>>>>>>> [90,77,NONE,38,3]
>>>>>>> pg 25.1 is stuck undersized for 23h, current state
>>>>>>> active+undersized+degraded+remapped+backfill_wait, last acting
>>>>>>> [131,1,66,NONE,72]
>>>>>>> pg 25.c is stuck undersized for 23h, current state
>>>>>>> active+undersized+degraded+remapped+backfill_wait, last acting
>>>>>>> [1,135,110,28,NONE]
>>>>>>> pg 25.e is active+undersized+degraded+remapped+backfill_wait, acting
>>>>>>> [18,20,108,101,NONE]
>>>>>>> pg 25.14 is stuck undersized for 19h, current state
>>>>>>> active+undersized+degraded+remapped+backfill_wait, last acting
>>>>>>> [32,13,NONE,65,97]
>>>>>>> pg 25.17 is stuck undersized for 19h, current state
>>>>>>> active+undersized+degraded+remapped+backfill_wait, last acting
>>>>>>> [32,65,110,23,NONE]
>>>>>>> pg 25.1a is stuck undersized for 23h, current state
>>>>>>> active+undersized+degraded+remapped+backfill_wait, last acting
>>>>>>> [97,66,NONE,112,139]
>>>>>>> pg 25.1c is stuck undersized for 2h, current state
>>>>>>> active+undersized+degraded+remapped+backfill_wait, last acting
>>>>>>> [102,136,97,NONE,45]
>>>>>>> pg 25.1f is stuck undersized for 8h, current state
>>>>>>> active+undersized+degraded+remapped+backfill_wait, last acting
>>>>>>> [NONE,66,7,39,109]
>>>>>>> pg 26.45 is stuck undersized for 23h, current state
>>>>>>> active+undersized+degraded+remapped+backfill_wait, last acting [32,47]
>>>>>>> pg 26.48 is stuck undersized for 23h, current state
>>>>>>> active+undersized+degraded+remapped+backfill_wait, last acting [113,55]
>>>>>>> pg 26.4b is stuck undersized for 8h, current state
>>>>>>> active+undersized+degraded+remapped+backfill_wait, last acting [105,7]
>>>>>>> pg 26.59 is stuck undersized for 23h, current state
>>>>>>> active+undersized+degraded+remapped+backfill_wait, last acting [35,40]
>>>>>>> pg 26.6a is stuck undersized for 7h, current state
>>>>>>> active+undersized+degraded+remapped+backfill_wait, last acting [25,45]
>>>>>>> pg 26.74 is stuck undersized for 23h, current state
>>>>>>> active+undersized+degraded+remapped+backfill_wait, last acting [112,131]
>>>>>>> pg 26.77 is stuck undersized for 2h, current state
>>>>>>> active+undersized+degraded+remapped+backfill_wait, last acting [17,105]
>>>>>>> pg 26.9c is stuck undersized for 19h, current state
>>>>>>> active+undersized+degraded+remapped+backfill_wait, last acting [47,76]
>>>>>>> pg 26.bf is stuck undersized for 4m, current state
>>>>>>> active+undersized+degraded+remapped+backfill_wait, last acting [32,26]
>>>>>>> pg 26.c2 is stuck undersized for 2h, current state
>>>>>>> active+undersized+degraded+remapped+backfill_wait, last acting [97,135]
>>>>>>> pg 26.ec is stuck undersized for 4m, current state
>>>>>>> active+undersized+degraded+remapped+backfill_wait, last acting [26,80]
>>>>>>> pg 26.f7 is stuck undersized for 4m, current state
>>>>>>> active+undersized+degraded+remapped+backfill_wait, last acting [49,105]
>>>>>>> pg 31.12 is stuck undersized for 4m, current state
>>>>>>> active+undersized+degraded+remapped+backfill_wait, last acting
>>>>>>> [47,110,131,37,NONE]
>>>>>>> pg 31.19 is stuck undersized for 2h, current state
>>>>>>> active+undersized+degraded+remapped+backfill_wait, last acting
>>>>>>> [NONE,87,3,137,39]
>>>>>>> pg 31.1b is stuck undersized for 2h, current state
>>>>>>> active+undersized+degraded+remapped+backfill_wait, last acting
>>>>>>> [108,NONE,42,102,97]
>>>>>>> pg 31.1c is stuck undersized for 19h, current state
>>>>>>> active+undersized+degraded+remapped+backfill_wait, last acting
>>>>>>> [113,66,NONE,45,72]
>>>>>>> pg 31.41 is stuck undersized for 2h, current state
>>>>>>> active+undersized+remapped+backfill_wait, last acting 
>>>>>>> [14,101,NONE,67,18]
>>>>>>> pg 31.42 is stuck undersized for 7h, current state
>>>>>>> active+undersized+degraded+remapped+backfill_wait, last acting
>>>>>>> [37,134,NONE,82,8]
>>>>>>> pg 31.44 is stuck undersized for 4m, current state
>>>>>>> active+undersized+degraded+remapped+backfill_wait, last acting
>>>>>>> [49,NONE,101,3,62]
>>>>>>> pg 31.46 is stuck undersized for 4m, current state
>>>>>>> active+undersized+degraded+remapped+backfilling, last acting
>>>>>>> [95,38,NONE,25,102]
>>>>>>> pg 31.47 is stuck undersized for 18h, current state
>>>>>>> active+undersized+degraded+remapped+backfill_wait, last acting
>>>>>>> [60,NONE,130,72,110]
>>>>>>> pg 31.4a is stuck undersized for 4h, current state
>>>>>>> active+undersized+degraded+remapped+backfilling, last acting
>>>>>>> [66,NONE,135,82,14]
>>>>>>> pg 31.4c is stuck undersized for 4m, current state
>>>>>>> active+undersized+degraded+remapped+backfill_wait, last acting
>>>>>>> [34,NONE,1,82,18]
>>>>>>> pg 31.4d is stuck undersized for 2h, current state
>>>>>>> active+undersized+degraded+remapped+backfill_wait, last acting
>>>>>>> [101,13,65,30,NONE]
>>>>>>> pg 31.52 is stuck undersized for 4m, current state
>>>>>>> active+undersized+remapped+backfill_wait, last acting 
>>>>>>> [26,112,66,NONE,135]
>>>>>>> pg 31.53 is stuck undersized for 17h, current state
>>>>>>> active+undersized+degraded+remapped+backfill_wait, last acting
>>>>>>> [NONE,24,11,42,110]
>>>>>>> pg 31.55 is stuck undersized for 4m, current state
>>>>>>> active+undersized+degraded+remapped+backfill_wait, last acting
>>>>>>> [116,NONE,27,4,117]
>>>>>>> pg 31.57 is stuck undersized for 4m, current state
>>>>>>> active+undersized+degraded+remapped+backfill_wait, last acting
>>>>>>> [17,15,110,NONE,1]
>>>>>>> pg 31.5a is stuck undersized for 19h, current state
>>>>>>> active+undersized+degraded+remapped+backfill_wait, last acting
>>>>>>> [13,24,124,26,NONE]
>>>>>>> pg 31.5b is stuck undersized for 4m, current state
>>>>>>> active+undersized+degraded+remapped+backfill_wait, last acting
>>>>>>> [NONE,31,8,27,42]
>>>>>>> pg 31.5c is stuck undersized for 2h, current state
>>>>>>> active+undersized+degraded+remapped+backfill_wait, last acting
>>>>>>> [NONE,60,28,18,119]
>>>>>>> pg 31.5d is stuck undersized for 17h, current state
>>>>>>> active+undersized+degraded+remapped+backfill_wait, last acting
>>>>>>> [124,NONE,85,6,11]
>>>>>>> pg 31.5e is stuck undersized for 2h, current state
>>>>>>> active+undersized+degraded+remapped+backfill_wait, last acting
>>>>>>> [NONE,4,1,23,138]
>>>>>>> pg 31.60 is stuck undersized for 18h, current state
>>>>>>> active+undersized+degraded+remapped+backfilling, last acting
>>>>>>> [37,11,102,NONE,133]
>>>>>>> pg 31.64 is stuck undersized for 4m, current state
>>>>>>> active+undersized+remapped+backfill_wait, last acting 
>>>>>>> [26,106,45,34,NONE]
>>>>>>> pg 31.65 is stuck undersized for 4h, current state
>>>>>>> active+undersized+degraded+remapped+backfill_wait, last acting
>>>>>>> [NONE,14,127,62,3]
>>>>>>> pg 31.66 is stuck undersized for 4m, current state
>>>>>>> active+undersized+degraded+remapped+backfill_wait, last acting
>>>>>>> [34,NONE,35,23,59]
>>>>>>> pg 31.67 is stuck undersized for 4m, current state
>>>>>>> active+undersized+degraded+remapped+backfill_wait, last acting
>>>>>>> [67,24,127,8,NONE]
>>>>>>> pg 31.68 is stuck undersized for 4m, current state
>>>>>>> active+undersized+degraded+remapped+backfill_wait, last acting
>>>>>>> [32,41,18,17,NONE]
>>>>>>> pg 31.70 is stuck undersized for 19h, current state
>>>>>>> active+undersized+degraded+remapped+backfilling, last acting 
>>>>>>> [106,38,1,NONE,97]
>>>>>>> pg 31.75 is stuck undersized for 2h, current state
>>>>>>> active+undersized+degraded+remapped+backfill_wait, last acting
>>>>>>> [26,31,34,8,NONE]
>>>>>>> pg 31.7f is stuck undersized for 2h, current state
>>>>>>> active+undersized+remapped+backfilling, last acting [NONE,101,110,10,45]
>>>>>>> [WRN] PG_NOT_DEEP_SCRUBBED: 127 pgs not deep-scrubbed in time
>>>>>>> pg 26.e5 not deep-scrubbed since 2025-06-07T05:59:22.612553+0000
>>>>>>> pg 26.ca not deep-scrubbed since 2025-06-06T18:21:07.435823+0000
>>>>>>> pg 26.bf not deep-scrubbed since 2025-06-03T07:29:02.791095+0000
>>>>>>> pg 26.be not deep-scrubbed since 2025-06-04T07:55:54.389714+0000
>>>>>>> pg 26.a5 not deep-scrubbed since 2025-06-07T07:28:56.878096+0000
>>>>>>> pg 26.85 not deep-scrubbed since 2025-06-03T15:18:59.395929+0000
>>>>>>> pg 26.7f not deep-scrubbed since 2025-06-04T21:29:28.412637+0000
>>>>>>> pg 26.7e not deep-scrubbed since 2025-06-03T09:14:19.585388+0000
>>>>>>> pg 26.7d not deep-scrubbed since 2025-06-03T18:37:30.931020+0000
>>>>>>> pg 26.7c not deep-scrubbed since 2025-06-04T13:38:00.061488+0000
>>>>>>> pg 26.73 not deep-scrubbed since 2025-06-03T06:20:15.111819+0000
>>>>>>> pg 26.6f not deep-scrubbed since 2025-06-03T13:45:24.880397+0000
>>>>>>> pg 26.6e not deep-scrubbed since 2025-05-26T23:15:32.099862+0000
>>>>>>> pg 26.6d not deep-scrubbed since 2025-06-04T14:04:10.449101+0000
>>>>>>> pg 31.62 not deep-scrubbed since 2025-06-03T13:34:49.518456+0000
>>>>>>> pg 26.65 not deep-scrubbed since 2025-06-04T07:56:25.353411+0000
>>>>>>> pg 31.66 not deep-scrubbed since 2025-06-03T10:32:05.364424+0000
>>>>>>> pg 26.62 not deep-scrubbed since 2025-06-04T09:35:58.267976+0000
>>>>>>> pg 31.65 not deep-scrubbed since 2025-06-03T16:04:40.003140+0000
>>>>>>> pg 31.5b not deep-scrubbed since 2025-06-03T14:18:18.835477+0000
>>>>>>> pg 26.5d not deep-scrubbed since 2025-06-04T15:14:30.870252+0000
>>>>>>> pg 31.58 not deep-scrubbed since 2025-06-03T03:09:27.568605+0000
>>>>>>> pg 26.5c not deep-scrubbed since 2025-06-03T01:57:27.644129+0000
>>>>>>> pg 31.5f not deep-scrubbed since 2025-06-03T05:53:20.860393+0000
>>>>>>> pg 31.52 not deep-scrubbed since 2025-05-27T00:01:27.040861+0000
>>>>>>> pg 31.53 not deep-scrubbed since 2025-05-24T09:37:58.964829+0000
>>>>>>> pg 26.55 not deep-scrubbed since 2025-06-04T21:25:34.135356+0000
>>>>>>> pg 26.54 not deep-scrubbed since 2025-06-04T06:07:12.978734+0000
>>>>>>> pg 31.56 not deep-scrubbed since 2025-06-04T12:58:17.599712+0000
>>>>>>> pg 31.57 not deep-scrubbed since 2025-06-03T07:02:16.859990+0000
>>>>>>> pg 26.51 not deep-scrubbed since 2025-06-03T05:42:22.435483+0000
>>>>>>> pg 26.4f not deep-scrubbed since 2025-06-03T09:10:22.617328+0000
>>>>>>> pg 31.4a not deep-scrubbed since 2025-05-28T00:54:55.246532+0000
>>>>>>> pg 26.4e not deep-scrubbed since 2025-06-03T11:16:49.278513+0000
>>>>>>> pg 31.4b not deep-scrubbed since 2025-06-03T10:24:24.123351+0000
>>>>>>> pg 26.4d not deep-scrubbed since 2025-06-04T19:01:44.614410+0000
>>>>>>> pg 31.49 not deep-scrubbed since 2025-05-28T04:56:29.368285+0000
>>>>>>> pg 31.42 not deep-scrubbed since 2025-05-28T08:38:57.151865+0000
>>>>>>> pg 26.41 not deep-scrubbed since 2025-06-03T05:47:35.443867+0000
>>>>>>> pg 26.40 not deep-scrubbed since 2025-06-03T05:43:13.283668+0000
>>>>>>> pg 25.17 not deep-scrubbed since 2025-06-03T09:26:52.253625+0000
>>>>>>> pg 32.2e not deep-scrubbed since 2025-06-05T13:01:06.175389+0000
>>>>>>> pg 22.1a not deep-scrubbed since 2025-06-07T01:40:45.063268+0000
>>>>>>> pg 31.13 not deep-scrubbed since 2025-06-03T12:21:17.965218+0000
>>>>>>> pg 22.1b not deep-scrubbed since 2025-06-04T12:22:44.947751+0000
>>>>>>> pg 25.14 not deep-scrubbed since 2025-05-28T06:26:32.552200+0000
>>>>>>> pg 26.17 not deep-scrubbed since 2025-06-03T18:37:26.617483+0000
>>>>>>> pg 31.12 not deep-scrubbed since 2025-05-28T08:39:23.271194+0000
>>>>>>> pg 31.1c not deep-scrubbed since 2025-06-03T08:17:51.230187+0000
>>>>>>> pg 25.1f not deep-scrubbed since 2025-05-28T00:19:23.653883+0000
>>>>>>> 77 more pgs…
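In a list this long it can help to separate PGs that have been undersized for hours from those that only just became undersized (the `4m` entries, which probably reflect a recent OSD restart). A rough sketch, assuming each PG is reported on a single line in the real command output and that durations are printed with an `h` suffix as in the paste above; the function name is made up.

```shell
#!/bin/sh
# Read `ceph health detail` text on stdin and count PGs reported as stuck
# undersized for at least one hour (duration printed with an "h" suffix).
count_stuck_hours() {
    grep -cE 'pg [0-9]+\.[0-9a-f]+ is stuck undersized for [0-9]+h'
}

# Usage:
#   ceph health detail | count_stuck_hours
```

Durations reported in other units, such as days, would not be counted by this pattern.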
>>>>>>>
>>>>>>> Regards
>>>>>>> Dev
>>>>>>
>>>>>> _______________________________________________
>>>>>> ceph-users mailing list -- ceph-users@ceph.io
>>>>>> To unsubscribe send an email to ceph-users-le...@ceph.io
