Hi Marco,
The following log line (unfortunately it was cut off) sheds some light:
"
Apr 29 06:24:09 prdhcistonode01 bash[23886]: debug
2025-04-29T10:24:09.287+0000 7f6961ae9740 -1
bluestore(/var/lib/ceph/osd/ceph-3) _set_cache_sizes bluestore_cache_meta_>
"
Likely it says that the sum of the bluestore_cache_meta_ratio,
bluestore_cache_kv_ratio, and bluestore_cache_kv_onode_ratio config
parameters exceeds 1.0.
So one has to tune the parameters so that the sum is less than or equal
to 1.0.
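The constraint the OSD enforces can be sketched like this (a minimal Python illustration of the check, not Ceph's actual code; the function name is made up for this example):

```python
def cache_ratios_valid(meta_ratio, kv_ratio, kv_onode_ratio):
    """The three BlueStore cache ratios must not sum to more than 1.0,
    otherwise _set_cache_sizes fails and the OSD refuses to start."""
    return meta_ratio + kv_ratio + kv_onode_ratio <= 1.0

# The shipped defaults pass the check:
print(cache_ratios_valid(0.45, 0.45, 0.04))  # True

# An overridden configuration like this one fails, which surfaces as
# "ERROR: osd init failed: (22) Invalid argument" in the log:
print(cache_ratios_valid(0.60, 0.45, 0.04))  # False
```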
Default settings are:
bluestore_cache_meta_ratio = 0.45
bluestore_cache_kv_ratio = 0.45
bluestore_cache_kv_onode_ratio = 0.04
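To see whether any of these were overridden on your cluster, and to drop an offending override so the default applies again, something along these lines should work (a sketch, run from a node with the admin keyring):

```shell
# Show the effective values for the osd class:
ceph config get osd bluestore_cache_meta_ratio
ceph config get osd bluestore_cache_kv_ratio
ceph config get osd bluestore_cache_kv_onode_ratio

# Also check for per-daemon overrides in the full config dump:
ceph config dump | grep bluestore_cache

# Remove an override so the daemon falls back to the built-in default:
ceph config rm osd bluestore_cache_meta_ratio
```

If the override was set per daemon (e.g. on osd.3 rather than the osd class), `ceph config rm osd.3 bluestore_cache_meta_ratio` is the analogous command.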
Thanks,
Igor
On 29.04.2025 13:36, Marco Pizzolo wrote:
Hello Everyone,
I'm upgrading from 18.2.4 to 18.2.6, and I have a 4-node cluster with 8
NVMes per node. Each NVMe is split into 2 OSDs. The upgrade went through
the mgr, mon, and crash daemons and began upgrading OSDs.
The OSDs it was upgrading were not coming back online.
I tried rebooting, with no luck.
journalctl -xe shows the following:
░░ The unit
docker-02cb79ef9a657cdaa26b781966aa6d2f1d5e54cdc9efa6c5ff1f0e98c3a866e4.scope
has successfully entered the 'dead' state.
Apr 29 06:24:09 prdhcistonode01 dockerd[2967]:
time="2025-04-29T06:24:09.282073583-04:00" level=info msg="ignoring event"
container=76c56ddd668015de0022bfa2527060e64a9513>
Apr 29 06:24:09 prdhcistonode01 containerd[2797]:
time="2025-04-29T06:24:09.282129114-04:00" level=info msg="shim
disconnected" id=76c56ddd668015de0022bfa2527060e64a95137>
Apr 29 06:24:09 prdhcistonode01 containerd[2797]:
time="2025-04-29T06:24:09.282219664-04:00" level=warning msg="cleaning up
after shim disconnected" id=76c56ddd668015de00>
Apr 29 06:24:09 prdhcistonode01 containerd[2797]:
time="2025-04-29T06:24:09.282242484-04:00" level=info msg="cleaning up dead
shim"
Apr 29 06:24:09 prdhcistonode01 bash[23886]: debug
2025-04-29T10:24:09.287+0000 7f6961ae9740 1 mClockScheduler:
set_osd_capacity_params_from_config: osd_bandwidth_cost_p>
Apr 29 06:24:09 prdhcistonode01 bash[23886]: debug
2025-04-29T10:24:09.287+0000 7f6961ae9740 0 osd.3:0.OSDShard using op
scheduler mclock_scheduler, cutoff=196
Apr 29 06:24:09 prdhcistonode01 bash[23886]: debug
2025-04-29T10:24:09.287+0000 7f6961ae9740 1 bdev(0x56046b4c8000
/var/lib/ceph/osd/ceph-3/block) open path /var/lib/cep>
Apr 29 06:24:09 prdhcistonode01 containerd[2797]:
time="2025-04-29T06:24:09.292047607-04:00" level=warning msg="cleanup
warnings time=\"2025-04-29T06:24:09-04:00\" level=>
Apr 29 06:24:09 prdhcistonode01 dockerd[2967]:
time="2025-04-29T06:24:09.292163618-04:00" level=info msg="ignoring event"
container=02cb79ef9a657cdaa26b781966aa6d2f1d5e54>
Apr 29 06:24:09 prdhcistonode01 containerd[2797]:
time="2025-04-29T06:24:09.292216428-04:00" level=info msg="shim
disconnected" id=02cb79ef9a657cdaa26b781966aa6d2f1d5e54c>
Apr 29 06:24:09 prdhcistonode01 containerd[2797]:
time="2025-04-29T06:24:09.292277279-04:00" level=warning msg="cleaning up
after shim disconnected" id=02cb79ef9a657cdaa2>
Apr 29 06:24:09 prdhcistonode01 containerd[2797]:
time="2025-04-29T06:24:09.292291949-04:00" level=info msg="cleaning up dead
shim"
Apr 29 06:24:09 prdhcistonode01 bash[23886]: debug
2025-04-29T10:24:09.287+0000 7f6961ae9740 1 bdev(0x56046b4c8000
/var/lib/ceph/osd/ceph-3/block) open size 640122932428>
Apr 29 06:24:09 prdhcistonode01 bash[23886]: debug
2025-04-29T10:24:09.287+0000 7f6961ae9740 -1
bluestore(/var/lib/ceph/osd/ceph-3) _set_cache_sizes bluestore_cache_meta_>
Apr 29 06:24:09 prdhcistonode01 bash[23886]: debug
2025-04-29T10:24:09.287+0000 7f6961ae9740 1 bdev(0x56046b4c8000
/var/lib/ceph/osd/ceph-3/block) close
Apr 29 06:24:09 prdhcistonode01 containerd[2797]:
time="2025-04-29T06:24:09.303385220-04:00" level=warning msg="cleanup
warnings time=\"2025-04-29T06:24:09-04:00\" level=>
Apr 29 06:24:09 prdhcistonode01 bash[24158]: debug
2025-04-29T10:24:09.307+0000 7f2c10403740 1 mClockScheduler:
set_osd_capacity_params_from_config: osd_bandwidth_cost_p>
Apr 29 06:24:09 prdhcistonode01 bash[24158]: debug
2025-04-29T10:24:09.307+0000 7f2c10403740 0 osd.0:0.OSDShard using op
scheduler mclock_scheduler, cutoff=196
Apr 29 06:24:09 prdhcistonode01 bash[23144]: debug
2025-04-29T10:24:09.307+0000 7f12f08c5740 -1 osd.15 0 OSD:init: unable to
mount object store
Apr 29 06:24:09 prdhcistonode01 bash[23144]: debug
2025-04-29T10:24:09.307+0000 7f12f08c5740 -1 ** ERROR: osd init failed:
(22) Invalid argument
Apr 29 06:24:09 prdhcistonode01 bash[24158]: debug
2025-04-29T10:24:09.307+0000 7f2c10403740 1 bdev(0x55d5e45f0000
/var/lib/ceph/osd/ceph-0/block) open path /var/lib/cep>
Apr 29 06:24:09 prdhcistonode01 bash[24158]: debug
2025-04-29T10:24:09.307+0000 7f2c10403740 1 bdev(0x55d5e45f0000
/var/lib/ceph/osd/ceph-0/block) open size 640122932428>
Apr 29 06:24:09 prdhcistonode01 bash[24158]: debug
2025-04-29T10:24:09.307+0000 7f2c10403740 -1
bluestore(/var/lib/ceph/osd/ceph-0) _set_cache_sizes bluestore_cache_meta_>
Apr 29 06:24:09 prdhcistonode01 bash[24158]: debug
2025-04-29T10:24:09.307+0000 7f2c10403740 1 bdev(0x55d5e45f0000
/var/lib/ceph/osd/ceph-0/block) close
Apr 29 06:24:09 prdhcistonode01 bash[24328]: debug
2025-04-29T10:24:09.363+0000 7f30b83b1740 1 mClockScheduler:
set_osd_capacity_params_from_config: osd_bandwidth_cost_p>
Apr 29 06:24:09 prdhcistonode01 bash[24328]: debug
2025-04-29T10:24:09.363+0000 7f30b83b1740 0 osd.8:0.OSDShard using op
scheduler mclock_scheduler, cutoff=196
Apr 29 06:24:09 prdhcistonode01 bash[24328]: debug
2025-04-29T10:24:09.363+0000 7f30b83b1740 1 bdev(0x555f40688000
/var/lib/ceph/osd/ceph-8/block) open path /var/lib/cep>
Apr 29 06:24:09 prdhcistonode01 bash[24328]: debug
2025-04-29T10:24:09.363+0000 7f30b83b1740 1 bdev(0x555f40688000
/var/lib/ceph/osd/ceph-8/block) open size 640122932428>
Apr 29 06:24:09 prdhcistonode01 bash[24328]: debug
2025-04-29T10:24:09.363+0000 7f30b83b1740 -1
bluestore(/var/lib/ceph/osd/ceph-8) _set_cache_sizes bluestore_cache_meta_>
Apr 29 06:24:09 prdhcistonode01 bash[24328]: debug
2025-04-29T10:24:09.363+0000 7f30b83b1740 1 bdev(0x555f40688000
/var/lib/ceph/osd/ceph-8/block) close
Apr 29 06:24:09 prdhcistonode01 systemd[1]:
ceph-fbc38f5c-a3a6-11ea-805c-3b954db9ce7a@osd.12.service: Main process
exited, code=exited, status=1/FAILURE
Any help you can offer would be greatly appreciated. This is running in
Docker:
Client: Docker Engine - Community
 Version:           24.0.7
 API version:       1.43
 Go version:        go1.20.10
 Git commit:        afdd53b
 Built:             Thu Oct 26 09:08:01 2023
 OS/Arch:           linux/amd64
 Context:           default

Server: Docker Engine - Community
 Engine:
  Version:          24.0.7
  API version:      1.43 (minimum version 1.12)
  Go version:       go1.20.10
  Git commit:       311b9ff
  Built:            Thu Oct 26 09:08:01 2023
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.6.25
  GitCommit:        d8f198a4ed8892c764191ef7b3b06d8a2eeb5c7f
 runc:
  Version:          1.1.10
  GitCommit:        v1.1.10-0-g18a0cb0
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0
Thanks,
Marco
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io