Are those problematic OSDs getting almost full? I do not have an Ubuntu
account to check their pastebin.

Sent from my Galaxy device
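If you can paste the relevant lines inline, the %USE column of "ceph osd df"
would show it. For example, for the osd.218 from your log (command shown only
as an illustration):

    ceph osd df | egrep '^ *218 '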
-------- Original message --------
From: mhnx <morphinwith...@gmail.com>
Date: 08.11.21 15:31 (GMT+02:00)
To: Ceph Users <ceph-users@ceph.io>
Subject: [ceph-users] allocate_bluefs_freespace failed to allocate

Hello.

I'm using Nautilus 14.2.16.
I have 30 SSDs in my cluster and I use them as Bluestore OSDs for the RGW
index. Almost every week I'm losing (down) an OSD, and when I check the osd
log I see:

 -6> 2021-11-06 19:01:10.854 7fa799989c40  1 bluefs _allocate failed to allocate 0xf4f04 on bdev 1, free 0xb0000; fallback to bdev 2
 -5> 2021-11-06 19:01:10.854 7fa799989c40  1 bluefs _allocate unable to allocate 0xf4f04 on bdev 2, free 0xffffffffffffffff; fallback to slow device expander
 -4> 2021-11-06 19:01:10.854 7fa799989c40 -1 bluestore(/var/lib/ceph/osd/ceph-218) allocate_bluefs_freespace failed to allocate on 0x80000000 min_size 0x100000 > allocated total 0x0 bluefs_shared_alloc_size 0x10000 allocated 0x0 available 0xa497aab000
 -3> 2021-11-06 19:01:10.854 7fa799989c40 -1 bluefs _allocate failed to expand slow device to fit +0xf4f04

Full log: https://paste.ubuntu.com/p/MpJfVjMh7V/plain/

And the OSD does not start without offline compaction.
Offline compaction log: https://paste.ubuntu.com/p/vFZcYnxQWh/plain/
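For anyone wanting to reproduce the offline compaction step, a minimal
sketch (assuming the default OSD data path and the ceph-kvstore-tool route
documented for Nautilus; osd.218 as the example):

    systemctl stop ceph-osd@218
    ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-218 compact
    systemctl start ceph-osd@218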
After the offline compaction I tried to start the OSD with the bitmap
allocator, but it does not come up because of "FAILED ceph_assert(available
>= allocated)".
Log: https://paste.ubuntu.com/p/2Bbx983494/plain/

Then I started the OSD with the hybrid allocator and let it recover. When
the recovery was done, I stopped the OSD and started it with the bitmap
allocator.
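The allocator switch itself is just a config change before the restart; a
minimal sketch, assuming the config database is used rather than ceph.conf:

    ceph config set osd.218 bluestore_allocator bitmap
    systemctl restart ceph-osd@218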
This time it came up, but I got "80 slow ops, oldest one blocked for 116
sec, osd.218 has slow ops", so I increased osd_recovery_sleep to 10 to give
the cluster a breather, and the cluster marked the osd down (it was still
working); after a while the osd was marked up again and the cluster became
normal. But while recovering, other osds started to give slow ops, and I
played around with osd_recovery_sleep between 0.1 and 10 to keep the cluster
stable till the recovery finished.
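For on-the-fly changes, one way is injectargs (shown only as an
illustration; the values are the ones mentioned above):

    ceph tell 'osd.*' injectargs '--osd_recovery_sleep 10'   # slow recovery to relieve slow ops
    ceph tell 'osd.*' injectargs '--osd_recovery_sleep 0.1'  # speed recovery back up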
Ceph osd df tree before: https://paste.ubuntu.com/p/4K7JXcZ8FJ/plain/
Ceph osd df tree after osd.218 = bitmap:
https://paste.ubuntu.com/p/5SKbhrbgVM/plain/

If I want to change all the other osds' allocator to bitmap, I need to
repeat the process 29 times, and it will take too much time.
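If it were safe to flip them all at once, the switch itself would only be a
cluster-wide setting plus rolling restarts; a sketch, not something I have
tried:

    ceph config set osd bluestore_allocator bitmap
    # then restart the OSDs one at a time, waiting for HEALTH_OK in between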
I don't want to heal OSDs with offline compaction anymore. I will do that if
that's the solution, but I want to be sure before doing a lot of work, and
maybe with this issue I can provide helpful logs and information for the
developers.

Have a nice day.
Thanks.
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io