[ceph-users] Re: mclock scheduler on 19.2.1

Alexander Patrakov Fri, 22 Aug 2025 08:42:10 -0700

You need to recreate all OSDs that were created on Squid with elastic
shared blobs.


To figure out if an OSD needs to be recreated, you can run this command:

ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-X --command show-label |
grep elastic

This can be tried safely even if the OSD is running. Again, all OSDs that
have elastic shared blobs enabled must be recreated.
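
To check every OSD on a host in one pass, the command above can be wrapped
in a loop. This is only a rough sketch, not tested on a real cluster: it
assumes the OSD directories live under /var/lib/ceph/osd/ceph-* (as in the
command above; containerized deployments may use a different path), and the
names OSD_ROOT, LABEL_TOOL and list_elastic_osds are made up for this
example. It applies the same "grep elastic" test and nothing more.

```shell
#!/bin/sh
# Sketch: print the local OSD directories whose label mentions "elastic".
# OSD_ROOT and LABEL_TOOL are hypothetical knobs for this example; the
# defaults mirror the path and tool used in the command above.
OSD_ROOT="${OSD_ROOT:-/var/lib/ceph/osd}"
LABEL_TOOL="${LABEL_TOOL:-ceph-bluestore-tool}"

list_elastic_osds() {
    for dir in "$OSD_ROOT"/ceph-*; do
        # Skip the unexpanded glob when no OSD directory exists.
        [ -d "$dir" ] || continue
        # Same check as the one-liner: show the label, look for "elastic".
        if "$LABEL_TOOL" --path "$dir" --command show-label 2>/dev/null \
            | grep -q elastic; then
            echo "${dir##*/}"
        fi
    done
}

list_elastic_osds
```

Each line of output (for example ceph-12) names an OSD directory whose label
matched, i.e. a candidate for recreation. Run it as a user that can read the
OSD directories.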


On Fri, Aug 22, 2025 at 8:12 PM Curt <light...@gmail.com> wrote:

> Thank you, that worked. I completely forgot about upmap.
>
> Question: I just saw that a workaround for the bug is to set
> bluestore_elastic_shared_blobs to false, but it's not really clear to me.
> Does this stop the bug from hitting OSDs that are already running, or does
> it only protect new OSDs deployed after it is set, meaning I would need to
> recreate/redeploy all my OSDs?
>
> Thanks,
> Curt
>
> On Tue, Aug 19, 2025 at 1:40 PM Alexander Patrakov <patra...@gmail.com>
> wrote:
>
>> Suggestions:
>>
>> 1. Figure out which OSDs are unsafe to stop.
>> 2. Slowly restart all the other OSDs
>> 3. Figure out which PGs are degraded
>> 4. Use the "ceph osd pg-upmap-items" command to redirect their recovery
>> to already-restarted OSDs
>> 5. At this point, the set of OSDs that are unsafe to restart should
>> contain only already-restarted OSDs
>> 6. Restart the remaining OSDs
>>
>> P.S. Not tested.
>>
>>
>> On Tue, Aug 19, 2025 at 5:31 PM Curt <light...@gmail.com> wrote:
>>
>>> Hello all,
>>>
>>> I'm sure this has been discussed before, but I can't seem to find it. I
>>> know on older versions of Ceph there was an issue with mclock having no
>>> recovery and switching to wpq fixed it. Is this still an issue with
>>> 19.2.1?
>>>
>>> I recently ran into this bug <https://tracker.ceph.com/issues/70390> and
>>> various issues with it. To help recovery, I set the norebalance flag
>>> so it would focus solely on undersized PGs. The issue I'm seeing, though,
>>> is that sometimes recovery shows nothing despite there being
>>> X active+undersized+remapped+backfilling PGs. Sometimes restarting a few
>>> OSDs fixes the issue and recovery starts again.
>>>
>>> I'm tempted to switch to wpq, but that would mean having to slowly
>>> restart each OSD, which, with undersized PGs, would cause IO to stop
>>> while some OSDs are restarted. I wanted to get others' thoughts before
>>> making the change.
>>>
>>> Thanks,
>>> Curt
>>> _______________________________________________
>>> ceph-users mailing list -- ceph-users@ceph.io
>>> To unsubscribe send an email to ceph-users-le...@ceph.io
>>>
>>
>>
>> --
>> Alexander Patrakov
>>
>

-- 
Alexander Patrakov
