> (Adding back the list)

I had written privately on purpose.

> 
>> https://docs.clyso.com/docs/kb/rados/#osd---improved-procedure-for-adding-hosts-or-osds
> 
> Is the upstream source for `upmap-remapped.py` the `ceph-scripts` repo?
> 
> If yes, is it known why there are 2 different places for `ceph-scripts` 
> (Gitlab and Github), with diverged histories, both of which are recently 
> edited, by the same individuals?
> 
> https://github.com/cernceph/ceph-scripts/commits/master/tools/upmap/upmap-remapped.py
> https://gitlab.cern.ch/ceph/ceph-scripts/-/commits/master/tools/upmap/upmap-remapped.py

I will defer to Dan van der Ster on that.

> 
>> If I understand what you’re getting at, I don’t think that’s the case.  I 
>> *think* recent releases give priority to undersized / degraded PGs over 
>> remapped PGs.
> 
> It's not really `degraded` vs `misplaced` I'm getting at.
> 
> But instead, consider this scenario:
> 
> We have
> 
> * osd.1 (very full, 100 GB free)
> * osd.2 (very full, 100 GB free)
> * osd.30 (empty)
> 
> and we have
> 
> * PGa on osd.1, which Ceph now wants to be on osd.2
> * PGb on osd.2, which Ceph now wants to be on osd.30
> 
> The simple thing that could theoretically be done is to start the PGa 
> (osd.1 -> osd.2) move and the PGb (osd.2 -> osd.30) move immediately and 
> run them in parallel.
> 
> But I suspect that what Ceph does is to make decisions on PG granularity:
> 
> 1. See that there's no space on osd.2 for PGa, thus not start the move.
> 2. Move PGb from osd.2 to osd.30. Wait until that's done (takes long if PGs 
> are large).
> 3. See that there is now enough space on osd.2 for PGa, and only now start 
> the move.
> 
> Is this really what happens?

Yes, backfill will not proceed to an OSD that is past the backfillfull threshold.
It’s also the case that the cluster usually cannot sustain all backfill / 
recovery ops at the same time; there are priority mechanisms as well as various 
throttles to avoid a thundering herd DoSing client ops.  I will have to defer 
to the source and devs for the particulars of that process, as it has evolved 
over time.
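The gating behavior can be sketched as a toy model (a hypothetical simplification for illustration only; Ceph's real scheduler also involves backfill reservations, priorities, and recovery throttles not modeled here):

```python
# Toy model of per-PG backfill admission: a PG move only starts if the
# target OSD has room for it *now*; otherwise it waits for space to free
# up. Hypothetical simplification, not Ceph's actual scheduling code.

def schedule_backfills(osd_free, moves, pg_size):
    """osd_free: dict mapping OSD name -> free capacity (GB).
    moves: list of (pg, src_osd, dst_osd) tuples.
    Returns the order in which moves actually get to start."""
    started = []
    pending = list(moves)
    while pending:
        progressed = False
        for move in list(pending):
            pg, src, dst = move
            if osd_free[dst] >= pg_size:   # target not "backfillfull"
                osd_free[dst] -= pg_size   # data lands on the target
                osd_free[src] += pg_size   # source space freed after the move
                started.append(pg)
                pending.remove(move)
                progressed = True
        if not progressed:
            break                          # nothing can move: stuck/toofull
    return started

# The scenario from the thread: two nearly full OSDs and one empty one.
free = {"osd.1": 100, "osd.2": 100, "osd.30": 1000}   # GB free
moves = [("PGa", "osd.1", "osd.2"), ("PGb", "osd.2", "osd.30")]
print(schedule_backfills(free, moves, pg_size=200))
# -> ['PGb', 'PGa']: PGa cannot start until PGb has vacated osd.2
```

In this sketch the two moves are forcibly serialized exactly as described in steps 1-3 above: PGa is skipped on the first pass because osd.2 lacks space, PGb completes first, and only then does PGa start.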

> 
> Thanks!
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
