Hi everyone, I’m currently facing the following challenge and would like to hear the community’s thoughts on how to solve it.
We have a Ceph cluster (only used for RGW) that had some performance issues when we initially set it up. Instead of using all 60 HDDs installed in each server, we only use half (30) of them - the others exist but are set to “out”. Using only half of the OSDs per server solved our performance issues - that is not the subject of my question, just some backstory for this weird setup. We have since optimised things and are looking into getting those OSDs back “in”. However, we have quite some client load on the system and want to keep any necessary downtime (for users) low. The HDDs are only used for the buckets.data pool (erasure coded); the other pools needed for RGW are on separate NVMe drives.

To achieve this I was looking into pgremapper and upmap-remapped. My expectation was to remap all PGs so they stay on the “old” OSDs (the ones currently “in”) and have zero PGs on the “new” OSDs (the ones currently “out”). Once that is done, I thought I could somehow double the number of PGs to cover the now doubled capacity of the pool.

What I tried so far (with some variations):
- Set the norecover and nobackfill flags
- Set the “new” OSDs to “in”
- Use pgremapper and/or upmap-remapped to create upmap entries so the acting OSDs of each PG stay unchanged
- Increase the number of PGs
- Map the new PGs to the “new” OSDs
- (Maybe remove the upmap entries one PG at a time over an extended period to allow automatic balancing again)

I’m trying this on a smaller test setup with 6 servers and 8 OSDs per server (4 in / 4 out), so 48 OSDs in total. For testing I filled the “in” OSDs to around 50%, so the overall used capacity is also 50% - bringing the other 24 OSDs “in” of course means only 25% of the overall capacity is used. The following information is based on this test setup.

The pgremapper/upmap-remapped part is where I’m currently stuck, as I always end up with around 2% of objects being misplaced.
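Just to spell out the capacity numbers on the test setup (plain arithmetic, nothing Ceph-specific):

```python
# 48 OSDs of equal (normalized) size; 24 are "in" and filled to ~50%.
osd_size = 1.0
in_osds, out_osds = 24, 24

used = in_osds * osd_size * 0.50                   # data currently stored
before = used / (in_osds * osd_size)               # utilisation, 24 OSDs "in"
after = used / ((in_osds + out_osds) * osd_size)   # utilisation, all 48 "in"

print(before, after)  # 0.5 0.25
```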
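For concreteness, one batch of the procedure I’m attempting looks roughly like this. This is only a sketch: the OSD ids and the pg_num value are examples from my test setup, upmap-remapped.py is the well-known external script (its path here is a placeholder), and the flag/command names are from the stock ceph CLI:

```shell
# Block data movement while the topology changes
ceph osd set norebalance
ceph osd set nobackfill
ceph osd set norecover

# Bring one batch of "new" OSDs in (ids are examples)
for id in 24 25 26 27; do
    ceph osd in "$id"
done

# Generate pg-upmap-items entries so every PG keeps its current
# acting set (the script prints ceph commands; sh executes them)
./upmap-remapped.py | sh

# Only once "ceph status" reports no misplaced objects:
ceph osd unset norecover
ceph osd unset nobackfill
ceph osd unset norebalance

# Later step: double the PG count of the data pool
# (512 is an example value from the test setup)
ceph osd pool set buckets.data pg_num 512
```

The open question is exactly the middle part: how to get from “OSDs in, zero misplaced” to “doubled pg_num, new PGs on the new OSDs” without triggering a large backfill.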
One thing I did manage: if, after setting all OSDs to “in”, I remap the PGs, increase the PG count and remap again, I end up with around 3% of misplaced objects - but the additional/new OSDs then have zero PGs again.

Is it somehow possible to add the OSDs and double the PG count without causing any (or only minimal) backfill? If possible I would also like to bring the “new” OSDs in in batches, so we can check performance after each batch and avoid running into issues only in the final state with all OSDs “in”.

Thanks in advance for any input.

Cheers,
Florian
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io