Very solid advice here - that’s the beauty of the Ceph community.

Just adding to what Anthony mentioned: a reweight from 1 to 0.2 (and back) is 
quite extreme and the cluster won’t like it.
We never go above increments/decrements of 0.02-0.04. If you have to, go from 1 
to 0.98, then to 0.96, and so on, leaving enough time for the cluster to 
settle in between.
How did backfilling look when you changed it? Did you see a decrease in backfills 
after reverting back to 1? 
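Roughly what that looks like in practice (osd.12 is just an example id; pick whatever 
shows up as the most full OSDs in `ceph osd df tree`):

```
# step the override reweight down a couple of hundredths at a time
ceph osd reweight 12 0.98
# wait for backfill to settle, then continue
ceph osd reweight 12 0.96
# ...and so on, never in big jumps like 1.0 -> 0.2
```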

From the clients’ perspective, your main concern now is to keep the pools “alive” 
with enough free space while the backfilling takes place.
Even with plenty of OSDs that are not full, you might hit a single overfilled 
OSD and the whole pool will stop accepting new data. 
Clients will then start getting “No more space available” errors. That happened to 
us recently with CephFS in a very similar scenario, where the cluster got much 
more data than expected in a short amount of time. Not fun. 
With the balancer not working due to too many misplaced objects, that risk is 
increased, so heads up and keep it in mind. To get things working we simply 
balanced the OSDs manually with upmaps, moving data from the most full 
ones to the least full ones (our built-in balancer sadly does not work).
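
If you end up having to do the same, the manual balancing is basically just upmapping 
a few PGs from the fullest OSDs onto the emptiest ones. A rough sketch (the PG id 
10.7f and OSD ids 42/17 are only placeholders for your own most/least full ones):

```
# see which PGs live on the most full OSD, e.g. osd.42
ceph pg ls-by-osd 42
# remap one of those PGs (here 10.7f) from osd.42 to a less full osd.17
ceph osd pg-upmap-items 10.7f 42 17
# upmaps require all clients to be luminous or newer
ceph osd set-require-min-compat-client luminous
```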


One small observation:
I’ve noticed that `ceph osd pool ls detail | grep cephfs.cephfs01.data` shows that 
pg_num was increased but pgp_num is still the same.
You will need to bump pgp_num as well for data to actually migrate to the new PGs: 
https://docs.ceph.com/en/mimic/rados/operations/placement-groups/#set-the-number-of-placement-groups
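
Something along these lines, where 256 is just a placeholder for whatever pg_num 
that pool is at now:

```
# check current values for the pool
ceph osd pool get cephfs.cephfs01.data pg_num
ceph osd pool get cephfs.cephfs01.data pgp_num
# raise pgp_num to match pg_num so data actually moves into the new PGs
ceph osd pool set cephfs.cephfs01.data pgp_num 256
```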


Best,
Laimis J.

> On 5 Jan 2025, at 16:11, Anthony D'Atri <anthony.da...@gmail.com> wrote:
> 
> 
>>> What reweighs have been set for the top OSDs (ceph osd df tree)?
>>> 
>> Right now they are all at 1.0. I had to lower them to something close to
>> 0.2 in order to free up space but I changed them back to 1.0. Should I
>> lower them while the backfill is happening?
> 
> Old-style legacy override reweights don’t mesh well with the balancer.   Best 
> to leave them at 1.00.  
> 
> 0.2 is pretty extreme; back in the day I rarely went below 0.8.
> 
>>> ```
>>> "optimize_result": "Too many objects (0.355160 > 0.050000) are misplaced;
>>> try again late
>>> ```
> 
> That should clear.  The balancer doesn’t want to stir up trouble if the 
> cluster already has a bunch of backfill / recovery going on.  Patience!
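> 
> Something like this to keep an eye on it; once the misplaced ratio drops under the 
> threshold the balancer should start optimizing again on its own:
> 
> ```
> # misplaced object counts / ratio
> ceph pg stat
> # why the balancer is currently idle
> ceph balancer status
> ```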
> 
>>> default.rgw.buckets.data    10  1024  197 TiB  133.75M  592 TiB  93.69  13 TiB
>>> default.rgw.buckets.non-ec  11    32   78 MiB    1.43M   17 GiB   
> 
> That’s odd that the data pool is that full but the others aren’t.  
> 
> Please send `ceph osd crush rule dump`. And `ceph osd dump | grep pool`.
> 
> 
>>> 
>>> I also tried changing the following but it does not seem to persist:
> 
> Could be an mclock thing.  
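> 
> If it was the backfill / recovery throttles (osd_max_backfills etc.) that wouldn’t 
> stick, note that with the mClock scheduler those are ignored unless you explicitly 
> allow overriding them, e.g. something like:
> 
> ```
> # let manually set values take precedence over the mClock defaults
> ceph config set osd osd_mclock_override_recovery_settings true
> ceph config set osd osd_max_backfills 2
> # or just switch the profile to favour recovery
> ceph config set osd osd_mclock_profile high_recovery_ops
> ```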
> 
>>> 1. Why I ended up with so many misplaced PG's since there were no changes
>>> on the cluster: number of osd's, hosts, etc.
> 
> Probably a result of the autoscaler splitting PGs or of some change to CRUSH 
> rules such that some data can’t be placed.
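> 
> A quick way to check whether the autoscaler has been splitting is something like:
> 
> ```
> # shows current vs. target PG counts per pool
> ceph osd pool autoscale-status
> # pg_num / pgp_num / pg_num_target also show up here
> ceph osd pool ls detail
> ```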
> 
>>> 2. Is it ok to change the target_max_misplaced_ratio to something higher
>>> than .05 so the autobalancer would work and I wouldn't have to constantly
>>> rebalance the osd's manually?
> 
> I wouldn’t, that’s a symptom not the disease.  
>>> Bruno
>> 
>> --
>> Bruno Gomes Pessanha

_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
