Hi everyone. Yes, all the tips definitely helped! Now I have more free
space in the pools, the number of misplaced PGs has decreased a lot, and the
standard deviation of OSD usage is lower. The storage looks much healthier now.
Thanks a bunch!

I'm only confused by the number of misplaced PGs, which never goes
below 5%. Every time it hits 5% it goes back up and down again, as shown in
this quite interesting graph:
[image: image.png]

Any idea why that might be?

I had the impression that it might be the autobalancer kicking in, which
makes PGs misplaced again. Or am I missing something?
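
For reference, here's what I'm checking to see whether it's the balancer
kicking in (standard commands, nothing cluster-specific assumed):

ceph balancer status                             # active mode and last optimization result
ceph config get mgr target_max_misplaced_ratio   # defaults to 0.05, i.e. the 5% plateau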

Bruno

On Mon, 6 Jan 2025 at 16:00, Bruno Gomes Pessanha <bruno.pessa...@gmail.com>
wrote:

> So you might set the full ratio to .98, backfillfull to .96.  Nearfull is
>> only cosmetic.
>
> Thanks for the advice. It seems to be working with 0.92 for now. If it
> gets stuck I'll increase it.
>
> On Mon, 6 Jan 2025 at 00:24, Anthony D'Atri <anthony.da...@gmail.com>
> wrote:
>
>>
>>
>> Very solid advice here - that’s the beauty of the Ceph community.
>>
>> Just adding to what Anthony mentioned: a reweight from 1 to 0.2 (and
>> back) is quite extreme and the cluster won’t like it.
>>
>>
>> And these days with the balancer, pg-upmap entries to the same effect are
>> a better idea.
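>>
>> For example (the PG ID and OSD IDs here are placeholders, not taken from
>> your cluster), a single PG can be steered off a full OSD with an upmap
>> exception and the exception removed again later:
>>
>> ceph osd pg-upmap-items 10.3f 121 45   # remap PG 10.3f from osd.121 to osd.45
>> ceph osd rm-pg-upmap-items 10.3f       # drop the exception once it's no longer needed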
>>
>> From the clients' perspective, your main concern now is to keep the pools
>> “alive” with enough space while the backfilling takes place.
>>
>>
>> To that end, you can *temporarily* give yourself a bit more margin:
>>
>> ceph osd set-nearfull-ratio .85
>> ceph osd set-backfillfull-ratio .90
>> ceph osd set-full-ratio .95
>>
>> Those are the default values, and Ceph (now) enforces that they are
>> non-decreasing (>=, or maybe >) in that order.
>>
>> So you might set the full ratio to .98, backfillfull to .96.  Nearfull is
>> only cosmetic.
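>>
>> For example, to apply exactly that temporary headroom:
>>
>> ceph osd set-full-ratio .98          # temporary!
>> ceph osd set-backfillfull-ratio .96  # temporary!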
>>
>> But absolutely do not forget to revert to the default values once the cluster
>> is balanced, or to other values that you have made an educated decision to
>> use.
>>
>> Even with plenty of OSDs that are not full, a single overfilled OSD can
>> cause the whole pool to stop accepting new data.
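>>
>> A quick way to spot that single OSD before it bites (standard commands,
>> nothing cluster-specific assumed):
>>
>> ceph health detail   # lists OSD_NEARFULL / OSD_BACKFILLFULL / OSD_FULL warnings
>> ceph osd df tree     # per-OSD utilization (%USE) laid out along the CRUSH tree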
>>
>>
>> Yep, see above.  Not immediately clear to me why that data pool is so
>> full unless the CRUSH rule / device classes are wonky.
>>
>> Clients will start getting “No more space available” errors. That
>> happened to us with CephFS recently in a very similar scenario, where the
>> cluster received much more data than expected in a short amount of time. Not
>> fun.
>> With the balancer not running due to too many misplaced objects, that risk
>> increases, so just a heads-up: keep it in mind. To get things working we
>> simply balanced the OSDs manually with upmaps, moving data from the most
>> full OSDs to the least full ones (our built-in balancer sadly does not work).
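>>
>> If you end up in the same spot, osdmaptool can generate those upmaps
>> offline instead of picking PGs by hand (a sketch; the file names and the
>> deviation value are just examples):
>>
>> ceph osd getmap -o om                               # grab the current osdmap
>> osdmaptool om --upmap out.txt --upmap-deviation 1   # emit pg-upmap-items commands
>> source out.txt                                      # review first, then apply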
>>
>>
>> One small observation:
>> I’ve noticed that 'ceph osd pool ls detail | grep cephfs.cephfs01.data'
>> shows pg_num increased but pgp_num still the same.
>> You will need to set pgp_num as well for data to migrate to the new PGs:
>> https://docs.ceph.com/en/mimic/rados/operations/placement-groups/#set-the-number-of-placement-groups
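>>
>> Something along these lines (pool name taken from your output; the pgp_num
>> value below is only a placeholder and should match the pool's pg_num):
>>
>> ceph osd pool get cephfs.cephfs01.data pg_num       # check the target pg_num
>> ceph osd pool set cephfs.cephfs01.data pgp_num 512  # placeholder value, use pg_num here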
>>
>>
>> The mgr usually does that automatically in recent Ceph releases.  With older
>> releases we had to increment pg_num and pgp_num in lockstep, which was kind
>> of a pain.
>>
>>
>>
>> Best,
>>
>> *Laimis J.*
>>
>> On 5 Jan 2025, at 16:11, Anthony D'Atri <anthony.da...@gmail.com> wrote:
>>
>>
>> What reweights have been set for the top OSDs (ceph osd df tree)?
>>
>> Right now they are all at 1.0. I had to lower them to something close to
>> 0.2 in order to free up space, but I changed them back to 1.0. Should I
>> lower them while the backfill is happening?
>>
>>
>> Old-style legacy override reweights don’t mesh well with the balancer.
>>   Best to leave them at 1.00.
>>
>> 0.2 is pretty extreme; back in the day I rarely went below 0.8.
>>
>> ```
>> "optimize_result": "Too many objects (0.355160 > 0.050000) are misplaced;
>> try again later"
>> ```
>>
>>
>> That should clear.  The balancer doesn’t want to stir up trouble if the
>> cluster already has a bunch of backfill / recovery going on.  Patience!
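>>
>> You can watch the misplaced fraction fall with the plain status output:
>>
>> ceph -s | grep misplaced   # misplaced object count and percentage
>> ceph pg stat               # one-line PG summary (shows misplaced when present)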
>>
>> default.rgw.buckets.data    10  1024  197 TiB  133.75M  592 TiB  93.69  13 TiB
>> default.rgw.buckets.non-ec  11    32   78 MiB    1.43M   17 GiB
>>
>>
>> It’s odd that the data pool is that full but the others aren’t.
>>
>> Please send `ceph osd crush rule dump` and `ceph osd dump | grep pool`.
>>
>>
>>
>> I also tried changing the following but it does not seem to persist:
>>
>>
>> Could be an mclock thing.
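>>
>> If what you tried were the backfill/recovery throttles (osd_max_backfills
>> and friends), that would fit: with the mClock scheduler those are ignored
>> unless you explicitly allow overriding them. A sketch, assuming a release
>> recent enough to have this option:
>>
>> ceph config set osd osd_mclock_override_recovery_settings true  # let manual backfill/recovery limits take effect
>> ceph config set osd osd_max_backfills 2                         # example value only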
>>
>> 1. Why did I end up with so many misplaced PGs when there were no changes
>> to the cluster: number of OSDs, hosts, etc.?
>>
>>
>> Probably a result of the autoscaler splitting PGs or of some change to
>> CRUSH rules such that some data can’t be placed.
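>>
>> The autoscaler's pending decisions are easy to check:
>>
>> ceph osd pool autoscale-status   # current vs. target PG counts per pool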
>>
>> 2. Is it OK to change the target_max_misplaced_ratio to something higher
>> than .05 so the autobalancer would work and I wouldn't have to constantly
>> rebalance the OSDs manually?
>>
>>
>> I wouldn’t; that’s a symptom, not the disease.
>>
>> Bruno
>
> --
> Bruno Gomes Pessanha
>


-- 
Bruno Gomes Pessanha
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
