[ceph-users] Re: Newly added node osds getting full before rebalance finished

Szabo, Istvan (Agoda) Fri, 23 May 2025 07:53:13 -0700

Octopus, yes. It seems like Ceph wants to push bigger PGs, or more data, to the 
new node. I wonder whether adding more disks to this node would temporarily 
make the node's weight (not its CRUSH weight) "heavier". I'm not sure what the 
best way forward is; for now I have stopped the rebalance.

________________________________
From: Anthony D'Atri <a...@dreamsnake.net>
Sent: Friday, May 23, 2025 8:55:31 PM
To: Szabo, Istvan (Agoda) <istvan.sz...@agoda.com>
Cc: Ceph Users <ceph-users@ceph.io>
Subject: Re: [ceph-users] Newly added node osds getting full before rebalance 
finished

Octopus?  Those bugs were fixed by Hammer IIRC.

On May 23, 2025, at 9:45 AM, Szabo, Istvan (Agoda) <istvan.sz...@agoda.com> 
wrote:

Sorry, typo: 15.2.17

________________________________
From: Szabo, Istvan (Agoda) <istvan.sz...@agoda.com>
Sent: Friday, May 23, 2025 8:43:44 PM
To: Anthony D'Atri <a...@dreamsnake.net>
Cc: Ceph Users <ceph-users@ceph.io>
Subject: Re: [ceph-users] Newly added node osds getting full before rebalance 
finished

17.2.8

________________________________
From: Anthony D'Atri <a...@dreamsnake.net>
Sent: Friday, May 23, 2025 8:05:54 PM
To: Szabo, Istvan (Agoda) <istvan.sz...@agoda.com>
Cc: Ceph Users <ceph-users@ceph.io>
Subject: Re: [ceph-users] Newly added node osds getting full before rebalance 
finished

For due diligence I have to ask:  are you running a recent Ceph release, not 
like Firefly?  There were certain bugs back then...

> On May 23, 2025, at 4:07 AM, Szabo, Istvan (Agoda) <istvan.sz...@agoda.com> 
> wrote:
>
> Hi,
>
> For some reason this is the second time I have run into the same issue: I 
> added a new node when the fullest OSD in the cluster was at 73%, and all the 
> newly added OSDs are now between 70% and 80% full, halfway through the 
> rebalance.

Did they or their hosts somehow get the wrong CRUSH weight?

`ceph osd df tree` and `ceph osd tree` would give us some insights.
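
A rough sketch of the kind of check I have in mind, assuming you save the 
output of `ceph osd df tree --format json` and that it carries a `nodes` list 
with `type`, `name`, `crush_weight` and `kb` fields (verify the field names 
on your release). The 5% tolerance and the sample data are made up for 
illustration; the idea is only that a default CRUSH weight is roughly the 
device size in TiB.

```python
import json

KIB_PER_TIB = 1024 ** 3  # number of KiB in one TiB


def suspicious_weights(df_tree, tolerance=0.05):
    """Return (name, crush_weight, size_tib) for every OSD whose CRUSH
    weight differs from its raw size in TiB by more than `tolerance`."""
    flagged = []
    for node in df_tree.get("nodes", []):
        if node.get("type") != "osd":
            continue  # skip root/host buckets
        size_tib = node["kb"] / KIB_PER_TIB
        if size_tib == 0:
            continue  # down or empty OSD, nothing to compare
        weight = node["crush_weight"]
        if abs(weight - size_tib) / size_tib > tolerance:
            flagged.append((node["name"], weight, round(size_tib, 2)))
    return flagged


# Hypothetical sample, NOT real cluster output: two 14 TiB OSDs,
# one of which was given double the CRUSH weight by mistake.
sample = json.loads("""
{"nodes": [
  {"type": "host", "name": "newhost", "crush_weight": 42.0, "kb": 30064771072},
  {"type": "osd", "name": "osd.0", "crush_weight": 14.0, "kb": 15032385536},
  {"type": "osd", "name": "osd.1", "crush_weight": 28.0, "kb": 15032385536}
]}
""")

for name, weight, size_tib in suspicious_weights(sample):
    print(f"{name}: crush_weight={weight} but size is {size_tib} TiB")
```

Anything this flags on the new host would be a likely explanation for those 
OSDs receiving more than their share of data.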

> I don't understand why. Could this be because the mgr balancer is in upmap 
> mode with a max deviation of 1?

No, that’s reasonable, especially if you have OSDs of multiple sizes.

> Another question: I have 7 disks in each server, but in the newly added one 
> I still have 4 spare disks that are not added yet. Just to get out of this 
> space issue, would it cause any problem if I added the remaining 4 spare disks

I wouldn’t think so.

> (which would make server sizes in the cluster unbalanced: the weight of each 
> server is 100TB, and only this node would become 160TB)?

Depending on your replication/EC profiles, CRUSH rules, and number of failure 
domains, that extra capacity may or may not increase the cluster’s capacity, 
but it should certainly decrease the average fillage of the OSDs on that host, 
unless you’re adding them in such a way that skews the host’s CRUSH weight.
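
To put rough numbers on that, here is a back-of-the-envelope sketch. It 
assumes data lands in proportion to CRUSH weight and ignores failure-domain 
constraints, so treat it as an approximation only. The 100TB/7-disk and 
160TB/11-disk figures come from your message; the count of other hosts is a 
placeholder I made up, since you didn't say how many nodes you have.

```python
def per_osd_share(host_weight_tb, osds_on_host, total_weight_tb):
    """Approximate fraction of all cluster data that each OSD on a host
    holds, assuming placement strictly proportional to CRUSH weight."""
    return host_weight_tb / total_weight_tb / osds_on_host


OTHER_HOSTS = 5  # hypothetical: number of existing 100TB hosts

# New host with 7 disks at 100TB total weight.
before = per_osd_share(100, 7, 100 * OTHER_HOSTS + 100)

# Same host after adding the 4 spare disks, 160TB total weight.
after = per_osd_share(160, 11, 100 * OTHER_HOSTS + 160)

print(f"per-OSD share with 7 disks:  {before:.3%}")
print(f"per-OSD share with 11 disks: {after:.3%}")
```

The host as a whole attracts more data, but each OSD on it ends up with a 
smaller slice, which is why the per-OSD fillage should drop.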

Sometimes, with a thundering herd of backfill, OSDs can temporarily increase in 
fillage.  When you add OSDs, it’s not entirely the case that only data “moving” 
to the new OSDs will be remapped; some existing data may be reshuffled as well, 
though this is much less than with old releases.  Also, when backfilling, Ceph 
writes new replicas before removing displaced replicas, in order to ensure 
safety.  Usually this doesn’t result in quite the percentages you report, but 
the upmap-remapped tool can help manage this dynamic.
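
As a toy illustration of that write-before-remove effect (not a model of 
Ceph's actual scheduling, and with made-up numbers): if an OSD receives its 
incoming PGs before the PGs that are leaving it have been cleaned up, its 
transient peak is higher than where it eventually settles.

```python
def peak_utilization(size_tb, used_tb, incoming_tb):
    """Worst case: all incoming backfill has landed, but no displaced
    replica on this OSD has been removed yet."""
    return (used_tb + incoming_tb) / size_tb


def settled_utilization(size_tb, used_tb, incoming_tb, outgoing_tb):
    """Utilization once backfill is done and displaced replicas are gone."""
    return (used_tb + incoming_tb - outgoing_tb) / size_tb


# Hypothetical OSD: 14TB, 9TB used, 2TB arriving and 3TB leaving.
size, used, incoming, outgoing = 14.0, 9.0, 2.0, 3.0

peak = peak_utilization(size, used, incoming)
settled = settled_utilization(size, used, incoming, outgoing)
print(f"transient peak: {peak:.1%}, after cleanup: {settled:.1%}")
```

The gap between the two figures is the headroom a rebalance can consume 
temporarily, even though the end state is less full.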

>
> Thank you
>
> _______________________________________________
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
