The rebalance has been stopped for 1.5 days already; usage on that node's OSDs went down from 88% to 62% or less. However, the misplaced objects went up from 5% to almost 7%, most probably due to newly created data.
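For reference, these are roughly the commands I'm using to keep the data movement paused and to watch the misplaced percentage in the meantime (a minimal sketch using the standard cluster flags, nothing exotic):

    # pause rebalance/backfill so the new node's OSDs stop filling further
    ceph osd set norebalance
    ceph osd set nobackfill

    # watch the misplaced object percentage and per-OSD / per-host fillage and weights
    ceph -s
    ceph osd df tree

    # resume once there is headroom again
    ceph osd unset nobackfill
    ceph osd unset norebalance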
Here are some details; the newly added node is serverosd-2015:
https://gist.githubusercontent.com/Badb0yBadb0y/8addf9f3df0406abc3397dd9e7f1aeca/raw/8dc686576f9dcea12542996fd6bacbac1050a89c/gistfile1.txt

________________________________
From: Anthony D'Atri <anthony.da...@gmail.com>
Sent: Saturday, May 24, 2025 5:44 AM
To: Szabo, Istvan (Agoda) <istvan.sz...@agoda.com>
Cc: Anthony D'Atri <a...@dreamsnake.net>; Ceph Users <ceph-users@ceph.io>
Subject: Re: [ceph-users] Newly added node osds getting full before rebalance finished

Please do send links to crush dump, Ceph OSD tree, Ceph OSD df

> On May 23, 2025, at 10:53 AM, Szabo, Istvan (Agoda) <istvan.sz...@agoda.com> wrote:
>
> Octopus yeah, seems like want to push to the new node bigger pgs or more data, I wonder if I add more disks to this node the weight (not crush weight) of the node would be "heavier" temporarily maybe... not sure what would be the best way to move forward, now I stopped the rebalance.
>
> ________________________________
> From: Anthony D'Atri <a...@dreamsnake.net>
> Sent: Friday, May 23, 2025 8:55:31 PM
> To: Szabo, Istvan (Agoda) <istvan.sz...@agoda.com>
> Cc: Ceph Users <ceph-users@ceph.io>
> Subject: Re: [ceph-users] Newly added node osds getting full before rebalance finished
>
> Octopus? Those bugs were fixed by Hammer IIRC.
>
> On May 23, 2025, at 9:45 AM, Szabo, Istvan (Agoda) <istvan.sz...@agoda.com> wrote:
>
> sorry, typo, 15.2.17
>
> ________________________________
> From: Szabo, Istvan (Agoda) <istvan.sz...@agoda.com>
> Sent: Friday, May 23, 2025 8:43:44 PM
> To: Anthony D'Atri <a...@dreamsnake.net>
> Cc: Ceph Users <ceph-users@ceph.io>
> Subject: Re: [ceph-users] Newly added node osds getting full before rebalance finished
>
> 17.2.8
>
> ________________________________
> From: Anthony D'Atri <a...@dreamsnake.net>
> Sent: Friday, May 23, 2025 8:05:54 PM
> To: Szabo, Istvan (Agoda) <istvan.sz...@agoda.com>
> Cc: Ceph Users <ceph-users@ceph.io>
> Subject: Re: [ceph-users] Newly added node osds getting full before rebalance finished
>
> For due diligence I have to ask: are you running a recent Ceph release, not like Firefly? There were certain bugs back then...
>
>> On May 23, 2025, at 4:07 AM, Szabo, Istvan (Agoda) <istvan.sz...@agoda.com> wrote:
>>
>> Hi,
>>
>> For some reason this is the 2nd time I got into the same issue, added a new node at the time when in the cluster the most full osd was 73% and all the newly added osds now started to be between 70-80% full half way on the way to finish the rebalance.
>
> Did they or their hosts somehow get the wrong CRUSH weight?
>
> `ceph osd df tree` and `ceph osd tree` would give us some insights.
>
>> I don't understand why, could this be because mgr upmap mode with deviation 1?
>
> No, that’s reasonable especially if you have OSDs of multiple sizes.
>
>> Another question, I have 7 disks in each servers, however in the newly added one I still have 4 as spare which is not added.
>> Just to get out of this space issue, would it cause any issue if I add the remaining 4 spare disk
>
> I wouldn’t think so.
>
>> (which will cause unbalanced server size in the cluster, the weight of each server is 100TB, this would cause only this node to be 160TB).
>
> Depending on your replication/EC profiles, CRUSH rules, and number of failure domains that extra capacity may or may not increase the cluster’s capacity, but for sure it should decrease the average fillage of the OSDs on that host, unless you’re adding them in such a way that skews the host’s CRUSH weight.
>
> Sometimes with a thundering herd of backfill OSDs can temporarily increase in fillage. When you add OSDs, it’s not entirely the case that only data “moving” to the new OSDs will be remapped, some existing data may be reshuffled as well, though this is much less than with old releases. Also, when backfilling Ceph will write new replicas before removing displaced replicas, in order to ensure safety. Usually this doesn’t result in quite the percentages you report, but upmap-remapped can help manage this dynamic.
>
>> Thank you
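PS: on the upmap-remapped idea mentioned above, this is roughly the workflow as I understand it (assuming the upmap-remapped.py helper from the CERN ceph-scripts repo, https://github.com/cernceph/ceph-scripts; I haven't run it on this cluster yet, so treat it as a sketch rather than a tested procedure):

    # turn the balancer off while remapping
    ceph balancer off

    # the script prints "ceph osd pg-upmap-items ..." commands that pin the
    # currently misplaced PGs back onto the OSDs that already hold the data,
    # dropping the misplaced percentage to near zero
    ./upmap-remapped.py            # review the output first
    ./upmap-remapped.py | sh

    # then let the upmap balancer move PGs to the new node gradually
    ceph balancer on
    ceph balancer status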