I might suggest, after verifying what I wrote before: run upmap-remapped 2-3 times until the misplaced objects / PGs mostly evaporate. Then set the max deviation and the misplaced ratio to 1, and turn on the balancer.
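The steps above can be written out as a dry-run command list and reviewed before anything touches the cluster. This is a minimal Python sketch under stated assumptions, not part of the original advice. It assumes the `upmap-remapped.py` script from CERN's ceph-scripts repository (usually run as `./upmap-remapped.py | sh`), assumes the stock Ceph defaults of 0.85/0.90/0.95 for the nearfull/backfillfull/full ratios, and reads "a couple percent" in the following paragraph as a 2-point bump. The helper names are hypothetical. The misplaced-ratio value of 1 is copied literally from the reply; `target_max_misplaced_ratio` is a fraction (default 0.05), so the intended value is worth confirming before applying it.

```python
"""Dry-run planner for the rebalance steps suggested in this reply.

Nothing here talks to a cluster: the functions only build command strings
so an operator can review them and run them by hand.
"""

# Assumption: CERN ceph-scripts tool, run from its checkout directory.
# It prints `ceph osd pg-upmap-items ...` commands that are piped to a shell.
UPMAP_REMAPPED = "./upmap-remapped.py | sh"


def upmap_then_balance_plan(passes=3, max_deviation=1, misplaced_ratio=1):
    """Commands for N upmap-remapped passes, then enabling the balancer."""
    if passes < 1:
        raise ValueError("run upmap-remapped at least once")
    plan = [UPMAP_REMAPPED] * passes
    plan += [
        f"ceph config set mgr mgr/balancer/upmap_max_deviation {max_deviation}",
        # target_max_misplaced_ratio is a fraction (default 0.05); the value 1
        # is taken literally from the reply, so confirm what is intended.
        f"ceph config set mgr target_max_misplaced_ratio {misplaced_ratio}",
        "ceph balancer mode upmap",
        "ceph balancer on",
    ]
    return plan


def bumped_ratios(nearfull=0.85, backfillfull=0.90, full=0.95, bump=0.02):
    """Raise backfillfull and full by `bump`, keeping nearfull < backfillfull < full < 1.

    Defaults are assumed stock values; use the cluster's actual ratios.
    """
    new_backfillfull = round(backfillfull + bump, 4)
    new_full = round(full + bump, 4)
    if not (nearfull < new_backfillfull < new_full < 1.0):
        raise ValueError("ratios must satisfy nearfull < backfillfull < full < 1")
    return new_backfillfull, new_full


def ratio_commands(backfillfull, full):
    """Commands that apply the temporary ratios (revert once backfill drains)."""
    return [
        f"ceph osd set-backfillfull-ratio {backfillfull}",
        f"ceph osd set-full-ratio {full}",
    ]


if __name__ == "__main__":
    bf, fu = bumped_ratios()
    for cmd in ratio_commands(bf, fu) + upmap_then_balance_plan():
        print(cmd)
```

Printing rather than executing keeps an operator in the loop; once the logjam clears, the two ratios would be set back to their previous values.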
You might need to raise the backfillfull and full ratios a couple of percent temporarily until the logjam subsides.

> On May 24, 2025, at 10:40 PM, Szabo, Istvan (Agoda) <istvan.sz...@agoda.com> wrote:
>
> Stopped for 1.5 days already. The usage on that node's OSDs went down from 88% to 62% or less; however, the misplaced objects went up from 5% to almost 7%, most probably due to newly created data.
>
> Here are some details; the newly added node is serverosd-2015:
> https://gist.githubusercontent.com/Badb0yBadb0y/8addf9f3df0406abc3397dd9e7f1aeca/raw/8dc686576f9dcea12542996fd6bacbac1050a89c/gistfile1.txt
>
> From: Anthony D'Atri <anthony.da...@gmail.com>
> Sent: Saturday, May 24, 2025 5:44 AM
> To: Szabo, Istvan (Agoda) <istvan.sz...@agoda.com>
> Cc: Anthony D'Atri <a...@dreamsnake.net>; Ceph Users <ceph-users@ceph.io>
> Subject: Re: [ceph-users] Newly added node osds getting full before rebalance finished
>
> Email received from the internet. If in doubt, don't click any link nor open any attachment !
> ________________________________
>
> Please do send links to crush dump, Ceph OSD tree, Ceph OSD df
>
> > On May 23, 2025, at 10:53 AM, Szabo, Istvan (Agoda) <istvan.sz...@agoda.com> wrote:
> >
> > Octopus, yeah. It seems like it wants to push bigger PGs or more data to the new node. I wonder whether adding more disks to this node would temporarily make the node's weight (not CRUSH weight) "heavier"... I'm not sure what the best way forward would be; for now I have stopped the rebalance.
> >
> > Get Outlook for Android<https://aka.ms/AAb9ysg>
> > ________________________________
> > From: Anthony D'Atri <a...@dreamsnake.net>
> > Sent: Friday, May 23, 2025 8:55:31 PM
> > To: Szabo, Istvan (Agoda) <istvan.sz...@agoda.com>
> > Cc: Ceph Users <ceph-users@ceph.io>
> > Subject: Re: [ceph-users] Newly added node osds getting full before rebalance finished
> >
> > ________________________________
> > Octopus? Those bugs were fixed by Hammer IIRC.
> >
> > On May 23, 2025, at 9:45 AM, Szabo, Istvan (Agoda) <istvan.sz...@agoda.com> wrote:
> >
> > Sorry, typo: 15.2.17
> >
> > ________________________________
> > From: Szabo, Istvan (Agoda) <istvan.sz...@agoda.com>
> > Sent: Friday, May 23, 2025 8:43:44 PM
> > To: Anthony D'Atri <a...@dreamsnake.net>
> > Cc: Ceph Users <ceph-users@ceph.io>
> > Subject: Re: [ceph-users] Newly added node osds getting full before rebalance finished
> >
> > 17.2.8
> >
> > ________________________________
> > From: Anthony D'Atri <a...@dreamsnake.net>
> > Sent: Friday, May 23, 2025 8:05:54 PM
> > To: Szabo, Istvan (Agoda) <istvan.sz...@agoda.com>
> > Cc: Ceph Users <ceph-users@ceph.io>
> > Subject: Re: [ceph-users] Newly added node osds getting full before rebalance finished
> >
> > ________________________________
> >
> > For due diligence I have to ask: are you running a recent Ceph release, not something like Firefly? There were certain bugs back then...
> >
> >> On May 23, 2025, at 4:07 AM, Szabo, Istvan (Agoda) <istvan.sz...@agoda.com> wrote:
> >>
> >> Hi,
> >>
> >> For some reason this is the 2nd time I have run into the same issue. I added a new node when the fullest OSD in the cluster was at 73%, and halfway through the rebalance all the newly added OSDs are now between 70-80% full.
> >
> > Did they or their hosts somehow get the wrong CRUSH weight?
> >
> > `ceph osd df tree` and `ceph osd tree` would give us some insights.
> >
> >> I don't understand why. Could this be because of the mgr upmap mode with deviation 1?
> >
> > No, that’s reasonable, especially if you have OSDs of multiple sizes.
> >
> >> Another question: I have 7 disks in each server, but in the newly added one I still have 4 spares that are not added yet. Just to get out of this space issue, would it cause any problem if I add the remaining 4 spare disks
> >
> > I wouldn’t think so.
> >
> >> (which will make the server sizes in the cluster unbalanced: the weight of each server is 100TB, and this would make only this node 160TB)?
> >
> > Depending on your replication/EC profiles, CRUSH rules, and number of failure domains, that extra capacity may or may not increase the cluster’s capacity, but it should certainly decrease the average fillage of the OSDs on that host, unless you’re adding them in such a way that skews the host’s CRUSH weight.
> >
> > Sometimes, with a thundering herd of backfill, OSDs can temporarily increase in fillage. When you add OSDs, it’s not entirely the case that only data “moving” to the new OSDs will be remapped; some existing data may be reshuffled as well, though this is much less than with old releases. Also, when backfilling, Ceph writes new replicas before removing displaced replicas, in order to ensure safety. Usually this doesn’t result in quite the percentages you report, but upmap-remapped can help manage this dynamic.
> >
> >>
> >> Thank you
> >>
> >> ________________________________
> >> This message is confidential and is for the sole use of the intended recipient(s). It may also be privileged or otherwise protected by copyright or other legal rules. If you have received it by mistake please let us know by reply email and delete it from your system. It is prohibited to copy this message or disclose its content to anyone. Any confidentiality or privilege is not waived or lost by any mistaken delivery or unauthorized disclosure of the message. All messages sent to and from Agoda may be monitored to ensure compliance with company policies, to protect the company's interests and to remove potential malware. Electronic messages may be intercepted, amended, lost or deleted, or contain viruses.
> >> _______________________________________________
> >> ceph-users mailing list -- ceph-users@ceph.io
> >> To unsubscribe send an email to ceph-users-le...@ceph.io
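The host-size question in this thread (100 TB hosts, one of which would grow to 160 TB) and the answer that the extra capacity "may or may not" increase the cluster's capacity can be made concrete with an idealized bound. This is a minimal Python sketch under loud assumptions: one replicated pool with size 3, host as the CRUSH failure domain (so a host holds at most one replica of any PG), and no account of full ratios, PG granularity, or CRUSH's imperfect balance. The function name and the host layouts are hypothetical; the thread does not say how many hosts the cluster has.

```python
def max_logical_tb(host_tb, replicas):
    """Idealized upper bound on logical data (TB) for one replicated pool.

    Assumes host is the CRUSH failure domain, so a host holds at most one
    replica of any PG and can never usefully store more than the logical
    total D. Returns the largest D with
        sum(min(c, D) for c in host_tb) >= replicas * D.
    Ignores full ratios, PG granularity and CRUSH's imperfect balance.
    """
    if not 1 <= replicas <= len(host_tb):
        raise ValueError("need 1 <= replicas <= number of hosts")
    caps = sorted(host_tb, reverse=True)
    for k in range(replicas - 1):
        # Suppose the k largest hosts are capped at D and the rest fill up:
        # k * D + sum(caps[k:]) = replicas * D.
        d = sum(caps[k:]) / (replicas - k)
        if caps[k] <= d:  # next-largest host fits under D: bound found
            return d
    # replicas - 1 hosts capped: the remaining hosts set the bound.
    return float(sum(caps[replicas - 1:]))


if __name__ == "__main__":
    # Hypothetical layouts; the thread does not give the host count.
    layouts = {
        "3 x 100 TB": [100] * 3,
        "160 + 2 x 100 TB": [160, 100, 100],
        "10 x 100 TB": [100] * 10,
        "160 + 9 x 100 TB": [160] + [100] * 9,
    }
    for name, hosts in layouts.items():
        print(f"{name:>18}: {max_logical_tb(hosts, replicas=3):6.1f} TB logical")
```

With three hosts and size 3, every host must hold a full copy, so the bound stays at 100 TB and the extra 60 TB is stranded; with ten hosts, the same 60 TB raises the bound from about 333 TB to about 353 TB. This only illustrates the general point; it is not a prediction for this cluster.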