The rebalance has been stopped for 1.5 days already; usage on that node's OSDs went down from 88% to 62% or less. However, the misplaced objects went up from 5% to almost 7%, most probably due to newly created data.
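For reference, these are roughly the commands I'm using to keep the data movement paused and to watch the misplaced percentage in the meantime (a minimal sketch using the standard cluster flags, nothing exotic):

    # pause rebalance/backfill so the new node's OSDs stop filling further
    ceph osd set norebalance
    ceph osd set nobackfill

    # watch the misplaced object percentage and per-OSD / per-host fillage and weights
    ceph -s
    ceph osd df tree

    # resume once there is headroom again
    ceph osd unset nobackfill
    ceph osd unset norebalance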
Here are some details; the newly added node is serverosd-2015:
https://gist.githubusercontent.com/Badb0yBadb0y/8addf9f3df0406abc3397dd9e7f1aeca/raw/8dc686576f9dcea12542996fd6bacbac1050a89c/gistfile1.txt

________________________________
From: Anthony D'Atri <anthony.da...@gmail.com>
Sent: Saturday, May 24, 2025 5:44 AM
To: Szabo, Istvan (Agoda) <istvan.sz...@agoda.com>
Cc: Anthony D'Atri <a...@dreamsnake.net>; Ceph Users <ceph-users@ceph.io>
Subject: Re: [ceph-users] Newly added node osds getting full before rebalance finished

Please do send links to crush dump, Ceph OSD tree, Ceph OSD df

> On May 23, 2025, at 10:53 AM, Szabo, Istvan (Agoda) <istvan.sz...@agoda.com> wrote:
>
> Octopus yeah, seems like want to push to the new node bigger pgs or more data, I wonder if I add more disks to this node the weight (not crush weight) of the node would be "heavier" temporarily maybe... not sure what would be the best way to move forward, now I stopped the rebalance.
>
> ________________________________
> From: Anthony D'Atri <a...@dreamsnake.net>
> Sent: Friday, May 23, 2025 8:55:31 PM
> To: Szabo, Istvan (Agoda) <istvan.sz...@agoda.com>
> Cc: Ceph Users <ceph-users@ceph.io>
> Subject: Re: [ceph-users] Newly added node osds getting full before rebalance finished
>
> Octopus? Those bugs were fixed by Hammer IIRC.
>
> On May 23, 2025, at 9:45 AM, Szabo, Istvan (Agoda) <istvan.sz...@agoda.com> wrote:
>
> sorry, typo, 15.2.17
>
> ________________________________
> From: Szabo, Istvan (Agoda) <istvan.sz...@agoda.com>
> Sent: Friday, May 23, 2025 8:43:44 PM
> To: Anthony D'Atri <a...@dreamsnake.net>
> Cc: Ceph Users <ceph-users@ceph.io>
> Subject: Re: [ceph-users] Newly added node osds getting full before rebalance finished
>
> 17.2.8
>
> ________________________________
> From: Anthony D'Atri <a...@dreamsnake.net>
> Sent: Friday, May 23, 2025 8:05:54 PM
> To: Szabo, Istvan (Agoda) <istvan.sz...@agoda.com>
> Cc: Ceph Users <ceph-users@ceph.io>
> Subject: Re: [ceph-users] Newly added node osds getting full before rebalance finished
>
> For due diligence I have to ask: are you running a recent Ceph release, not like Firefly? There were certain bugs back then...
>
>> On May 23, 2025, at 4:07 AM, Szabo, Istvan (Agoda) <istvan.sz...@agoda.com> wrote:
>>
>> Hi,
>>
>> For some reason this is the 2nd time I got into the same issue, added a new node at the time when in the cluster the most full osd was 73% and all the newly added osds now started to be between 70-80% full half way on the way to finish the rebalance.
>
> Did they or their hosts somehow get the wrong CRUSH weight?
>
> `ceph osd df tree` and `ceph osd tree` would give us some insights.
>
>> I don't understand why, could this be because mgr upmap mode with deviation 1?
>
> No, that’s reasonable especially if you have OSDs of multiple sizes.
>
>> Another question, I have 7 disks in each servers, however in the newly added one I still have 4 as spare which is not added.
>> Just to get out of this space issue, would it cause any issue if I add the remaining 4 spare disk
>
> I wouldn’t think so.
>
>> (which will cause unbalanced server size in the cluster, the weight of each server is 100TB, this would cause only this node to be 160TB).
>
> Depending on your replication/EC profiles, CRUSH rules, and number of failure domains that extra capacity may or may not increase the cluster’s capacity, but for sure it should decrease the average fillage of the OSDs on that host, unless you’re adding them in such a way that skews the host’s CRUSH weight.
>
> Sometimes with a thundering herd of backfill OSDs can temporarily increase in fillage. When you add OSDs, it’s not entirely the case that only data “moving” to the new OSDs will be remapped, some existing data may be reshuffled as well, though this is much less than with old releases. Also, when backfilling Ceph will write new replicas before removing displaced replicas, in order to ensure safety. Usually this doesn’t result in quite the percentages you report, but upmap-remapped can help manage this dynamic.
>
>> Thank you
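PS: on the upmap-remapped idea mentioned above, this is roughly the workflow as I understand it (assuming the upmap-remapped.py helper from the CERN ceph-scripts repo, https://github.com/cernceph/ceph-scripts; I haven't run it on this cluster yet, so treat it as a sketch rather than a tested procedure):

    # turn the balancer off while remapping
    ceph balancer off

    # the script prints "ceph osd pg-upmap-items ..." commands that pin the
    # currently misplaced PGs back onto the OSDs that already hold the data,
    # dropping the misplaced percentage to near zero
    ./upmap-remapped.py            # review the output first
    ./upmap-remapped.py | sh

    # then let the upmap balancer move PGs to the new node gradually
    ceph balancer on
    ceph balancer status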