No repair IO, and misplaced objects are still increasing, even with norebalance
and nobackfill set.
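We're keeping an eye on it with roughly the following (a sketch only; exact output
formatting varies by release):

    ceph -s                        # misplaced object count/percentage, recovery activity
    ceph osd dump | grep flags     # confirm norebalance,nobackfill are still set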


Thank you,

Ray

________________________________
From: Ray Cunningham <ray.cunning...@keepertech.com>
Sent: Wednesday, April 13, 2022 10:38:29 AM
To: Dan van der Ster <dvand...@gmail.com>
Cc: ceph-users@ceph.io <ceph-users@ceph.io>
Subject: Re: [ceph-users] Stop Rebalancing

All pools have gone backfillfull.


Thank you,

Ray Cunningham



Systems Engineering and Services Manager

keepertechnology <http://www.keepertech.com/>

(571) 223-7242

________________________________
From: Ray Cunningham
Sent: Wednesday, April 13, 2022 10:15:56 AM
To: Dan van der Ster <dvand...@gmail.com>
Cc: ceph-users@ceph.io <ceph-users@ceph.io>
Subject: RE: [ceph-users] Stop Rebalancing

Perfect timing, I was just about to reply. We have disabled the autoscaler on all
pools now.
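For reference, that was done per pool with something along these lines (the loop
form is just a sketch; pool names come from `ceph osd pool ls`):

    for pool in $(ceph osd pool ls); do
        ceph osd pool set "$pool" pg_autoscale_mode off
    done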

Unfortunately, I can't just copy and paste from this system...

`ceph osd pool ls detail` shows only 2 pools with any difference between current
and target values:
pool1: pg_num 940, pg_num_target 256, pgp_num 926, pgp_num_target 256
pool7: pg_num 2048, pg_num_target 2048, pgp_num 883, pgp_num_target 2048
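Based on Dan's suggestion below, pausing the merge/split activity would look
roughly like the following, pinning the pools at their current values (a sketch
only; we'll confirm the exact numbers before running anything):

    ceph osd pool set pool1 pg_num 940     # hold pool1 at its current pg_num
    ceph osd pool set pool7 pgp_num 883    # hold pool7's pgp_num where it is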

`ceph osd pool autoscale-status`:
Size is defined
Target size is empty
Rate is 7 for all pools except pool7, which is 1.3333333730697632
Raw capacity is defined
Ratio is 0.0177 for pool1, 0.4200 for pool7, and 0 for all others
Target Ratio and Effective Ratio are empty
Bias is 1.0 for all
PG_NUM is 256 for pool1, 2048 for pool7, and 32 for all others
New PG_NUM is empty
Autoscale is now off for all
Profile is scale-up


We have set norebalance and nobackfill and are watching to see what happens.
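For the record, those were set with the standard flag commands, and clearing them
later is just the reverse:

    ceph osd set norebalance
    ceph osd set nobackfill

    # when we're ready to let recovery/backfill resume:
    ceph osd unset nobackfill
    ceph osd unset norebalance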

Thank you,
Ray

-----Original Message-----
From: Dan van der Ster <dvand...@gmail.com>
Sent: Wednesday, April 13, 2022 10:00 AM
To: Ray Cunningham <ray.cunning...@keepertech.com>
Cc: ceph-users@ceph.io
Subject: Re: [ceph-users] Stop Rebalancing

One more thing, could you please also share the output of `ceph osd pool
autoscale-status`?


On Tue, Apr 12, 2022 at 9:50 PM Ray Cunningham <ray.cunning...@keepertech.com> 
wrote:
>
> Thank you, Dan! I will definitely disable the autoscaler on the rest of our pools. 
> I can't get the PG numbers today, but I will try to get them tomorrow. We 
> definitely want to get this under control.
>
> Thank you,
> Ray
>
>
> -----Original Message-----
> From: Dan van der Ster <dvand...@gmail.com>
> Sent: Tuesday, April 12, 2022 2:46 PM
> To: Ray Cunningham <ray.cunning...@keepertech.com>
> Cc: ceph-users@ceph.io
> Subject: Re: [ceph-users] Stop Rebalancing
>
> Hi Ray,
>
> Disabling the autoscaler on all pools is probably a good idea. At least until 
> https://tracker.ceph.com/issues/53729 is fixed. (You are likely not 
> susceptible to that -- but better safe than sorry).
>
> To pause the ongoing PG merges, you can indeed set the pg_num to the current 
> value. This will allow the ongoing merge to complete and prevent further 
> merges from starting.
> From `ceph osd pool ls detail` you'll see pg_num, pgp_num, pg_num_target, 
> pgp_num_target... If you share the current values of those we can help advise 
> what you need to set pg_num to in order to effectively pause things where they are.
>
> BTW -- I'm going to create a request in the tracker that we improve the pg 
> autoscaler heuristic. IMHO the autoscaler should estimate the time to carry 
> out a split/merge operation and avoid taking one-way decisions without 
> permission from the administrator. The autoscaler is meant to be helpful, not 
> degrade a cluster for 100 days!
>
> Cheers, Dan
>
>
>
> On Tue, Apr 12, 2022 at 9:04 PM Ray Cunningham 
> <ray.cunning...@keepertech.com> wrote:
> >
> > Hi Everyone,
> >
> > We just upgraded our 640 OSD cluster to Ceph 16.2.7 and the resulting 
> > rebalancing of misplaced objects is overwhelming the cluster and impacting 
> > MON DB compaction, deep scrub repairs, and our upgrades of legacy bluestore 
> > OSDs. We have to pause the rebalancing of misplaced objects or we're going 
> > to fall over.
> >
> > Autoscaler-status tells us that we are reducing our PGs by about 700, which 
> > will take us over 100 days to complete at our current recovery speed. We 
> > disabled autoscaler on our biggest pool, but I'm concerned that it's 
> > already on the path to the lower PG count and won't stop adding to our 
> > misplaced count after we drop below 5%. What can we do to stop the cluster 
> > from finding more misplaced objects to rebalance? Should we set the PG num 
> > manually to what our current count is? Or will that cause even more havoc?
> >
> > Any other thoughts or ideas? My goals are to stop the rebalancing 
> > temporarily so we can deep scrub and repair inconsistencies, upgrade legacy 
> > bluestore OSDs and compact our MON DBs (supposedly MON DBs don't compact 
> > when you aren't 100% active+clean).
> >
> > Thank you,
> > Ray
> >
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
