Hi Everyone,

We just upgraded our 640-OSD cluster to Ceph 16.2.7, and the resulting 
rebalancing of misplaced objects is overwhelming the cluster and impacting MON 
DB compaction, deep scrub repairs and our upgrade of legacy bluestore OSDs. We 
have to pause the rebalancing of misplaced objects or we're going to fall over.

autoscale-status tells us that we are reducing our PGs by roughly 700, which 
will take us over 100 days to complete at our current recovery speed. We 
disabled the autoscaler on our biggest pool, but I'm concerned that it's already 
on the path to the lower PG count and won't stop adding to our misplaced count 
once we drop below 5% misplaced. What can we do to stop the cluster from finding 
more misplaced 
objects to rebalance? Should we set the PG num manually to what our current 
count is? Or will that cause even more havoc?
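
To make the question concrete, this is roughly the sequence we have in mind, 
written as a dry-run Python sketch that only prints the ceph commands and does 
not run them. The pool name "bigpool" and the PG count 4096 are placeholders, 
not our real values, and we have not verified that pinning pg_num actually 
halts a merge that is already in progress.

```python
import shlex


def autoscale_status_cmd():
    """Command that reports current and target PG counts per pool."""
    return ["ceph", "osd", "pool", "autoscale-status"]


def disable_autoscaler_cmd(pool):
    """Command that turns the PG autoscaler off for a single pool."""
    return ["ceph", "osd", "pool", "set", pool, "pg_autoscale_mode", "off"]


def pin_pg_num_cmd(pool, pg_num):
    """Command that sets pg_num explicitly (here: to the current value)."""
    if pg_num <= 0:
        raise ValueError("pg_num must be a positive integer")
    return ["ceph", "osd", "pool", "set", pool, "pg_num", str(pg_num)]


def plan(pool, current_pg_num):
    """Ordered list of commands: inspect, disable autoscaler, pin pg_num."""
    return [
        autoscale_status_cmd(),
        disable_autoscaler_cmd(pool),
        pin_pg_num_cmd(pool, current_pg_num),
    ]


if __name__ == "__main__":
    # "bigpool" and 4096 are hypothetical placeholders for illustration only.
    for cmd in plan("bigpool", 4096):
        print(shlex.join(cmd))
```

Nothing here touches the cluster; it only shows the order of operations we 
are asking about.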

Any other thoughts or ideas? My goals are to stop the rebalancing temporarily 
so we can deep scrub and repair inconsistencies, upgrade legacy bluestore OSDs 
and compact our MON DBs (supposedly MON DBs don't compact when you aren't 100% 
active+clean).

Thank you,
Ray

_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
