As a counterpoint, adding large amounts of new hardware gradually (or, more specifically, in a few steps) has a few benefits IMO.

- Being able to pause the operation and confirm that the new hardware (and the cluster) is operating as expected. You can identify problems with hardware while the OSDs are at 10% weight that would be much harder to notice during backfilling, and that could cause performance issues for the cluster if those OSDs ended up with their full complement of PGs.

- Breaking up long backfills. For a full cluster with large OSDs, backfills can take weeks. I find that letting the mon stores compact and getting the cluster back to HEALTH_OK is good for my sanity, and gives a good stopping point to work on other cluster issues.

This obviously depends on the cluster fullness and OSD size. I still aim for the smallest number of steps/work, but an initial crush weighting of 10-25% of final weight is a good sanity check of the new hardware, and gives a good indication of how to approach the rest of the backfill.

Cheers,
Tom

From: ceph-users <ceph-users-boun...@lists.ceph.com> On Behalf Of Paul Emmerich
Sent: 24 July 2019 20:06
To: Reed Dier <reed.d...@focusvq.com>
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] How to add 100 new OSDs...

+1 on adding them all at the same time.

All these methods that gradually increase the weight aren't really necessary in newer releases of Ceph.

Paul

--
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90

On Wed, Jul 24, 2019 at 8:59 PM Reed Dier <reed.d...@focusvq.com> wrote:

> Just chiming in to say that this too has been my preferred method for adding [large numbers of] OSDs.
>
> Set the norebalance and nobackfill flags.
> Create all the OSDs, and verify everything looks good.
> Make sure my max_backfills and recovery_max_active are as expected.
> Make sure everything has peered.
> Unset the flags and let it run.
>
> One crush map change, one data movement.
>
> Reed
>
>> That works, but with newer releases I've been doing this:
>>
>> - Make sure the cluster is HEALTH_OK
>> - Set the 'norebalance' flag (and usually nobackfill)
>> - Add all the OSDs
>> - Wait for the PGs to peer. I usually wait a few minutes
>> - Remove the norebalance and nobackfill flags
>> - Wait for HEALTH_OK
>>
>> Wido
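
For anyone who wants to see the two approaches in this thread side by side, here is a rough Python sketch that only builds the list of commands and does not run anything. It assumes the new OSDs already exist in the CRUSH map and that weights are changed with `ceph osd crush reweight`; the helper names (`staged_weights`, `plan_commands`), the OSD ids and the weight of 10.0 are made up for illustration. With the default `fractions=(1.0,)` it collapses to the "set flags, add everything, unset flags" method; with something like `(0.1,)` it produces the staged variant, pausing at 10% of the final weight first.

```python
from typing import Iterable, List


def staged_weights(final_weight: float, fractions: Iterable[float]) -> List[float]:
    """Return the CRUSH weight to apply at each step, always ending at final_weight."""
    steps = sorted(set(fractions))
    if not steps or steps[0] <= 0 or steps[-1] > 1:
        raise ValueError("fractions must be non-empty and lie in (0, 1]")
    if steps[-1] != 1.0:
        steps.append(1.0)  # always finish at the full weight
    return [round(final_weight * f, 5) for f in steps]


def plan_commands(osd_ids: Iterable[int], final_weight: float,
                  fractions: Iterable[float] = (1.0,)) -> List[str]:
    """Build (but do not execute) the command sequence for each weight step."""
    osd_ids = list(osd_ids)
    plan: List[str] = []
    for weight in staged_weights(final_weight, fractions):
        # Hold data movement while the CRUSH map changes.
        plan.append("ceph osd set norebalance")
        plan.append("ceph osd set nobackfill")
        for osd_id in osd_ids:
            plan.append(f"ceph osd crush reweight osd.{osd_id} {weight}")
        plan.append("# wait for PGs to peer and check the new OSDs")
        # Release the flags so backfill runs once per step.
        plan.append("ceph osd unset nobackfill")
        plan.append("ceph osd unset norebalance")
        plan.append("# wait for HEALTH_OK before the next step")
    return plan


if __name__ == "__main__":
    # Hypothetical OSD ids and final weight, for illustration only.
    for line in plan_commands([100, 101], 10.0, fractions=(0.1,)):
        print(line)
```

The output is meant to be read and pasted by hand, one step at a time, so each pause is still a human decision rather than something a script rushes through.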
_______________________________________________ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com