As a counterpoint, adding a large amount of new hardware gradually (or, more 
specifically, in a few steps) has a few benefits IMO.

- Being able to pause the operation and confirm that the new hardware (and the 
cluster) is operating as expected. With OSDs at 10% weight you can identify 
hardware problems that would be much harder to notice during backfilling, and 
that could cause performance issues for the cluster if those OSDs ended up with 
their full complement of PGs.

- Breaking up long backfills. For a full cluster with large OSDs, backfills can 
take weeks. I find that letting the mon stores compact and getting the cluster 
back to HEALTH_OK is good for my sanity, and it gives a good stopping point for 
working on other cluster issues. This obviously depends on cluster fullness and 
OSD size.

I still aim for the smallest number of steps, but an initial CRUSH weighting of 
10-25% of the final weight is a good sanity check of the new hardware, and it 
gives a good indication of how to approach the rest of the backfill.
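As a rough sketch of that staged approach, something like the following could be used. The OSD IDs and weights here are purely illustrative (assuming ten new OSDs whose final CRUSH weight would be about 9.1):

```shell
# Illustrative only: bring newly added OSDs in at ~10% of final weight
# as an initial sanity check. Assumes osd.100-osd.109 are the new OSDs
# and ~9.1 would be their full CRUSH weight.
for id in $(seq 100 109); do
    ceph osd crush reweight "osd.${id}" 0.9
done
# Once the hardware looks healthy and this round of backfill completes,
# step the weight up (e.g. 25%, 50%, then 100% of final weight) with
# further 'ceph osd crush reweight' calls.
```

Each reweight step triggers one round of data movement, so fewer, larger steps mean less total movement at the cost of less opportunity to pause and inspect.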

Cheers,
Tom

From: ceph-users <ceph-users-boun...@lists.ceph.com> On Behalf Of Paul Emmerich
Sent: 24 July 2019 20:06
To: Reed Dier <reed.d...@focusvq.com>
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] How to add 100 new OSDs...

+1 on adding them all at the same time.

All these methods that gradually increase the weight aren't really necessary in 
newer releases of Ceph.

Paul

--
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90


On Wed, Jul 24, 2019 at 8:59 PM Reed Dier <reed.d...@focusvq.com> wrote:
Just chiming in to say that this too has been my preferred method for adding 
[large numbers of] OSDs.

Set the norebalance nobackfill flags.
Create all the OSDs, and verify everything looks good.
Make sure my max_backfills and recovery_max_active are as expected.
Make sure everything has peered.
Unset flags and let it run.

One crush map change, one data movement.
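A minimal sketch of the steps above, assuming the OSDs themselves are created with your usual tooling in between (the throttling values are illustrative, not a recommendation):

```shell
# Prevent data movement while the new OSDs are created and peer.
ceph osd set norebalance
ceph osd set nobackfill

# ... create all the new OSDs here (ceph-volume, orchestration, etc.) ...

# Optionally check/tune backfill throttling; values here are illustrative.
ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1'

# Once everything has peered and looks good, unset the flags and let it run.
ceph osd unset norebalance
ceph osd unset nobackfill
```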

Reed



That works, but with newer releases I've been doing this:

- Make sure cluster is HEALTH_OK
- Set the 'norebalance' flag (and usually nobackfill)
- Add all the OSDs
- Wait for the PGs to peer. I usually wait a few minutes
- Remove the norebalance and nobackfill flag
- Wait for HEALTH_OK
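To check the "wait for peering" step in the list above before removing the flags, a couple of status commands are usually enough (a sketch, assuming a working ceph CLI against the cluster):

```shell
# Summary of PG counts by state; no PGs should remain in 'peering'.
ceph pg stat
# Should print nothing once peering has finished.
ceph status | grep -i peering
# After unsetting the flags, watch for the cluster to return to HEALTH_OK.
ceph health
```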

Wido

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

