As you have noted, 'ceph osd reweight 0' is the same as 'ceph osd out', but it 
is not the same as removing the OSD from the crush map (or setting its crush 
weight to 0). That is why you see a double rebalance when you mark an OSD out 
(or reweight it to 0) and then remove it later.
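
If it helps, the two weights show up as separate columns in the tree output 
(column layout from memory, so treat it as illustrative):

> ceph osd tree
# WEIGHT is the crush weight used to build the tree; REWEIGHT is the 0-1
# override that 'ceph osd out' / 'ceph osd reweight' change. Marking an OSD
# out only zeroes REWEIGHT; the crush WEIGHT stays as it was.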

To avoid this, I use a crush reweight as the initial step to move PGs off an 
OSD when draining nodes. You can then purge the OSD with no further PG movement 
(a fuller sketch follows the two sequences below).

Double movement:
> ceph osd out $i
# rebalancing
> ceph osd purge $i --yes-i-really-mean-it
# more rebalancing

Single movement:
> ceph osd crush reweight $i 0
# rebalancing
> ceph osd purge $i --yes-i-really-mean-it
# no rebalancing
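
For completeness, this is roughly how I would script the whole drain. It is 
only a sketch: it assumes a systemd deployment, uses 12 as an example OSD id, 
and gates the purge on 'ceph osd safe-to-destroy' (available since Luminous).

# drain one OSD with a single data movement (sketch, adapt to your environment)
i=12                                       # example OSD id
ceph osd crush reweight osd.$i 0           # start moving PGs off the OSD
# block until Ceph reports no PGs still depend on this OSD's data
while ! ceph osd safe-to-destroy osd.$i ; do sleep 60 ; done
systemctl stop ceph-osd@$i                 # stop the daemon on the OSD host
ceph osd purge $i --yes-i-really-mean-it   # removes it from crush, auth and the osd map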

The reason this happens (as I understand it) is that the reweight value is 
applied late in the crush calculation: an OSD with a reweight of 0 can still be 
picked for a PG's set, and only then does the reweight kick in and force the 
calculation to be retried. That retry can produce a different PG set than you 
would get if the OSD were not present at all, or had a crush weight of 0.
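
If you want to convince yourself on a test pool, you can watch a single PG's 
mapping at each step ('2.1a' is just an example pg id, and the output line is 
illustrative):

> ceph pg map 2.1a
# osdmap e1234 pg 2.1a (2.1a) -> up [3,7,12] acting [3,7,12]
# Run it again after the 'out' and again after the purge: with the out/reweight
# route the up set changes twice; with the crush reweight route it changes once.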

Cheers,
Tom

> -----Original Message-----
> From: Brent Kennedy <bkenn...@cfl.rr.com>
> Sent: 02 June 2020 04:44
> To: 'ceph-users' <ceph-users@ceph.io>
> Subject: [ceph-users] OSD upgrades
> 
> We are rebuilding servers and before luminous our process was:
> 
> 1. Reweight the OSD to 0
> 2. Wait for rebalance to complete
> 3. Out the osd
> 4. Crush remove osd
> 5. Auth del osd
> 6. Ceph osd rm #
> 
> Seems the luminous documentation says that you should:
> 
> 1. Out the osd
> 2. Wait for the cluster rebalance to finish
> 3. Stop the osd
> 4. Osd purge #
> 
> Is reweighting to 0 no longer suggested?
> 
> Side note: I tried our existing process and even after reweight, the entire
> cluster restarted the balance again after step 4 ( crush remove osd ) of the
> old process. I should also note, by reweighting to 0, when I tried to run
> "ceph osd out #", it said it was already marked out.
> 
> I assume the docs are correct, but just want to make sure since reweighting
> had been previously recommended.
> 
> Regards,
> 
> -Brent
> 
> Existing Clusters:
> 
> Test: Nautilus 14.2.2 with 3 osd servers, 1 mon/man, 1 gateway, 2 iscsi
> gateways ( all virtual on nvme )
> 
> US Production(HDD): Nautilus 14.2.2 with 11 osd servers, 3 mons, 4 gateways,
> 2 iscsi gateways
> 
> UK Production(HDD): Nautilus 14.2.2 with 12 osd servers, 3 mons, 4 gateways
> 
> US Production(SSD): Nautilus 14.2.2 with 6 osd servers, 3 mons, 3 gateways,
> 2 iscsi gateways
> 

_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
