Thank you for the advice.

Our CRUSH map is actually set up with replication set to 3 and at least one
copy in each cabinet, ensuring no one host is a single point of failure. We
fully intend to perform this maintenance over the course of many weeks, one
host at a time. We felt that staggering the deploy times of the SSDs was a
benefit anyway, given how SSDs tend to fail. (i.e. When one goes, all of its
friends are usually close behind.)
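
Roughly speaking, the rule that gives us that guarantee looks something like
this in the decompiled map (names are illustrative and simplified, with the
cabinets modelled as "rack" buckets):

    rule replicated_per_cabinet {
            ruleset 1
            type replicated
            min_size 2
            max_size 3
            # take both cabinets, then up to two hosts in each; with size 3
            # every cabinet holds at least one copy and no host holds two
            step take default
            step choose firstn 2 type rack
            step chooseleaf firstn 2 type host
            step emit
    }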

Cheers,
-
Stephen Mercier | Sr. Systems Architect
Attainia Capital Planning Solutions (ACPS)
O: (650)241-0567, 727 | TF: (866)288-2464, 727
stephen.merc...@attainia.com | www.attainia.com

On Apr 14, 2016, at 7:00 AM, Wido den Hollander wrote:

> 
>> On 14 April 2016 at 15:29, Stephen Mercier
>> <stephen.merc...@attainia.com> wrote:
>> 
>> 
>> Good morning,
>> 
>> We've been running a medium-sized (88 OSDs - all SSD) Ceph cluster for the
>> past 20 months. We're very happy with our experience with the platform so far.
>> 
>> Shortly, we will be embarking on an initiative to replace all 88 OSDs with
>> new drives (planned maintenance and lifecycle replacement). Before we do so,
>> however, I wanted to confirm with the community the proper order of
>> operations for such a task.
>> 
>> The OSDs are divided evenly across an even number of hosts, which are in turn
>> divided evenly between 2 cabinets in 2 physically separate locations. The plan
>> is to replace the OSDs one host at a time, cycling back and forth between
>> cabinets and replacing one host per week, or every 2 weeks (depending on how
>> long the CRUSH rebalancing takes).
>> 
> 
> I assume that your replication is set to "2" and that you replicate over the
> two locations?
> 
> In that case, only work on the drives in the first location and start on the
> second one after you have replaced them all.
> 
>> For each host, the plan was to mark the OSDs as out, one at a time, closely
>> monitoring each of them, and moving to the next OSD once the current one is
>> balanced out. Once all OSDs are successfully marked as out, we will then
>> delete those OSDs from the cluster, shut down the server, replace the physical
>> drives, and, once rebooted, add the new drives to the cluster as new OSDs
>> using the same method we've used previously, doing so one at a time to allow
>> for rebalancing as they rejoin the cluster.
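>> 
>> Per OSD, that boils down to something along these lines (id illustrative):
>> 
>>     ceph osd out 12                # stop placing data on the OSD
>>     # wait for recovery to finish and the cluster to return to HEALTH_OK
>>     ceph osd crush remove osd.12   # drop it from the CRUSH map
>>     ceph auth del osd.12           # remove its cephx key
>>     ceph osd rm 12                 # delete the OSD entry itself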
>> 
>> My questions are…Does this process sound correct? Should I also mark the OSDs
>> as down when I mark them as out? Are there any steps I'm overlooking in this
>> process?
>> 
> 
> No, marking them out is just fine. That tells CRUSH the OSD is no longer
> participating in data placement. Its effective weight will be 0 and that's it.
> 
> As others mentioned, reweight the OSD to 0 at the same time you mark it as
> out. That way you prevent a double rebalance.
> 
> Keep it marked as UP so that it can help in migrating the PGs to other nodes.
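> 
> In commands that would be something like (OSD id illustrative):
> 
>     ceph osd crush reweight osd.12 0   # drain the OSD in a single rebalance
>     ceph osd out 12                    # stop mapping data to it
> 
> With the CRUSH weight already at 0, removing the OSD from the CRUSH map later
> does not trigger a second round of data movement.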
> 
>> Any advice is greatly appreciated.
>> 
>> Cheers,
>> -
>> Stephen Mercier | Sr. Systems Architect
>> Attainia Capital Planning Solutions (ACPS)
>> O: (650)241-0567, 727 | TF: (866)288-2464, 727
>> stephen.merc...@attainia.com | www.attainia.com
>> 

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
