Hi all,

a hopefully simple question this time: I would like a second opinion on a
procedure for replacing a large number of disks.

We need to replace about 40 disks distributed over all 12 hosts backing a large
pool with EC 8+3. We can't do it host by host (replace the disks on one host
and let recovery rebuild the data), as that would take far too long. We would
therefore like to evacuate all data from these disks simultaneously and with as
little data movement as possible. This is the procedure that seems to do the
trick:

1.) For all OSDs: ceph osd reweight <ID> 0  # Note: "osd reweight", not "osd crush reweight"
2.) Wait for rebalance to finish
3.) Replace disks and deploy OSDs with the same IDs as before per host
4.) Start OSDs and let rebalance back
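
For concreteness, step 1 could be scripted roughly like this (the OSD IDs
below are placeholders for our ~40 real ones; printed as a dry run first, drop
the echo to actually run it against the cluster):

```shell
# Placeholder IDs of the OSDs to drain; substitute the real ~40 IDs.
OSD_IDS="12 37 51"

# Step 1: set the (non-crush) reweight to 0 on each affected OSD.
# Dry run: prints the commands; remove 'echo' to execute them.
for id in $OSD_IDS; do
  echo ceph osd reweight "$id" 0
done

# Step 2: wait for the rebalance, e.g. by polling 'ceph -s' or
# 'ceph pg stat' until all PGs are active+clean again.
```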

I tested step 1 on Octopus with one disk and it seems to work. The reason I
ask is that step 1 actually marks the OSDs as OUT. However, they remain UP and
I see only misplaced objects, not degraded ones. It is a bit
counter-intuitive, but UP+OUT OSDs apparently still participate in IO.

Because this is counter-intuitive, I would like a second opinion. I have read
that others reweight to something like 0.001 and hope that this flushes all
PGs off the disks. I would prefer not to rely on hope, and a reweight to 0
apparently is a valid choice here, even though it leads to the somewhat odd
state of UP+OUT OSDs.
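
Rather than hoping, I was planning to verify the flush explicitly: once the
rebalance finishes, `ceph pg ls-by-osd` should list no PGs for a drained OSD.
A sketch as a dry run (placeholder IDs again):

```shell
# Placeholder IDs of the drained OSDs.
OSD_IDS="12 37 51"

# After the rebalance, each drained OSD should carry no PGs any more.
# Dry run: prints the check commands; remove 'echo' to run them. A
# non-empty listing would mean some PGs still map to that OSD.
for id in $OSD_IDS; do
  echo ceph pg ls-by-osd "$id"
done
```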

A problem that could arise is some timeout I'm overlooking that makes the data
chunks on UP+OUT OSDs unavailable after a while. I'm also wondering whether
UP+OUT OSDs participate in peering if an OSD restarts somewhere in the pool.

Thanks for your input and best regards!
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
