Hi everybody,

apparently, I forgot to report back. The evacuation completed without problems 
and we are replacing disks at the moment. This procedure worked like a charm 
(please read the thread to see why we didn't just shut down the OSDs and use 
recovery for the rebuild):

1.) For all OSDs: ceph osd out ID  # just set them out; this is sticky and does 
what you want
2.) Wait for rebalance to finish
3.) Replace disks.
4.) Deploy OSDs with the same IDs as before per host.
5.) Start OSDs and let rebalance back.
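
The steps above can be sketched as shell commands. This is a dry-run sketch, not 
taken verbatim from the thread: the OSD IDs and the device path are examples, and 
the CEPH variable only echoes the commands so you can review them before running 
anything for real.

```shell
# Dry-run sketch of the procedure above. Set CEPH=ceph on an admin node
# to actually execute; as written, the commands are only printed.
CEPH="echo ceph"          # dry-run guard
OSD_IDS="12 13 14"        # example IDs of the OSDs on the host being drained

# 1.) Mark the OSDs "out" (sticky, survives OSD restarts)
for id in $OSD_IDS; do
  $CEPH osd out "$id"
done

# 2.) Wait for the rebalance to finish before pulling any disk
$CEPH status                          # repeat until no misplaced objects remain
$CEPH osd safe-to-destroy $OSD_IDS    # confirms the OSDs hold no needed data

# 3.)+4.) Replace the disks and redeploy with the same IDs, e.g. per OSD:
for id in $OSD_IDS; do
  $CEPH osd destroy "$id" --yes-i-really-mean-it   # frees the ID for reuse
done
# ceph-volume lvm create --osd-id 12 --data /dev/sdX   # device path is an example

# 5.) Once the new OSDs are up, mark them "in" again to rebalance back
for id in $OSD_IDS; do
  $CEPH osd in "$id"
done
```

Keeping the old IDs (via "osd destroy" rather than "osd purge") means the CRUSH 
map stays unchanged and only the data that lived on these OSDs moves back.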

During the evacuation you might want to consider setting "osd_delete_sleep" to 
a high value to avoid the issues with PG removal reported in this thread; see 
the messages by Joshua Baergen.
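
One way to set this at runtime, again as a dry-run sketch (the value 30 is an 
example, not a recommendation from the thread; remove the override once the 
replacement is done):

```shell
CEPH="echo ceph"   # dry-run guard; set CEPH=ceph to actually execute

# Slow down PG removal on all OSDs for the duration of the evacuation
$CEPH config set osd osd_delete_sleep 30   # seconds of sleep; example value

# After the disks are replaced and rebalance has finished, drop the override
$CEPH config rm osd osd_delete_sleep
```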

The only wish I have: after setting the OSDs "out", it would be great to have 
an option for recovery to kick in as well to speed up data movement. Instead of 
just reading shard by shard from the out-OSDs, shards should also be 
reconstructed by recovery from all the other OSDs. Our evacuation took about 2 
weeks; if recovery kicked in, this time would go down to 2-3 days.

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Szabo, Istvan (Agoda) <istvan.sz...@agoda.com>
Sent: Monday, October 28, 2024 4:41 AM
To: Frank Schilder
Subject: Re: [ceph-users] Re: Procedure for temporary evacuation and replacement

Hi Frank,

Finally what was the best way to do this evacuation replacement?
I want to destroy all my OSDs node by node in my cluster due to high 
fragmentation, so I might follow your method.

Thank you
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
