Hi,

Thanks, everyone, for the answers - we actually did this last night already, one OSD node at a time, without disrupting service. We set the noout flag and also paused the deep scrub that was running, using the nodeep-scrub flag, for the duration of the maintenance.
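For reference, the flag handling looked roughly like this (a sketch from memory, so treat it as a guide rather than a transcript of exactly what we typed):

    # before taking the first node down: keep CRUSH from marking its OSDs out
    # and pause deep scrubbing for the duration of the maintenance
    ceph osd set noout
    ceph osd set nodeep-scrub

    # ...shut the node down, add CPU/RAM, boot it again...

    # watch the returned OSDs catch up on the missed writes
    # before moving on to the next node
    ceph -s

    # once the last node is back and the cluster is healthy, clear the flags
    ceph osd unset nodeep-scrub
    ceph osd unset noout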
Took down one node with 10 OSDs via a normal shutdown, added the CPU/RAM (took around 5-7 minutes), and booted it again. When it came back up it recovered the missed writes; once that was done we took down the next node, and so on until we had gone through our 5-node cluster. There was of course a little iowait on some disks due to the higher latency during recovery, but nothing too disruptive for our workload (most of our load is during the daytime, and we did this at night).

Kind Regards,
David Majchrzak

On 19 Jun 2014, at 19:58, Gregory Farnum <g...@inktank.com> wrote:

> No, you definitely don't need to shut down the whole cluster. Just do
> a polite shutdown of the daemons, optionally with the noout flag that
> Wido mentioned.
> Software Engineer #42 @ http://inktank.com | http://ceph.com
>
>
> On Thu, Jun 19, 2014 at 1:55 PM, Alphe Salas Michels <asa...@kepler.cl> wrote:
>> Hello, the best practice is to simply shut down the whole cluster, starting
>> with the clients, then the monitors, the MDSes and the OSDs. You do your
>> maintenance, then you bring everything back, starting with the monitors,
>> MDSes, OSDs and finally the clients.
>>
>> Otherwise, the missing OSDs will trigger a reconstruction of your cluster
>> that will not end with the return of the "faulty" OSD(s). If you turn off
>> everything related to the Ceph cluster, it will be transparent to the
>> monitors, and they will not have to deal with partial reconstruction,
>> clean-up and rescrubbing of the returned OSD(s).
>>
>> Best regards,
>>
>> Alphe Salas
>> IT engineer
>>
>>
>>
>> On 06/13/2014 04:56 AM, David wrote:
>>>
>>> Hi,
>>>
>>> We're going to take down one OSD node for maintenance (add CPU + RAM),
>>> which might take 10-20 minutes.
>>> What's the best practice here in a production cluster running dumpling
>>> 0.67.7-1~bpo70+1?
>>>
>>> Kind Regards,
>>> David Majchrzak

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com