> -----Original Message-----
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
> David Z
> Sent: Wednesday, November 12, 2014 8:16 AM
> To: Ceph Community; Ceph-users
> Subject: [ceph-users] The strategy of auto-restarting crashed OSD
> 
> Hi Guys,
> 
> We have recently been experiencing some OSD crashes, such as messenger
> crashes and some strange crashes that are still being investigated. These
> crashes do not seem to reproduce after restarting the OSD.
>
> So we are considering a strategy of automatically restarting a crashed OSD
> once or twice, then leaving it down if restarting doesn't work. This strategy
> might reduce, to some extent, the impact of PG peering and recovery on online
> traffic, since we don't mark an OSD out automatically even when it is down
> unless we are sure it is a disk failure.
> 
> However, we are also aware that this strategy may cause some problems.
> Since you have more experience with Ceph, we would like to hear
> some suggestions from you.
> 
> Thanks.
> 
> David Zhang
> _______________________________________________
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


I'm currently looking at the same scenario of having to restart crashed OSDs.
I'm looking at using runit (http://smarden.org/runit/ and
http://smarden.org/runit/useinit.html) to manage the OSDs. I'll probably
modify my init script to send me a trap or email when an OSD is restarted, though.
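
For illustration, the "restart once or twice, then leave it down" policy from
the original post could be sketched roughly as below. This is only a minimal
sketch in Python, not how runit or the Ceph init scripts actually behave. The
names supervise and osd_runner are hypothetical, the ceph-osd command line is
an assumption to adjust for your setup, and notify stands in for whatever
trap or email hook you use.

```python
import subprocess
import time


def supervise(run_once, max_restarts=2, delay=0.0, notify=print):
    """Run a daemon; restart it after a crash at most max_restarts times.

    run_once is any callable that blocks until the daemon exits and returns
    its exit status. Returns the number of restarts that were performed.
    """
    restarts = 0
    while True:
        rc = run_once()
        if rc == 0:
            # Clean shutdown (e.g. an operator stopped it): do not restart.
            return restarts
        if restarts >= max_restarts:
            # Restart budget used up: leave the daemon down for a human to check.
            notify("giving up after %d restarts (last exit status %s)"
                   % (restarts, rc))
            return restarts
        restarts += 1
        notify("daemon exited with status %s, restart %d/%d"
               % (rc, restarts, max_restarts))
        time.sleep(delay)


def osd_runner(osd_id):
    """Hypothetical helper: run one OSD in the foreground.

    The command line is an assumption; check it against your installation.
    """
    cmd = ["ceph-osd", "-f", "-i", str(osd_id)]
    return lambda: subprocess.call(cmd)
```

Something like supervise(osd_runner(3), max_restarts=2, delay=5) would then
restart osd.3 at most twice and leave it down after the third crash. Treating
exit status 0 as "do not restart" is a design choice of this sketch, so that a
deliberate stop is not undone.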
