Craig,

Thanks for the info.

I ended up doing a zap and then a create via ceph-deploy. One question that I still have is about adding the failed OSD back into the pool. In this example osd.70 was the bad one; when I added the disk back in via ceph-deploy, it was brought up as osd.108. Only after osd.108 was up and running did I think to remove osd.70 from the CRUSH map, etc.

My question is this: had I removed osd.70 from the CRUSH map prior to my ceph-deploy create, should/would Ceph have reused OSD number 70? I would prefer to replace a failed disk with a new one and keep the old OSD assignment, if that is possible, which is why I am asking.

Anyway, thanks again for all the help.
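For what it's worth, the sequence below is roughly what I think I should have run before the create, so that the new disk came back as osd.70; my understanding is that "ceph osd create" (which ceph-deploy calls under the hood) hands out the lowest free ID, so fully removing the old entry first should let 70 be reused. Treat it as an untested sketch (the host and device names are just the ones from this thread):

  # with the old osd.70 daemon already stopped on hqosd6
  ceph osd out 70                  # already marked out in my case
  ceph osd crush remove osd.70     # drop it from the CRUSH map
  ceph auth del osd.70             # delete its cephx key
  ceph osd rm 70                   # remove it from the osdmap, freeing ID 70

  # wipe and re-create the disk; the new OSD should pick up the lowest free ID
  ceph-deploy disk zap hqosd6:sdl
  ceph-deploy osd create hqosd6:sdl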
Shain

Sent from my iPhone

On Nov 7, 2014, at 2:09 PM, Craig Lewis <cle...@centraldesktop.com> wrote:

I'd stop that osd daemon, and run xfs_check / xfs_repair on that partition. If you repair anything, you should probably force a deep-scrub on all the PGs on that disk. I think "ceph osd deep-scrub <osdid>" will do that, but you might have to manually grep "ceph pg dump".

Or you could just treat it like a failed disk, but re-use the disk. ceph-disk-prepare --zap-disk should take care of you.

On Thu, Nov 6, 2014 at 5:06 PM, Shain Miley <smi...@npr.org> wrote:

I tried restarting all the OSDs on that node; osd.70 was the only ceph process that did not come back online. There is nothing in the ceph-osd log for osd.70. However, I do see over 13,000 of these messages in kern.log:

Nov 6 19:54:27 hqosd6 kernel: [34042786.392178] XFS (sdl1): xfs_log_force: error 5 returned.

Does anyone have any suggestions on how I might be able to get this drive back into the cluster (or whether or not it is even worth trying)?

Thanks,
Shain

Shain Miley | Manager of Systems and Infrastructure, Digital Media | smi...@npr.org | 202.513.3649

________________________________________
From: Shain Miley [smi...@npr.org]
Sent: Tuesday, November 04, 2014 3:55 PM
To: ceph-users@lists.ceph.com
Subject: osd down

Hello,

We are running ceph version 0.80.5 with 108 OSDs. Today I noticed that one of the OSDs is down:

root@hqceph1:/var/log/ceph# ceph -s
    cluster 504b5794-34bd-44e7-a8c3-0494cf800c23
     health HEALTH_WARN crush map has legacy tunables
     monmap e1: 3 mons at {hqceph1=10.35.1.201:6789/0,hqceph2=10.35.1.203:6789/0,hqceph3=10.35.1.205:6789/0}, election epoch 146, quorum 0,1,2 hqceph1,hqceph2,hqceph3
     osdmap e7119: 108 osds: 107 up, 107 in
      pgmap v6729985: 3208 pgs, 17 pools, 81193 GB data, 21631 kobjects
            216 TB used, 171 TB / 388 TB avail
                3204 active+clean
                   4 active+clean+scrubbing
  client io 4079 kB/s wr, 8 op/s

Using "osd dump" I determined that it is osd number 70:

osd.70 down out weight 0 up_from 2668 up_thru 6886 down_at 6913 last_clean_interval [488,2665) 10.35.1.217:6814/22440 10.35.1.217:6820/22440 10.35.1.217:6824/22440 10.35.1.217:6830/22440 autoout,exists 5dbd4a14-5045-490e-859b-15533cd67568

Looking at that node, the drive is still mounted and I did not see any errors in any of the system logs, and the RAID status shows the drive as up and healthy, etc.
root@hqosd6:~# df -h |grep 70
/dev/sdl1       3.7T  1.9T  1.9T  51% /var/lib/ceph/osd/ceph-70

I was hoping that someone might be able to advise me on the next course of action (can I add the OSD back in, should I replace the drive altogether, etc.). I have attached the osd log to this email.

Any suggestions would be great.

Thanks,
Shain

--
Shain Miley | Manager of Systems and Infrastructure, Digital Media | smi...@npr.org | 202.513.3649
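P.S. For anyone else who finds this thread in the archives: the repair route Craig describes above would, as far as I understand it, look roughly like the following on the OSD host. This is only a sketch (I went the zap/re-create route instead, so I have not actually run it here), and the stop/start commands assume an Upstart-based install:

  stop ceph-osd id=70             # or: service ceph stop osd.70
  umount /var/lib/ceph/osd/ceph-70
  xfs_repair -n /dev/sdl1         # dry run: report problems without changing anything
  xfs_repair /dev/sdl1            # actual repair
  mount /dev/sdl1 /var/lib/ceph/osd/ceph-70
  start ceph-osd id=70

  # then tell osd.70 to deep-scrub everything it holds
  ceph osd deep-scrub 70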
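And for the "manually grep ceph pg dump" part, something like this should list the placement groups that have a copy on osd.70 and deep-scrub them one by one. Again, just a sketch I have not verified, since the pg dump column layout can differ between releases; the up/acting sets are printed in brackets, e.g. [70,12,33]:

  # PG lines start with a pgid like "3.1a"; the bracket match avoids
  # also catching osd.170 or osd.700
  ceph pg dump | grep -E '^[0-9]+\.[0-9a-f]+' | grep -E '[[,]70[],]' | awk '{print $1}'

  # deep-scrub each of those PGs individually
  for pg in $(ceph pg dump | grep -E '^[0-9]+\.[0-9a-f]+' | grep -E '[[,]70[],]' | awk '{print $1}'); do
    ceph pg deep-scrub "$pg"
  done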
_______________________________________________ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com