On 5/29/14 01:09, Felix Lee wrote:
Dear experts,
Recently, the disk backing one of our OSDs failed and took the OSD down.
After I recovered the disk and filesystem, I noticed two problems:
1. Journal corruption, which prevents the OSD from starting.
2. I think I can use ceph-osd with the "--mkjournal" option to fix the
journal corruption, but something else bothers me: the previous OSD
daemon is stuck in "D" (uninterruptible sleep) state, so it can't be
terminated. Usually, once the filesystem is recovered, a process should
be able to leave the D state, so I am not sure what is causing this, or
whether I can ignore it without any bad consequences.
In any case, I would be very grateful if you experts could shed some
light on this.
Our current Ceph version is ceph-0.72.2-0.el6.x86_64, and the filesystem
backend is XFS on Fibre Channel direct-attached storage.
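A process in D state (uninterruptible sleep) is blocked inside the kernel, usually waiting on I/O, and ignores signals, including SIGKILL, until that I/O completes or fails. To see what the stuck daemon is actually waiting on, here is a minimal sketch, assuming a Linux host with the standard /proc layout. It scans /proc for ceph-osd processes in D state and reports each one's kernel wait channel (wchan). The helper names `parse_stat` and `stuck_processes` are hypothetical, not part of Ceph.

```python
import os


def parse_stat(stat_line):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat.

    The command name is wrapped in parentheses and may itself contain
    spaces or ')', so split on the *last' ')' rather than on whitespace.
    """
    lpar = stat_line.index("(")
    rpar = stat_line.rindex(")")
    pid = int(stat_line[:lpar].strip())
    comm = stat_line[lpar + 1:rpar]
    state = stat_line[rpar + 1:].split()[0]
    return pid, comm, state


def stuck_processes(name="ceph-osd", proc_root="/proc"):
    """List (pid, comm, wchan) for every process named `name` in D state."""
    stuck = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError):
            continue  # process exited mid-scan, or stat was unreadable
        if comm != name or state != "D":
            continue
        try:
            # wchan names the kernel function the task is sleeping in.
            with open(os.path.join(proc_root, entry, "wchan")) as f:
                wchan = f.read().strip()
        except OSError:
            wchan = ""
        stuck.append((pid, comm, wchan))
    return stuck


if __name__ == "__main__":
    for pid, comm, wchan in stuck_processes():
        print(f"{pid} {comm} blocked in {wchan or '?'}")
```

The parser splits on the last `)` because a command name can legally contain spaces or parentheses. If the wait channel points into the XFS or block layer, the daemon is most likely still waiting on I/O to the old device. A process stuck that way usually cannot be cleared without a reboot.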
I can't speak to the specific errors you're seeing, but it looks like
you have a failing or corrupted disk.
Things I would investigate:
1. Is the disk itself failing? If this were a SATA disk, I'd check the
SMART stats on the disk. I haven't dealt with Fibre Channel disks
since before SMART was standardized, so I can't tell you how to do that.
2. Get rid of the old ceph-osd process. Reboot the node if you have
to. If things come up cleanly, then you're done.
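3. Fsck the filesystem (on XFS, fsck.xfs does nothing, so run
xfs_repair -n on the unmounted device for a read-only check). If the
FS is clean, then it's probably just the OSD journal that's corrupted.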
4. How quickly do you need this fixed? At this point, I'm out of
suggestions, so I'd remove the osd, zap it, and add it back in. If
you can wait, somebody might have a better suggestion.
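Step 4 (remove the OSD, zap it, and add it back in) can be sketched as a dry-run plan. This is a minimal sketch, assuming the manual OSD-removal sequence from the Ceph documentation (`ceph osd out`, `ceph osd crush remove`, `ceph auth del`, `ceph osd rm`) followed by `ceph-disk zap`. The helper names `osd_rebuild_plan` and `run_plan`, the OSD id 12, and the device `/dev/sdX` are hypothetical placeholders. Check each command against the documentation for your release before running it with `dry_run=False`. Re-adding the OSD is left out because it depends on how the cluster was deployed.

```python
import shlex
import subprocess


def osd_rebuild_plan(osd_id, device):
    """Return the commands to take a dead OSD out of the cluster and wipe its disk.

    Assumes the ceph-osd daemon for this id is already stopped (or the
    node has been rebooted to clear a process stuck in D state).
    """
    if osd_id < 0:
        raise ValueError("osd_id must be non-negative")
    name = f"osd.{osd_id}"
    return [
        ["ceph", "osd", "out", str(osd_id)],       # stop mapping data to it
        # In practice, let recovery finish (check `ceph health`) before going on.
        ["ceph", "osd", "crush", "remove", name],  # drop it from the CRUSH map
        ["ceph", "auth", "del", name],             # delete its cephx key
        ["ceph", "osd", "rm", str(osd_id)],        # remove it from the OSD map
        ["ceph-disk", "zap", device],              # destroy partition table and data
    ]


def run_plan(plan, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in plan:
        print(("would run: " if dry_run else "running: ") + shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)  # stop at the first failure


if __name__ == "__main__":
    # Hypothetical OSD id and device; dry run only prints the commands.
    run_plan(osd_rebuild_plan(12, "/dev/sdX"))
```

Keeping the plan as data separates deciding what to run from running it. You can review the dry-run output, which is sensible when the last step irreversibly wipes a disk.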
Fibre Channel hardware is much more complicated than SATA and SAS.
There are a lot more parts involved, which leaves more room for bugs.
If you see this problem come back on the same disk, I'd replace the
disk. If you see this happen again to other disks, I would get your
Fibre Channel vendor involved. It wouldn't hurt to make sure you have
the latest firmware on the disks, enclosure, and FC adapter.
--
*Craig Lewis*
Senior Systems Engineer
Office +1.714.602.1309
Email cle...@centraldesktop.com <mailto:cle...@centraldesktop.com>
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com