Looks like the journal SSD is broken.

If it's still readable but not writable, you can run ceph-osd --id ... --flush-journal and then replace the disk. After that, point the symlinks in /var/lib/ceph/osd/ceph-*/journal at the new journal device and run ceph-osd --id ... --mkjournal.
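For clarity, a minimal sketch of that sequence for one OSD. The OSD id (12), the new journal partition (/dev/sdx1), and the service commands are placeholders rather than values from this thread, and stopping/starting the daemon varies with your init system:

  # stop the OSD so the journal can be flushed consistently
  service ceph stop osd.12

  # write any pending journal entries back into the OSD data store
  ceph-osd --id 12 --flush-journal

  # (physically replace the failed SSD and partition it for the journals)

  # repoint the journal symlink at the new partition
  # (a /dev/disk/by-partuuid/... path is more robust than a /dev/sdx1 name)
  ln -sf /dev/sdx1 /var/lib/ceph/osd/ceph-12/journal

  # initialize a fresh journal on the new device and bring the OSD back up
  ceph-osd --id 12 --mkjournal
  service ceph start osd.12

Repeat for each of the OSDs whose journal lived on that SSD.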
If the journal is no longer readable, the safe variant is to completely re-create the affected OSDs after replacing the journal disk. (The unsafe way is to just skip the --flush-journal step; not recommended.)

Paul

--
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90


On Mon, Sep 30, 2019 at 3:51 AM 展荣臻(信泰) <zhanrzh...@teamsun.com.cn> wrote:
>
> > > Hi, all
> > > we use openstack + ceph (hammer) in production
> >
> > Hammer is soooooo 2015.
> >
> > > There are 22 OSDs on a host and 11 OSDs share one SSD for the OSD journal.
> >
> > I can’t imagine a scenario in which this strategy makes sense; the
> > documentation and books are quite clear on why this is a bad idea.
> > Assuming that your OSDs are HDD and the journal devices are SATA SSD, the
> > journals are going to be a bottleneck, and you’re going to wear through
> > them quickly. If you have a read-mostly workload, colocating them would be
> > safer.
>
> Oh, I am wrong, we use SAS SSDs.
>
> > I also suspect that something is amiss with your CRUSH topology that is
> > preventing recovery, and/or you actually have multiple overlapping failures.
>
> My crushmap is at https://github.com/rongzhen-zhan/myfile/blob/master/crushmap

_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io