Success! An issue in my operating system install procedure was corrupting the journals; it was not caused by Ceph. With that bug fixed, the shutdown procedure described in this thread has been verified to work as expected. Thanks for all the help.
-Chris

> On Mar 1, 2017, at 9:39 AM, Peter Maloney
> <peter.malo...@brockmann-consult.de> wrote:
>
> On 03/01/17 15:36, Heller, Chris wrote:
>> I see. My journal is specified in ceph.conf. I'm not removing it from the
>> OSD so sounds like flushing isn't needed in my case.
>>
> Okay but it seems it's not right if it's saying it's a non-block journal.
> (meaning a file, not a block device).
>
> Double check your ceph.conf... make sure the path works, and somehow make
> sure the [osd.x] actually matches that osd (no idea how to test that, esp. if
> the osd doesn't start ... maybe just increase logging).
>
> Or just make a symlink for now, just to see if it solves the problem, which
> would imply the ceph.conf is wrong.
>
>
>> -Chris
>>> On Mar 1, 2017, at 9:31 AM, Peter Maloney
>>> <peter.malo...@brockmann-consult.de> wrote:
>>>
>>> On 03/01/17 14:41, Heller, Chris wrote:
>>>> That is a good question, and I'm not sure how to answer. The journal is on
>>>> its own volume, and is not a symlink. Also how does one flush the journal?
>>>> That seems like an important step when bringing down a cluster safely.
>>>>
>>> You only need to flush the journal if you are removing it from the osd,
>>> replacing it with a different journal.
>>>
>>> So since your journal is on its own, then you need either a symlink in the
>>> osd directory named "journal" which points to the device (ideally not
>>> /dev/sdx but /dev/disk/by-.../), or you put it in the ceph.conf.
>>>
>>> And since it said you have a non-block journal now, it probably means there
>>> is a file... you should remove that (rename it to journal.junk until you're
>>> sure it's not an important file, and delete it later).
>>>>
>>>>>> This is where I've stopped. All but one OSD came back online. One has
>>>>>> this backtrace:
>>>>>>
>>>>>> 2017-02-28 17:44:54.884235 7fb2ba3187c0 -1 journal FileJournal::_open:
>>>>>> disabling aio for non-block journal. Use journal_force_aio to force use
>>>>>> of aio anyway
>>>>> Are the journals inline? or separate? If they're separate, the above
>>>>> means the journal symlink/config is missing, so it would possibly make a
>>>>> new journal, which would be bad if you didn't flush the old journal
>>>>> before.
>>>>>
>>>>> And also just one osd is easy enough to replace (which I wouldn't do
>>>>> until the cluster settled down and recovered). So it's lame for it to be
>>>>> broken, but it's still recoverable if that's the only issue.
>>>>
>>>
>>
>
> --
>
> --------------------------------------------
> Peter Maloney
> Brockmann Consult
> Max-Planck-Str. 2
> 21502 Geesthacht
> Germany
> Tel: +49 4152 889 300
> Fax: +49 4152 889 333
> E-mail: peter.malo...@brockmann-consult.de
> Internet: http://www.brockmann-consult.de
> --------------------------------------------
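
For anyone finding this thread later: the quoted advice turns on whether the "journal" entry in an OSD's data directory resolves to a block device or to a plain file. Below is a minimal Python sketch of that check. The function name classify_journal and its return labels are my own illustration, not anything provided by Ceph, and it only inspects the filesystem; it does not read ceph.conf.

```python
import os
import stat


def classify_journal(osd_dir):
    """Classify the `journal` entry inside an OSD data directory.

    Hypothetical helper for illustration only (not part of Ceph).
    Returns "missing", "block-device", "regular-file", or "other".
    """
    journal = os.path.join(osd_dir, "journal")
    try:
        # os.stat follows symlinks, so a symlink pointing at a device
        # node is reported as that device, not as a link.
        mode = os.stat(journal).st_mode
    except FileNotFoundError:
        # Either there is no entry at all, or the symlink is dangling.
        return "missing"
    if stat.S_ISBLK(mode):
        return "block-device"
    if stat.S_ISREG(mode):
        return "regular-file"
    return "other"
```

Calling it on an OSD data directory (for example /var/lib/ceph/osd/ceph-0, an assumed location that may differ on your install) and getting "regular-file" would be consistent with the "non-block journal" message in the log quoted above.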
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com