Success! The journals were becoming corrupt because of a problem in my 
operating system install procedure, not because of Ceph. With that bug fixed, 
I have verified that the shutdown procedure described in this thread works as 
expected. Thanks for all the help.

-Chris

> On Mar 1, 2017, at 9:39 AM, Peter Maloney 
> <peter.malo...@brockmann-consult.de> wrote:
> 
> On 03/01/17 15:36, Heller, Chris wrote:
>> I see. My journal is specified in ceph.conf. I'm not removing it from the 
>> OSD so sounds like flushing isn't needed in my case.
>> 
> Okay, but something seems wrong if it reports a non-block journal (meaning 
> a file, not a block device).
> 
> Double check your ceph.conf... make sure the path works, and somehow make 
> sure the [osd.x] actually matches that osd (no idea how to test that, esp. if 
> the osd doesn't start ... maybe just increase logging).
> 
> Or just make a symlink for now, just to see if it solves the problem, which 
> would imply the ceph.conf is wrong.
> 
> 
>> -Chris
>>> On Mar 1, 2017, at 9:31 AM, Peter Maloney 
>>> <peter.malo...@brockmann-consult.de> wrote:
>>> 
>>> On 03/01/17 14:41, Heller, Chris wrote:
>>>> That is a good question, and I'm not sure how to answer. The journal is on 
>>>> its own volume, and is not a symlink. Also how does one flush the journal? 
>>>> That seems like an important step when bringing down a cluster safely.
>>>> 
>>> You only need to flush the journal if you are removing it from the osd, 
>>> replacing it with a different journal.
>>> 
>>> Since your journal is on its own volume, you need either a symlink in the 
>>> OSD directory named "journal" that points to the device (ideally not 
>>> /dev/sdx but /dev/disk/by-.../), or an entry for it in ceph.conf.
>>> 
>>> And since it said you have a non-block journal now, it probably means there 
>>> is a file... you should remove that (rename it to journal.junk until you're 
>>> sure it's not an important file, and delete it later).
>>>> 
>>>>>> This is where I've stopped. All but one OSD came back online. One has 
>>>>>> this backtrace:
>>>>>> 
>>>>>> 2017-02-28 17:44:54.884235 7fb2ba3187c0 -1 journal FileJournal::_open: 
>>>>>> disabling aio for non-block journal.  Use journal_force_aio to force use 
>>>>>> of aio anyway
>>>>> Are the journals inline or separate? If they're separate, the above 
>>>>> means the journal symlink/config is missing, so it would possibly make a 
>>>>> new journal, which would be bad if you didn't flush the old journal 
>>>>> before.
>>>>> 
>>>>> And also just one osd is easy enough to replace (which I wouldn't do 
>>>>> until the cluster settled down and recovered). So it's lame for it to be 
>>>>> broken, but it's still recoverable if that's the only issue.
> 
> 
> -- 
> 
> --------------------------------------------
> Peter Maloney
> Brockmann Consult
> Max-Planck-Str. 2
> 21502 Geesthacht
> Germany
> Tel: +49 4152 889 300
> Fax: +49 4152 889 333
> E-mail: peter.malo...@brockmann-consult.de 
> Internet: http://www.brockmann-consult.de 
> --------------------------------------------
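
The "non-block journal" message quoted above means the OSD opened its journal 
path and found something other than a block device. A minimal sketch for 
checking what a journal path resolves to is shown here. The function name 
journal_kind and the example path in the comment are hypothetical and not part 
of Ceph; this only illustrates the file-versus-block-device distinction 
discussed in the thread.

```python
import os
import stat


def journal_kind(path):
    """Report what a journal path resolves to (symlinks are followed)."""
    try:
        # os.stat follows symlinks, so a "journal" symlink is judged by
        # its target, which is what matters here.
        mode = os.stat(path).st_mode
    except FileNotFoundError:
        # Covers a missing entry as well as a dangling symlink.
        return "missing"
    if stat.S_ISBLK(mode):
        return "block device"
    if stat.S_ISREG(mode):
        return "regular file"
    return "other"


# Hypothetical usage; the OSD id and directory layout are assumptions:
#   journal_kind("/var/lib/ceph/osd/ceph-0/journal")
```

If this reports "regular file" for an OSD whose journal is supposed to live on 
its own volume, that is consistent with the stray-file situation described 
above, where the symlink or ceph.conf entry is missing or wrong.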

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com