Happy to report I got everything up to Luminous. I used your tip to keep the
OSDs running, David, thanks again for that.

I'd say this is a potential gotcha for people co-locating MONs. It appears
that if you're running SELinux, even in permissive mode, upgrading the
ceph-selinux package forces a restart of all the OSDs. You're left with a
load of OSDs down that you can't start, as you don't have a Luminous MON
quorum yet.
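
If you want to check this on your own nodes before upgrading, I believe the
restart comes from the package's install scriptlets. Something like the
following shows the current SELinux mode and the scriptlets the installed
ceph-selinux package runs:

    # current SELinux mode (Enforcing / Permissive / Disabled)
    getenforce

    # inspect the scriptlets in the installed ceph-selinux package
    rpm -q --scripts ceph-selinux | less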


On 15 Sep 2017 4:54 p.m., "David" <dclistsli...@gmail.com> wrote:

Hi David

I like your thinking! Thanks for the suggestion. I've got a maintenance
window later to finish the update, so I'll give it a try.


On Thu, Sep 14, 2017 at 6:24 PM, David Turner <drakonst...@gmail.com> wrote:

> This isn't a great solution, but it's something you could try.  If you stop
> all of the daemons via systemd and then start each one manually in the
> foreground inside its own screen session, I don't think yum updating the
> packages can stop or restart them. You can copy and paste each daemon's
> running command (viewable in ps) to know exactly what to run in the screens
> to start the daemons this way.
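>
> A rough sketch of what I mean, per OSD (the id "3" is just an example, and
> the exact ceph-osd arguments should be whatever ps shows on your node):
>
>     # stop the unit so systemd no longer manages the daemon
>     systemctl stop ceph-osd@3
>
>     # then, inside a screen session, run the daemon in the foreground
>     screen -S osd.3
>     /usr/bin/ceph-osd -f --cluster ceph --id 3 --setuser ceph --setgroup ceph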
>
> On Wed, Sep 13, 2017 at 6:53 PM David <dclistsli...@gmail.com> wrote:
>
>> Hi All
>>
>> I did a Jewel -> Luminous upgrade on my dev cluster and it went very
>> smoothly.
>>
>> I've attempted to upgrade on a small production cluster but I've hit a
>> snag.
>>
>> After installing the ceph 12.2.0 packages with "yum install ceph" on the
>> first node and accepting all the dependencies, I found that all the OSD
>> daemons, the MON and the MDS running on that node had been terminated.
>> Systemd appears to have attempted to restart them all, but the daemons
>> didn't start successfully (not surprising, as the first stage of the
>> upgrade, updating all the MONs in the cluster, hadn't been completed). I
>> was able to start the MON and it's running. The OSDs are all down and I'm
>> reluctant to attempt to start them without upgrading the other MONs in the
>> cluster. I'm also reluctant to attempt upgrading the remaining 2 MONs
>> without understanding what happened.
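>>
>> Something like this can confirm the state of the daemons on the node (the
>> mon id "ceph1" is just a placeholder for the node's hostname):
>>
>>     # what systemd thinks of the ceph units on this node
>>     systemctl status 'ceph-mon@*' 'ceph-osd@*'
>>
>>     # confirm the running MON's version via its admin socket
>>     ceph daemon mon.ceph1 version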
>>
>> The cluster is on Jewel 10.2.5 (as was the dev cluster).
>> Both clusters are running on CentOS 7.3.
>>
>> The only obvious difference I can see between the dev and production
>> clusters is that production has SELinux running in permissive mode, while
>> the dev cluster had it disabled.
>>
>> Any advice on how to proceed at this point would be much appreciated. The
>> cluster is currently functional, but I have 1 node out of 4 with all OSDs
>> down. I had noout set before the upgrade and I've left it set for now.
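>>
>> (In case it's useful to anyone following along, noout is set and cleared
>> like this; the unset should wait until all nodes are upgraded and the OSDs
>> are back up:)
>>
>>     ceph osd set noout
>>     # ... perform the upgrade ...
>>     ceph osd unset noout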
>>
>> Here's the journalctl output from right after the packages were installed
>> (hostname changed):
>>
>> https://pastebin.com/fa6NMyjG
>>