Well, you said you were running v0.94.9, but are there any OSDs still
running a pre-v0.94.4 release, as the error states?
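
If it helps, one way to see which daemons report which version is to loop
over the OSD ids one at a time, since "ceph tell" fails outright against an
OSD that isn't reachable. Just a rough sketch, assuming the admin keyring is
available wherever it is run:

# query each OSD individually; unreachable ones are flagged instead of
# stopping the whole check
for id in $(ceph osd ls); do
    printf 'osd.%s: ' "$id"
    ceph tell osd."$id" version 2>/dev/null || echo "unreachable"
done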

On Tue, Mar 28, 2017 at 6:51 AM, Jaime Ibar <ja...@tchpc.tcd.ie> wrote:

>
>
> On 28/03/17 14:41, Brian Andrus wrote:
>
> What does
> # ceph tell osd.* version
>
> ceph tell osd.21 version
> Error ENXIO: problem getting command descriptions from osd.21
>
>
> reveal? Any pre-v0.94.4 hammer OSDs running as the error states?
>
> Yes, this is the first one I tried to upgrade.
> The other ones are still running Hammer.
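>
> (One way to check the version of a daemon the cluster can't reach is to ask
> it directly over its admin socket on the OSD host; just a sketch, assuming
> the default socket path under /var/run/ceph:)
>
> # run on the host where osd.21 lives; this talks to the local admin socket
> # instead of going through the monitors
> ceph daemon osd.21 version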
>
> Thanks
>
>
>
> On Tue, Mar 28, 2017 at 1:21 AM, Jaime Ibar <ja...@tchpc.tcd.ie> wrote:
>
>> Hi,
>>
>> I did change the ownership to user ceph. In fact, the OSD processes are
>> running:
>>
>> ps aux | grep ceph
>> ceph        2199  0.0  2.7 1729044 918792 ?      Ssl  Mar27   0:21
>> /usr/bin/ceph-osd --cluster=ceph -i 42 -f --setuser ceph --setgroup ceph
>> ceph        2200  0.0  2.7 1721212 911084 ?      Ssl  Mar27   0:20
>> /usr/bin/ceph-osd --cluster=ceph -i 18 -f --setuser ceph --setgroup ceph
>> ceph        2212  0.0  2.8 1732532 926580 ?      Ssl  Mar27   0:20
>> /usr/bin/ceph-osd --cluster=ceph -i 3 -f --setuser ceph --setgroup ceph
>> ceph        2215  0.0  2.8 1743552 935296 ?      Ssl  Mar27   0:20
>> /usr/bin/ceph-osd --cluster=ceph -i 47 -f --setuser ceph --setgroup ceph
>> ceph        2341  0.0  2.7 1715548 908312 ?      Ssl  Mar27   0:20
>> /usr/bin/ceph-osd --cluster=ceph -i 51 -f --setuser ceph --setgroup ceph
>> ceph        2383  0.0  2.7 1694944 893768 ?      Ssl  Mar27   0:20
>> /usr/bin/ceph-osd --cluster=ceph -i 56 -f --setuser ceph --setgroup ceph
>> [...]
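>>
>> (A related check, as a sketch and assuming the default admin socket
>> location: each running daemon can report its own view of its state, which
>> shows whether it considers itself still booting or already active.)
>>
>> # ask osd.42 directly over its local admin socket; the "state" field
>> # distinguishes e.g. "booting" from "active"
>> ceph daemon osd.42 status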
>>
>> If I run one of the OSDs with the debug level increased
>>
>> ceph-osd --debug_osd 5 -i 31
>>
>> this is what I get in the logs:
>>
>> [...]
>>
>> 0 osd.31 14016 done with init, starting boot process
>> 2017-03-28 09:19:15.280182 7f083df0c800  1 osd.31 14016 We are healthy,
>> booting
>> 2017-03-28 09:19:15.280685 7f081cad3700  1 osd.31 14016 osdmap indicates
>> one or more pre-v0.94.4 hammer OSDs is running
>> [...]
>>
>> It seems the OSD is running, but the rest of the cluster is not aware of it.
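>>
>> (For what it's worth, "ceph osd metadata" shows what version the monitors
>> have on record for each OSD; just a sketch, and it only reflects whatever
>> the daemon reported the last time it booted:)
>>
>> # version string the cluster recorded for osd.31 at its last boot
>> ceph osd metadata 31 | grep ceph_version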
>>
>> Thanks
>> Jaime
>>
>>
>>
>>
>> On 27/03/17 21:56, George Mihaiescu wrote:
>>
>>> Make sure the OSD processes on the Jewel node are running. If you didn't
>>> change the ownership to user ceph, they won't start.
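>>>
>>> (Roughly, per the Jewel release notes, something along these lines on each
>>> OSD host, with the daemons stopped first; a sketch rather than the exact
>>> procedure:)
>>>
>>> # Jewel daemons drop privileges to the ceph user, so the data and log
>>> # directories need to be owned by it
>>> chown -R ceph:ceph /var/lib/ceph /var/log/ceph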
>>>
>>>
>>> On Mar 27, 2017, at 11:53, Jaime Ibar <ja...@tchpc.tcd.ie> wrote:
>>>>
>>>> Hi all,
>>>>
>>>> I'm upgrading a Ceph cluster from Hammer 0.94.9 to Jewel 10.2.6.
>>>>
>>>> The cluster has 3 servers (one mon and one mds each) and another 6
>>>> servers with 12 OSDs each.
>>>> The monitors and MDSs have been successfully upgraded to the latest Jewel
>>>> release; however, after upgrading the first OSD server (12 OSDs), ceph is
>>>> not aware of those OSDs and they are marked as down:
>>>>
>>>> ceph -s
>>>>
>>>> cluster 4a158d27-f750-41d5-9e7f-26ce4c9d2d45
>>>>      health HEALTH_WARN
>>>> [...]
>>>>             12/72 in osds are down
>>>>             noout flag(s) set
>>>>      osdmap e14010: 72 osds: 60 up, 72 in; 14641 remapped pgs
>>>>             flags noout
>>>> [...]
>>>>
>>>> ceph osd tree
>>>>
>>>>  3   3.64000         osd.3          down  1.00000          1.00000
>>>>  8   3.64000         osd.8          down  1.00000          1.00000
>>>> 14   3.64000         osd.14         down  1.00000          1.00000
>>>> 18   3.64000         osd.18         down  1.00000          1.00000
>>>> 21   3.64000         osd.21         down  1.00000          1.00000
>>>> 28   3.64000         osd.28         down  1.00000          1.00000
>>>> 31   3.64000         osd.31         down  1.00000          1.00000
>>>> 37   3.64000         osd.37         down  1.00000          1.00000
>>>> 42   3.64000         osd.42         down  1.00000          1.00000
>>>> 47   3.64000         osd.47         down  1.00000          1.00000
>>>> 51   3.64000         osd.51         down  1.00000          1.00000
>>>> 56   3.64000         osd.56         down  1.00000          1.00000
>>>>
>>>> If I run this command against one of the down OSDs
>>>>
>>>> ceph osd in 14
>>>> osd.14 is already in.
>>>>
>>>> ceph still doesn't mark it as up and the cluster health remains in a
>>>> degraded state.
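>>>>
>>>> (If it helps to reproduce, restarting a single OSD and following its log
>>>> looks roughly like this, assuming the systemd units that Jewel installs:)
>>>>
>>>> # restart one OSD and watch what it logs while trying to boot
>>>> systemctl restart ceph-osd@14
>>>> tail -f /var/log/ceph/ceph-osd.14.log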
>>>>
>>>> Do I have to upgrade all the OSDs to Jewel first?
>>>> Any help would be appreciated, as I'm running out of ideas.
>>>>
>>>> Thanks
>>>> Jaime
>>>>
>>>> --
>>>>
>>>> Jaime Ibar
>>>> High Performance & Research Computing, IS Services
>>>> Lloyd Building, Trinity College Dublin, Dublin 2, Ireland.
>>>> http://www.tchpc.tcd.ie/ | ja...@tchpc.tcd.ie
>>>> Tel: +353-1-896-3725
>>>>
>>>>
>>>
>>
>
>
>
>
>
>
>


-- 
Brian Andrus | Cloud Systems Engineer | DreamHost
brian.and...@dreamhost.com | www.dreamhost.com
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
