On 28/03/17 14:41, Brian Andrus wrote:
    What does

    # ceph tell osd.* version

    reveal? Any pre-v0.94.4 hammer OSDs running, as the error states?

On the upgraded OSD it returns an error:

ceph tell osd.21 version
Error ENXIO: problem getting command descriptions from osd.21

Yes, as this is the first one I tried to upgrade.
The other ones are still running hammer.
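
For reference, one quick way to check the running version of every OSD at once
(a minimal sketch, assuming a bash shell and the default "ceph" cluster name;
OSDs that cannot be reached will print the same ENXIO error):

for id in $(ceph osd ls); do
    printf 'osd.%s: ' "$id"
    ceph tell osd.$id version 2>&1
done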

Thanks


On Tue, Mar 28, 2017 at 1:21 AM, Jaime Ibar <ja...@tchpc.tcd.ie> wrote:

    Hi,

    I did change the ownership to user ceph. In fact, the OSD processes
    are running:

    ps aux | grep ceph
    ceph        2199  0.0  2.7 1729044 918792 ?  Ssl  Mar27  0:21 /usr/bin/ceph-osd --cluster=ceph -i 42 -f --setuser ceph --setgroup ceph
    ceph        2200  0.0  2.7 1721212 911084 ?  Ssl  Mar27  0:20 /usr/bin/ceph-osd --cluster=ceph -i 18 -f --setuser ceph --setgroup ceph
    ceph        2212  0.0  2.8 1732532 926580 ?  Ssl  Mar27  0:20 /usr/bin/ceph-osd --cluster=ceph -i 3 -f --setuser ceph --setgroup ceph
    ceph        2215  0.0  2.8 1743552 935296 ?  Ssl  Mar27  0:20 /usr/bin/ceph-osd --cluster=ceph -i 47 -f --setuser ceph --setgroup ceph
    ceph        2341  0.0  2.7 1715548 908312 ?  Ssl  Mar27  0:20 /usr/bin/ceph-osd --cluster=ceph -i 51 -f --setuser ceph --setgroup ceph
    ceph        2383  0.0  2.7 1694944 893768 ?  Ssl  Mar27  0:20 /usr/bin/ceph-osd --cluster=ceph -i 56 -f --setuser ceph --setgroup ceph
    [...]
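
    As an extra sanity check on the ownership change (a minimal sketch, assuming
    the default /var/lib/ceph/osd/ceph-<id> data directory layout), the data
    directories themselves can be inspected:

    # every OSD data directory and its contents should be owned by ceph:ceph
    ls -ld /var/lib/ceph/osd/ceph-*
    stat -c '%U:%G %n' /var/lib/ceph/osd/ceph-*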

    If I start one of the OSDs with an increased debug level:

    ceph-osd --debug_osd 5 -i 31

    this is what I get in the logs:

    [...]

    0 osd.31 14016 done with init, starting boot process
    2017-03-28 09:19:15.280182 7f083df0c800  1 osd.31 14016 We are healthy, booting
    2017-03-28 09:19:15.280685 7f081cad3700  1 osd.31 14016 osdmap indicates one or more pre-v0.94.4 hammer OSDs is running
    [...]

    It seems the OSD is running, but ceph is not aware of it.
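
    One way to see what state the daemon itself thinks it is in (a minimal sketch,
    assuming the default admin socket path under /var/run/ceph) is to query it
    directly:

    ceph daemon osd.31 status
    # or, equivalently:
    # ceph --admin-daemon /var/run/ceph/ceph-osd.31.asok status
    # the "state" field shows whether the OSD is still "booting" or has reached "active"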

    Thanks
    Jaime




    On 27/03/17 21:56, George Mihaiescu wrote:

        Make sure the OSD processes on the Jewel node are running. If
        you didn't change the ownership to user ceph, they won't start.
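
        A minimal sketch of that ownership change (assuming the default
        /var/lib/ceph layout, and with the OSDs on the node stopped first):

        chown -R ceph:ceph /var/lib/ceph
        # note: a journal on a separate partition is a block device and is not
        # changed by the chown above; its device node needs to be owned by ceph too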


            On Mar 27, 2017, at 11:53, Jaime Ibar <ja...@tchpc.tcd.ie> wrote:

            Hi all,

            I'm upgrading a ceph cluster from Hammer 0.94.9 to Jewel 10.2.6.

            The ceph cluster has 3 servers (one mon and one mds each)
            and another 6 servers with 12 osds each.
            The mons and mds have been successfully upgraded to the
            latest Jewel release; however, after upgrading the first
            osd server (12 osds), ceph is not aware of its osds and
            they are marked as down.

            ceph -s

            cluster 4a158d27-f750-41d5-9e7f-26ce4c9d2d45
                 health HEALTH_WARN
            [...]
                        12/72 in osds are down
                        noout flag(s) set
                 osdmap e14010: 72 osds: 60 up, 72 in; 14641 remapped pgs
                        flags noout
            [...]

            ceph osd tree

            3   3.64000         osd.3          down  1.00000 1.00000
            8   3.64000         osd.8          down  1.00000 1.00000
            14   3.64000         osd.14         down  1.00000 1.00000
            18   3.64000         osd.18         down  1.00000 1.00000
            21   3.64000         osd.21         down  1.00000 1.00000
            28   3.64000         osd.28         down  1.00000 1.00000
            31   3.64000         osd.31         down  1.00000 1.00000
            37   3.64000         osd.37         down  1.00000 1.00000
            42   3.64000         osd.42         down  1.00000 1.00000
            47   3.64000         osd.47         down  1.00000 1.00000
            51   3.64000         osd.51         down  1.00000 1.00000
            56   3.64000         osd.56         down  1.00000 1.00000

            If I run this command against one of the down osds:
            ceph osd in 14
            osd.14 is already in.
            However, ceph doesn't mark it as up and the cluster health
            remains degraded.
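
            As a side note, "in" only controls data placement in CRUSH; an osd
            is marked "up" only once its daemon has booted and reported in to
            the monitors. A minimal sketch of checking the daemon on the
            upgraded node (assuming systemd-managed Jewel osds and the default
            log location):

            systemctl status ceph-osd@14
            tail -n 50 /var/log/ceph/ceph-osd.14.log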

            Do I have to upgrade all the osds to Jewel first?
            Any help would be appreciated, as I'm running out of ideas.

            Thanks
            Jaime







--
Brian Andrus | Cloud Systems Engineer | DreamHost
brian.and...@dreamhost.com | www.dreamhost.com

--

Jaime Ibar
High Performance & Research Computing, IS Services
Lloyd Building, Trinity College Dublin, Dublin 2, Ireland.
http://www.tchpc.tcd.ie/ | ja...@tchpc.tcd.ie
Tel: +353-1-896-3725

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
