Hello list,
A week ago we upgraded our Ceph clusters from Hammer to Jewel and with
this email we want to share our experiences.
We have four clusters:
1) Test cluster for all the fun things, completely virtual.
2) Test cluster for OpenStack: 3 monitors and 9 OSDs, all bare metal
3) Cluster where we store backups: 3 monitors and 153 OSDs. 554 TB storage
4) Main cluster (used for our custom software stack and OpenStack): 5
monitors and 1917 OSDs. 8 PB storage
All the clusters are running on Ubuntu 14.04 LTS and we use the Ceph
packages from ceph.com. On every cluster we upgraded the monitors first
and after that, the OSDs. Our backup cluster is the only cluster that
also serves S3 via the RadosGW, and that service was upgraded at the
same time as the OSDs in that cluster. The upgrade of clusters 1, 2 and
3 went without any problem, just an apt-get upgrade on every component.
We did see the message "failed to encode map e<version> with expected
crc", but that message disappeared when all the OSDs were upgraded.
The upgrade of our biggest cluster, nr 4, did not go without problems.
Since we were expecting a lot of "failed to encode map e<version> with
expected crc" messages, we disabled clog to monitors with 'ceph tell
osd.* injectargs -- --clog_to_monitors=false' so our monitors would not
choke on those messages. The upgrade of the monitors went as expected,
without any problem; the problems started when we began upgrading the
OSDs. In the upgrade procedure, we had to change the ownership of the
files from root to the user ceph, and that process was taking so long
on our cluster that completing the upgrade would have taken more than a
week. We decided to keep the permissions as they were for now, so in
the upstart init script /etc/init/ceph-osd.conf we changed '--setuser
ceph --setgroup ceph' to '--setuser root --setgroup root', planning to
fix that OSD by OSD after the upgrade was completely done.
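In practice that is a one-line edit per OSD host; a minimal sketch of
it, assuming the flags appear in /etc/init/ceph-osd.conf exactly as
quoted above:

  # keep running the OSD daemons as root for now (revert per OSD later)
  sed -i 's/--setuser ceph --setgroup ceph/--setuser root --setgroup root/' \
      /etc/init/ceph-osd.conf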
On cluster 3 (backup) we could change the permissions in a shorter time
with the following procedure:
a) apt-get -y install ceph-common
b) mount|egrep 'on \/var.*ceph.*osd'|awk '{print $3}'|while read P;
do echo chown -R ceph:ceph $P \&;done > t ; bash t ; rm t
c) (wait for all the chowns to complete)
d) stop ceph-all
e) find /var/lib/ceph/ ! -uid 64045 -print0|xargs -0 chown ceph:ceph
f) start ceph-all
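(Step b generates a small script 't' with one backgrounded 'chown -R'
per mounted OSD data directory, so all OSD data is chowned in parallel;
step e then picks up any remaining files that are not yet owned by the
ceph user, 64045 being the uid the packages assign to that user.)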
This procedure did not work on our main cluster (4), because in step b
the load on the OSDs went to 100% and that resulted in blocked I/O on
some virtual instances in the OpenStack cluster. Also, at that time one
of our pools received a lot of extra data; those files were stored with
root permissions since we had not restarted the Ceph daemons yet, and
the 'find' in step e found so many files that xargs (the shell) could
not handle it (too many arguments). At that point we decided to keep
the permissions on root during the upgrade phase.
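For the 'too many arguments' problem there is probably a simpler route
that we did not try ourselves: let find run the chowns itself in
batches, so that no single oversized command line is built, e.g.

  find /var/lib/ceph/ ! -uid 64045 -exec chown ceph:ceph {} +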
The next and biggest problem we encountered had to do with the CRC
errors on the OSD map. On every map update, the OSDs that were not yet
upgraded got that CRC error and asked the monitor for a full OSD map
instead of just a delta update. At first we did not understand what
exactly was happening. We ran the upgrade per node using a script; in
that script we watch the state of the cluster and, when the cluster is
healthy again, we upgrade the next host. Every time we started the
script (skipping the already upgraded hosts), the first host(s)
upgraded without issues and then we got blocked I/O on the cluster. The
blocked I/O went away within a minute or two (not measured). After
investigation we found out that the blocked I/O happened when nodes
were asking the monitor for a (full) OSD map, which briefly saturated
the network link on our monitor.
The next graph shows the network statistics for one of our Ceph
monitors. Our hosts are equipped with 10 Gbit/s NICs, and every time
traffic hit the highest peaks, the problems occurred. We could work
around this by waiting four minutes between every host; after that
point (14:20) we did not have any issues anymore. Of course the number
of not-yet-upgraded OSDs decreased, so the number of full OSD map
requests also got smaller over time.
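To illustrate the throttling we ended up with, a simplified sketch of
the per-host loop (the host list and the upgrade commands are
placeholders; our real script does more checking):

  for host in $(cat osd-hosts.txt); do    # one OSD host per line
      ssh "$host" 'apt-get -y upgrade ; stop ceph-all ; start ceph-all'
      # wait until the cluster reports HEALTH_OK again
      until ceph health | grep -q HEALTH_OK; do sleep 10; done
      # extra pause so the full OSD map requests cannot saturate the
      # monitor network link
      sleep 240
  done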
The day after the upgrade we had issues with live migrations of
OpenStack instances. We got this message: "OSError:
/usr/lib/librbd.so.1: undefined symbol:
_ZN8librados5Rados15aio_watch_flushEPNS_13AioCompletionE". This was
resolved by restarting libvirt-bin and nova-compute on every compute
node, so that they load the upgraded librbd/librados libraries.
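A sketch of scripting that restart across the compute nodes (the
compute-nodes.txt host list is a placeholder; libvirt-bin and
nova-compute are upstart jobs on Ubuntu 14.04):

  for host in $(cat compute-nodes.txt); do
      # reload the processes so they pick up the new libraries
      ssh "$host" 'restart libvirt-bin ; restart nova-compute'
  done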
Please note that the upgrade of our biggest cluster was not a 100%
success, but the problems were relatively small, the cluster stayed
online, and there were only a few virtual OpenStack instances that did
not like the blocked I/O and had to be restarted.
--
With regards,
Richard Arends.
Snow BV / http://snow.nl