Why are you rebooting the node? You should only need to restart the Ceph services. All of your MONs must be running Luminous before the cluster will accept any Luminous OSDs, so update the packages on each server and restart the MONs first. Once all of the MONs have been restarted and you have a Luminous quorum, you can start restarting OSDs and/or servers.
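The per-node sequence above might look roughly like this. This is only a sketch: it assumes a systemd-based distribution with yum and the standard ceph systemd targets, and it must be run against a live cluster, one node at a time. Package manager, unit names, and version output will vary with your environment.

```shell
# Sketch of a rolling Kraken -> Luminous upgrade, one node at a time.
# Assumes yum and the standard ceph-mon/ceph-osd/ceph-mgr systemd targets.

# 1. On each MON node in turn: upgrade the packages, restart only the MON.
yum install -y ceph                 # pull in the 12.2.2 Luminous packages
systemctl restart ceph-mon.target

# 2. Verify all MONs are Luminous and in quorum before touching any OSDs:
ceph mon versions                   # every entry should report 12.2.2
ceph quorum_status

# 3. Only after a full Luminous MON quorum, restart OSDs node by node:
systemctl restart ceph-osd.target
ceph osd tree                       # wait for this node's OSDs to come back up

# 4. Optionally start/restart the MGR daemons (not required for OSDs):
systemctl restart ceph-mgr.target
```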
If you want to, you can also start your MGR daemons before doing the OSDs, but that step isn't required for the OSDs to come back up. To get out of this situation, update the packages on your remaining MONs and restart the MON service so that all of your MONs are running Luminous. After that, your 24 down OSDs should come back up.

On Fri, Dec 8, 2017 at 10:51 AM nokia ceph <nokiacephus...@gmail.com> wrote:
> Hello Team,
>
> I have a 5-node cluster running kraken 11.2.0, EC 4+1.
>
> My plan is to upgrade all 5 nodes to 12.2.2 Luminous without any downtime.
> I tried the procedure below on the first node.
>
> Commented out the directive below from ceph.conf:
> enable experimental unrecoverable data corrupting features = bluestore rocksdb
>
> Then started and enabled ceph-mgr, and then hit a reboot.
>
> ## ceph -s
>     cluster b2f1b9b9-eecc-4c17-8b92-cfa60b31c121
>      health HEALTH_WARN
>             2048 pgs degraded
>             2048 pgs stuck degraded
>             2048 pgs stuck unclean
>             2048 pgs stuck undersized
>             2048 pgs undersized
>             recovery 1091151/1592070 objects degraded (68.537%)
>             24/120 in osds are down
>      monmap e2: 5 mons at {PL8-CN1=10.50.11.41:6789/0,PL8-CN2=10.50.11.42:6789/0,PL8-CN3=10.50.11.43:6789/0,PL8-CN4=10.50.11.44:6789/0,PL8-CN5=10.50.11.45:6789/0}
>             election epoch 18, quorum 0,1,2,3,4 PL8-CN1,PL8-CN2,PL8-CN3,PL8-CN4,PL8-CN5
>      mgr active: PL8-CN1
>      osdmap e243: 120 osds: 96 up, 120 in; 2048 remapped pgs
>             flags sortbitwise,require_jewel_osds,require_kraken_osds
>       pgmap v1099: 2048 pgs, 1 pools, 84304 MB data, 310 kobjects
>             105 GB used, 436 TB / 436 TB avail
>             1091151/1592070 objects degraded (68.537%)
>                 2048 active+undersized+degraded
>   client io 107 MB/s wr, 0 op/s rd, 860 op/s wr
>
> After the reboot I can see that all 24 OSDs on the first node show a down state, even though the 24 osd processes are running.
>
> # ps -ef | grep -c ceph-osd
> 24
>
> Even if I try this procedure on all 5 nodes in parallel and then reboot, everything comes back successfully without any issues. But executing in parallel would require downtime, which is not accepted by our management at the moment. Please help and share your views.
>
> I read the upgrade section of https://ceph.com/releases/v12-2-0-luminous-released/, but it didn't help me at the moment.
>
> My question here: what is the best method to update the machines without any downtime?
>
> Thanks
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
_______________________________________________ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com