Hi Andrei,

Can you share the udev hack you had to use?
Currently, I add "/usr/sbin/ceph-disk activate-all" to /etc/rc.local to activate all OSDs at boot. After the first reboot following the upgrade to Jewel, the journal disks are owned by ceph:ceph, and links are created in /etc/systemd/system/ceph-osd.target.wants/. I can now use "systemctl (start|stop) ceph.target" to stop and start the OSDs.

Unfortunately, when I disable the "ceph-disk activate-all" entry in rc.local and reboot again, the OSDs are not started. This is, of course, caused by the fact that the OSD data partitions are not mounted at boot. As I understand it, your udev hack script should take care of that.

I've appended a few rough, untested sketches below your quoted mail: the kind of udev rule I had in mind, plus some thoughts on your points 2 and 3.

Ernst

> On 23 May 2016, at 12:26, Andrei Mikhailovsky <and...@arhont.com> wrote:
> 
> Hello
> 
> I've recently updated my Hammer Ceph cluster running on Ubuntu 14.04 LTS servers and noticed a few issues during the upgrade. I just wanted to share my experience.
> 
> I've installed the latest Jewel release. In my opinion, some of the issues I came across relate to poor upgrade documentation, others to inconsistencies in the Ubuntu packages. Here are the issues I've picked up (I followed the upgrade procedure from the release notes):
> 
> 1. Ceph journals - After performing the upgrade, the ceph-osd processes were not starting. I had followed the instructions and chowned /var/lib/ceph (but also see point 2 below). The issue relates to the journal partitions, which are not chowned because they are only referenced via symlinks, so the ceph user had no read/write access to them. IMHO this should be addressed in the documentation unless it can be easily and reliably dealt with by the installation scripts.
> 
> 2. Inefficient chown documentation - The documentation states that one should "chown -R ceph:ceph /var/lib/ceph" if one wants ceph-osd to run as user ceph rather than root. This command chowns one OSD at a time. I consider my cluster to be fairly small, with just 30 OSDs spread across 3 OSD servers. The chown takes about 60 minutes per OSD (3TB disks at about 60% usage), so it would take about 10 hours to complete on each OSD server, which is just mad in my opinion. I can't imagine this working well at all on servers with 20-30 OSDs! IMHO the docs should instruct users to run the chown in _parallel_ on all OSDs instead of doing it one by one.
> 
> In addition, the documentation does not mention the issue with the journals, which I think is a big miss. In the end, I had to hack a quick udev rule to address this at boot time, as my journal SSDs were still owned by root:disk after a reboot.
> 
> 3. Radosgw service - After the upgrade, the radosgw service was still starting as user root. Also, the start/stop/restart scripts that came with the package simply do not start the service at all; for example, "start radosgw" or "start radosgw-all-started" does nothing. I had to use the old init script /etc/init.d/radosgw to start the service, but then it runs as user root rather than as ceph, as intended in Jewel.
> 
> Overall, after sorting out most of the issues, the cluster has been running okay for 2 days now. The radosgw issue still needs looking at, though.
> 
> Cheers
> 
> Andrei
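For reference, this is roughly the udev rule I had in mind for the journal ownership problem (your points 1 and 2). The sdb1-sdb4 partition names are just placeholders for the journal partitions on my SSD, and I haven't tested this yet, which is why I'd rather see what you actually used:

    # Untested sketch: hand the journal partitions to ceph:ceph at boot.
    # sdb1..sdb4 are placeholders; adjust to your own device names.
    cat > /etc/udev/rules.d/99-ceph-journal.rules <<'EOF'
    KERNEL=="sdb[1-4]", SUBSYSTEM=="block", OWNER="ceph", GROUP="ceph", MODE="0660"
    EOF
    # Apply without a reboot (if I'm not mistaken):
    udevadm control --reload-rules
    udevadm trigger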
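On your point 2: I haven't had to redo the chown here yet, but running one chown per OSD in the background should cut the wall-clock time down to roughly that of the slowest single OSD. A rough, untested sketch, assuming the default /var/lib/ceph/osd/ceph-* layout and that the OSDs on the host have been stopped first:

    # One chown per OSD data dir, all running in parallel.
    for dir in /var/lib/ceph/osd/ceph-*; do
        chown -R ceph:ceph "$dir" &
    done
    wait
    # The non-OSD parts of /var/lib/ceph (mon, bootstrap-*, etc.) are small
    # enough to chown serially afterwards.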
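And on your point 3: I don't run radosgw myself, but as far as I know the Jewel daemons accept --setuser/--setgroup, so as a stop-gap you could perhaps start it by hand as the ceph user along these lines ("client.radosgw.gateway" is only a placeholder for whatever rgw section name you have in ceph.conf):

    # Untested; replace the --name value with your own rgw instance name.
    radosgw --cluster ceph --name client.radosgw.gateway \
            --setuser ceph --setgroup ceph

That obviously doesn't fix the packaged init/upstart scripts, but it would at least get the daemon running as ceph rather than root.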