Hi Andrei,

Can you share your udev hack that you had to use?

Currently, I add "/usr/sbin/ceph-disk activate-all" to /etc/rc.local to 
activate all OSDs at boot. After the first reboot following the upgrade to 
Jewel, the journal disks are owned by ceph:ceph, and links are created in 
/etc/systemd/system/ceph-osd.target.wants/. I can now use "systemctl 
(start|stop) ceph.target" to stop and start the OSDs. Unfortunately, when I 
remove the "ceph-disk activate-all" line from rc.local and reboot again, the 
OSDs are not started. This is, of course, because the OSD filesystems are not 
mounted at boot. As I understand it, your udev hack script should take care of 
this.
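
For reference, this is roughly what my current workaround boils down to (the 
rc.local entry is exactly what I described above; nothing else is added):

    # /etc/rc.local -- temporary workaround to activate all OSDs at boot
    /usr/sbin/ceph-disk activate-all

    # once the OSDs are activated, these work as expected
    systemctl stop ceph.target
    systemctl start ceph.target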

Ernst


> On 23 mei 2016, at 12:26, Andrei Mikhailovsky <and...@arhont.com> wrote:
> 
> Hello
> 
> I've recently upgraded my Hammer Ceph cluster running on Ubuntu 14.04 LTS 
> servers and noticed a few issues during the upgrade. Just wanted to share my 
> experience.
> 
> I've installed the latest Jewel release. In my opinion, some of the issues I 
> came across relate to incomplete upgrade instructions in the documentation, 
> others to inconsistencies in the Ubuntu package. Here are the issues I've 
> picked up (I followed the upgrade procedure from the release notes):
> 
> 
> 1. Ceph journals - After performing the upgrade, the ceph-osd processes 
> would not start. I had followed the instructions and chowned /var/lib/ceph 
> (also see point 2 below). The issue relates to the journal partitions, which 
> are not chowned because the recursive chown does not follow the journal 
> symlinks. As a result, the ceph user had no read/write access to the journal 
> partitions. IMHO, this should be addressed in the documentation unless it can 
> be easily and reliably dealt with by the installation script.
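> 
> A minimal sketch of the extra step that is needed (assuming the default 
> /var/lib/ceph/osd/ceph-* layout, where each OSD's "journal" is a symlink to 
> its journal partition):
> 
>     # resolve each journal symlink and chown the partition it points to
>     for journal in /var/lib/ceph/osd/ceph-*/journal; do
>         chown ceph:ceph "$(readlink -f "$journal")"
>     done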
> 
> 
> 
> 2. Inefficient chown documentation - The documentation states that one 
> should "chown -R ceph:ceph /var/lib/ceph" if one wants ceph-osd to run as 
> user ceph and not as root. This command processes the OSDs one at a time. I 
> consider my cluster fairly small, with just 30 OSDs across 3 OSD servers. It 
> takes about 60 minutes to run the chown command on each OSD (3TB disks with 
> about 60% usage), so it would take about 10 hours to complete on each OSD 
> server, which is just mad in my opinion. I can't imagine this working well 
> at all on servers with 20-30 OSDs! IMHO the docs should be adjusted to 
> instruct users to run the chown in _parallel_ on all OSDs instead of doing 
> it one by one.
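> 
> Something along these lines is what I mean (a sketch only; it assumes the 
> default /var/lib/ceph/osd/ceph-* data directories):
> 
>     # chown all OSD data directories on this server in parallel,
>     # then wait for every background job to finish
>     for osd in /var/lib/ceph/osd/ceph-*; do
>         chown -R ceph:ceph "$osd" &
>     done
>     wait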
> 
> In addition, the documentation does not mention the issue with the journals, 
> which I think is a big omission. In the end, I had to hack a quick udev rule 
> to address this at boot time, as my journal SSDs were still owned by 
> root:disk after a reboot.
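> 
> As a rough illustration only (the device names below are assumptions, not 
> the exact rule I use), such a rule could look like this:
> 
>     # /etc/udev/rules.d/90-ceph-journal.rules
>     # give the ceph user ownership of the journal partitions at boot
>     KERNEL=="sdc[0-9]", SUBSYSTEM=="block", OWNER="ceph", GROUP="ceph", MODE="0660"
>     KERNEL=="sdd[0-9]", SUBSYSTEM=="block", OWNER="ceph", GROUP="ceph", MODE="0660"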
> 
> 
> 
> 3. Radosgw service - After the upgrade, the radosgw service was still 
> starting as user root. Also, the start/stop/restart scripts that came with 
> the package simply did not start the service at all. For example, "start 
> radosgw" or "start radosgw-all-started" did not start the service. I had to 
> use the old init script /etc/init.d/radosgw to start the service, but it 
> then runs as user root rather than as the ceph user intended in Jewel.
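> 
> For completeness, this is the workaround that did work, plus a quick check 
> of the user the daemon ends up running under (still root here, not ceph):
> 
>     /etc/init.d/radosgw start
>     # show the user and command line of the running radosgw process
>     ps -o user=,cmd= -C radosgw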
> 
> 
> Overall, after sorting out most of the issues, the cluster has been running 
> okay for 2 days now. The radosgw issue still needs looking at, though.
> 
> 
> Cheers
> 
> Andrei

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
