Hello 

I've recently upgraded my Hammer Ceph cluster running on Ubuntu 14.04 LTS
servers and noticed a few issues during the upgrade. I just wanted to share my
experience.

I've installed the latest Jewel release. In my opinion, some of the issues I
came across relate to poor upgrade instructions in the documentation, others to
inconsistencies in the Ubuntu packages. Here are the issues I've picked up (I
followed the upgrade procedure from the release notes):


1. Ceph journals - After performing the upgrade, the ceph-osd processes would
not start. I followed the instructions and chowned /var/lib/ceph (see also
point 2 below). The problem is the journal partitions: each OSD's journal is a
symlink to a raw partition, and "chown -R" does not follow the symlinks, so
the journal partitions were never chowned and the ceph user had no read/write
access to them. IMHO this should be addressed in the documentation unless it
can be handled easily and reliably by the installation scripts.
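For anyone hitting the same thing, a minimal sketch of the extra step I mean
is below. It assumes the default OSD data layout under /var/lib/ceph/osd; the
function arguments are only there so it can be dry-run safely before pointing
it at the real paths as root.

```shell
#!/bin/sh
# Sketch: "chown -R" skips the journal symlink targets, so resolve each
# OSD's journal link and chown the actual partition device as well.
chown_journals() {
    osd_root="$1"; owner="$2"
    for osd in "$osd_root"/ceph-*; do
        [ -d "$osd" ] || continue
        journal="$osd/journal"
        if [ -L "$journal" ]; then
            # readlink -f resolves the symlink to the real device node
            target="$(readlink -f "$journal")"
            chown "$owner" "$target"
        fi
    done
}

# On a real OSD node, run as root:
chown_journals /var/lib/ceph/osd ceph:ceph
```

Note this only fixes ownership until the next reboot; udev resets it again
(see point 2 below).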



2. Inefficient chown documentation - The documentation states that one should
"chown -R ceph:ceph /var/lib/ceph" if one wants ceph-osd to run as user ceph
rather than root. This command chowns one osd at a time. I consider my cluster
fairly small, with just 30 osds across 3 osd servers. The chown takes about 60
minutes per osd (3TB disks at about 60% usage), so it would take about 10
hours to complete on each osd server, which is just mad in my opinion. I can't
imagine this working well at all on servers with 20-30 osds! IMHO the docs
should instruct users to run the chown in _parallel_ on all osds instead of
doing it one by one.
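Something along these lines is what I have in mind (a sketch, not the
documented procedure): one background chown per OSD data directory, so the
total wall time is roughly that of the slowest single disk rather than the
sum of all of them.

```shell
#!/bin/sh
# Sketch: chown each OSD data dir in a background job, then wait for all.
parallel_chown() {
    osd_root="$1"; owner="$2"
    for osd in "$osd_root"/ceph-*; do
        [ -d "$osd" ] || continue
        chown -R "$owner" "$osd" &   # one background job per OSD disk
    done
    wait   # block until every background chown has finished
}

# On a real OSD node, run as root:
parallel_chown /var/lib/ceph/osd ceph:ceph
# ...then chown the remaining (small) directories serially:
# chown -R ceph:ceph /var/lib/ceph
```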

In addition, the documentation does not mention the journal ownership issue,
which I think is a big miss. In the end, I had to hack together a quick udev
rule to fix this at boot time, as my journal SSDs were still owned by
root:disk after a reboot.
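For reference, a rule of the kind I mean looks roughly like this (the filename
is arbitrary, and the partition type GUID below is what I believe ceph-disk
stamps on journal partitions - verify it against your own partitions with
blkid or sgdisk -i before relying on it, or simply match by kernel name, e.g.
KERNEL=="sdb[12]", instead):

```
# /etc/udev/rules.d/55-ceph-journal.rules
# At boot, hand any ceph journal partition to ceph:ceph instead of root:disk.
ACTION=="add", SUBSYSTEM=="block", ENV{ID_PART_ENTRY_TYPE}=="45b0969e-9b03-4f30-b4c6-b4b80ceff106", OWNER="ceph", GROUP="ceph", MODE="0660"
```

After dropping the file in place, a "udevadm trigger" (or a reboot) applies it.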



3. Radosgw service - After the upgrade, the radosgw service was still starting
as user root. Also, the start/stop/restart scripts that came with the package
simply do not start the service at all: for example, neither "start radosgw"
nor "start radosgw-all-started" starts it. I had to use the old startup script
/etc/init.d/radosgw to get the service running, but it is then started as user
root and not as ceph, as intended in Jewel.
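As a stopgap until the packaged scripts are fixed, the daemon can be launched
by hand with the setuser/setgroup options that Jewel-era daemons understand,
so it drops root privileges itself. A sketch - "client.rgw.gateway" is a
placeholder for your actual rgw instance name from ceph.conf:

```
radosgw --cluster ceph --name client.rgw.gateway \
        --setuser ceph --setgroup ceph
```

This obviously doesn't survive reboots the way a proper init job would; it
just confirms the daemon itself is happy to run as ceph.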


Overall, after sorting out most of the issues, the cluster has been running
okay for 2 days now. The radosgw issue still needs looking at, though.


Cheers 

Andrei 
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
