I was able to set the order to 99 as you indicated, but the 
/var/log/upstart/ceph logs still complain excessively about being unable to 
look up group 'ceph': (34) Numerical result out of range

Mounting for the OSDs, which are XFS-formatted HDDs, is done via /etc/fstab. 

-Zach
From: David Turner
Sent: Wednesday, May 10, 2017 4:07 PM
To: vida.z...@gmail.com; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] trouble starting ceph @ boot

Are you mounting your OSDs using fstab or anything else?  Ceph uses udev rules 
and partition identifiers to know what a disk is and where to mount it, 
assuming that you have your GUIDs set properly on your disks.  ceph-deploy does 
this by default.

On Wed, May 10, 2017 at 3:46 PM David Turner <drakonst...@gmail.com> wrote:
`update-rc.d 'ceph' defaults 99`
That should put it last in the boot order.  The '99' here is a number from 01 
to 99; the lower the number, the earlier in the boot sequence the service is 
started.  To see what order your service is set to start and stop, run `ls 
/etc/rc*.d/*{service}`.  Each rc#.d directory corresponds to a runlevel.  K## 
is the order in which services will be stopped, S## is the order in which 
services will be started.  After you run the above command, it should change 
Ceph to S99.  If you want to fine tune it, you can see which services are 
starting up after ceph and see if you can locate the specific one that is 
causing your problems.

On Wed, May 10, 2017 at 3:34 PM <vida.z...@gmail.com> wrote:
David, 
 
ceph tell osd.12 version replies version 11.2.0
 
Distro is Ubuntu 14.04.5 LTS (trusty) which utilizes upstart for ceph. 
 
I don’t see a good way to ensure it starts last in an event-based system like 
upstart. 
 
For the record, I already tried starting it after networking and after 
filesystems are mounted too, and that didn’t seem to help things. 
 
 
 
From: David Turner
Sent: Wednesday, May 10, 2017 3:21 PM

To: vida.z...@gmail.com; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] trouble starting ceph @ boot
 
I would probably just make it start last in the boot order.  Depending on your 
distribution/version, that will be as simple as setting it to 99 for starting 
up.  Which distribution/version are you running? 
 
On Wed, May 10, 2017 at 2:36 PM <vida.z...@gmail.com> wrote:
David,
 
I get what you are saying. Do you have a suggestion as to which service I 
should make ceph-osd depend on so that it starts reliably?
 
My understanding is that these daemons should all be sort of independent of 
each other. 
 
-Zach
 
 
 
From: David Turner
Sent: Wednesday, May 10, 2017 1:18 PM
To: vida.z...@gmail.com; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] trouble starting ceph @ boot
 
Have you attempted to place the ceph-osd startup later in the boot process?  
Which distribution/version are you running?  Each does it slightly differently.  
Startup order can be problematic for some services, very commonly in cases 
where a network drive is mapped and used by a service like mysql (terrible 
example, but effective).  If you try to start mysql before the network is up 
and the drive is mapped, then mysql will fail.  Some workarounds are to put a 
sleep in the init script, or retry (similar to what you did), but ultimately, 
you probably want to set a requisite service to have started or just place the 
service in a later starting position.
 
On Wed, May 10, 2017 at 9:43 AM <vida.z...@gmail.com> wrote:
System: Ubuntu Trusty 14.04

Release : Kraken


Issue:

When starting the ceph-osd daemon on boot via upstart, 
/var/log/upstart/ceph-osd-ceph_#.log reports 3 attempts to start the service, 
with the error messages below



starting osd.12 at - osd_data /var/lib/ceph/osd/ceph-12 
/var/lib/ceph/osd/ceph-12/journal

2017-05-09 13:38:34.507004 7f6d46a2e980 -1 journal FileJournal::_open: 
disabling aio for non-block journal. Use journal_force_aio to force use of aio 
anyway

2017-05-09 13:38:38.432333 7f6d46a2e980 -1 osd.12 2284024 PGs are upgrading

unable to look up group 'ceph': (34) Numerical result out of range

unable to look up group 'ceph': (34) Numerical result out of range

unable to look up group 'ceph': (34) Numerical result out of range



Workaround:



If I configure /etc/init/ceph-osd.conf like so



-respawn limit 3 1800

+respawn limit unlimited



I get roughly 20 attempts to start each OSD daemon before it successfully 
starts.



Starting the daemons by hand works just fine after boot.



Possible reasons:



NSCD is being utilized and may not have started yet. However, disabling this 
service doesn’t improve starting the service without the workaround in 
place.





The message seems to be coming from global/global_init.cc



      struct passwd *p = 0;
      getpwnam_r(g_conf->setuser.c_str(), &pa, buf, sizeof(buf), &p);
      if (!p) {
        cerr << "unable to look up user '" << g_conf->setuser << "'"
             << std::endl;
        exit(1);
      }
      uid = p->pw_uid;
      gid = p->pw_gid;
      uid_string = g_conf->setuser;
    }
  }
  if (g_conf->setgroup.length() > 0) {
    gid = atoi(g_conf->setgroup.c_str());
    if (!gid) {
      char buf[4096];
      struct group gr;
      struct group *g = 0;
      getgrnam_r(g_conf->setgroup.c_str(), &gr, buf, sizeof(buf), &g);
      if (!g) {
        cerr << "unable to look up group '" << g_conf->setgroup << "'"
             << ": " << cpp_strerror(errno) << std::endl;
        exit(1);
      }
      gid = g->gr_gid;
      gid_string = g_conf->setgroup;
    }
  }



Error code 34 seems to correspond to ERANGE (insufficient buffer space 
supplied). I assume the message is printed because getgrnam_r() leaves the 
result pointer NULL when it can’t find the group.
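
For what it is worth, POSIX specifies that getgrnam_r() returns ERANGE when 
the caller's buffer is too small to hold the group entry, and that a group 
which simply does not exist yields a return value of 0 with a NULL result. The 
sketch below is not Ceph's code. It is a minimal illustration, assuming a 
hypothetical helper named lookup_gid, of the usual pattern of checking the 
return value and retrying with a larger buffer. The 1024-byte starting size 
and the 1 MiB cap are arbitrary choices for the example.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <grp.h>
#include <string>
#include <sys/types.h>
#include <vector>

// Hypothetical helper (not part of Ceph): resolve a group name to a gid,
// growing the scratch buffer whenever getgrnam_r() reports ERANGE.
// Returns true and fills *gid_out on success. On failure returns false and
// stores getgrnam_r()'s return value in *err_out (0 means "no such group").
// If buf_used is non-null it receives the buffer size of the last attempt.
bool lookup_gid(const std::string& name, gid_t* gid_out, int* err_out,
                std::size_t* buf_used = nullptr) {
  assert(gid_out != nullptr && err_out != nullptr);

  const std::size_t max_size = 1 << 20;  // arbitrary 1 MiB cap for this sketch
  std::vector<char> buf(1024);           // arbitrary starting size
  struct group gr;
  struct group* result = nullptr;
  int rc;

  for (;;) {
    // The error is taken from the return value, not from errno.
    rc = getgrnam_r(name.c_str(), &gr, buf.data(), buf.size(), &result);
    if (rc != ERANGE || buf.size() >= max_size) {
      break;
    }
    buf.resize(buf.size() * 2);  // entry did not fit, try a bigger buffer
  }

  if (buf_used != nullptr) {
    *buf_used = buf.size();
  }
  *err_out = rc;
  if (rc != 0 || result == nullptr) {
    return false;
  }
  *gid_out = result->gr_gid;
  return true;
}
```

Called as lookup_gid("ceph", &gid, &err, &used) early in boot, a value of 
used above 4096 would suggest that the group entry returned at that moment is 
larger than the fixed buffer in global_init.cc, while a false return with 
err == 0 would mean the group was not found at all.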



But as to why the group isn’t retrievable, I have no idea, as getent finds it:

getent group ceph

ceph:x:59623:ceph



GID changed for security reasons.



Additional Information:



I also see this in boot.log; not sure if it is related:

failed: 'ulimit -n 32768; /usr/bin/ceph-mds -i cephstorelx2 --pid-file 
/var/run/ceph/mds.cephstorelx2//mds.cephstorelx2.pid -c /etc/ceph/ceph.conf 
--cluster ceph --setuser ceph --setgroup ceph '


Any pointers would be helpful.

-Zach
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 
 
