I would probably just make it start last in the boot order. Depending on your distribution/version, that can be as simple as setting its start priority to 99. Which distribution/version are you running?
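On the ERANGE result in the report quoted below: getgrnam_r() returns 0 and leaves the result pointer NULL when the group simply does not exist, and returns an error number such as ERANGE when the supplied buffer is too small for the entry. The quoted snippet does not check that return value and prints errno instead. Here is a minimal sketch, not Ceph's code, of a lookup that checks the return value and grows the buffer on ERANGE. The name lookup_gid and the 1 MiB cap are hypothetical, chosen only for illustration, and it is only an assumption that a too-small buffer is what actually happens at boot on this system.

```cpp
#include <grp.h>
#include <unistd.h>

#include <cerrno>
#include <string>
#include <vector>

// Hypothetical helper (not part of Ceph): resolve a group name to a GID.
// Returns 0 and fills *gid on success, ENOENT if the lookup succeeded but
// no such group exists, or the error number returned by getgrnam_r().
int lookup_gid(const std::string& name, gid_t* gid) {
  // Ask the system for a suggested buffer size; fall back to 4096 bytes
  // (the size used in the quoted snippet) if no hint is available.
  long hint = sysconf(_SC_GETGR_R_SIZE_MAX);
  std::vector<char> buf(hint > 0 ? static_cast<size_t>(hint) : 4096);
  const size_t max_size = 1 << 20;  // arbitrary 1 MiB cap for this sketch

  for (;;) {
    struct group gr;
    struct group* result = nullptr;
    int rc = getgrnam_r(name.c_str(), &gr, buf.data(), buf.size(), &result);
    if (rc == 0 && result != nullptr) {
      *gid = result->gr_gid;
      return 0;
    }
    if (rc == 0) {
      return ENOENT;  // lookup worked, but the group is not defined
    }
    if (rc != ERANGE || buf.size() >= max_size) {
      return rc;  // a real error, or the buffer cap was reached
    }
    buf.resize(buf.size() * 2);  // ERANGE: retry with a larger buffer
  }
}
```

Calling lookup_gid("ceph", &gid) early in boot and logging the returned code would at least show whether the failure is ERANGE, a missing entry, or some other error from the name service.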
On Wed, May 10, 2017 at 2:36 PM <vida.z...@gmail.com> wrote:

> David,
>
> I get what you are saying. Do you have a suggestion as to what service I make ceph-osd depend on to reliably start?
>
> My understanding is that these daemons should all be sort of independent of each other.
>
> -Zach
>
> *From: *David Turner <drakonst...@gmail.com>
> *Sent: *Wednesday, May 10, 2017 1:18 PM
> *To: *vida.z...@gmail.com; ceph-users@lists.ceph.com
> *Subject: *Re: [ceph-users] trouble starting ceph @ boot
>
> Have you attempted to place the ceph-osd startup later in the boot process? Which distribution/version are you running? Each does it slightly differently. This can be problematic for some services, very commonly in cases where a network drive is mapped and used by a service like mysql (terrible example, but effective). If you try to start mysql before the network is up and the drive is mapped, then mysql will fail. Some workarounds are to put a sleep in the init script, or retry (similar to what you did), but ultimately, you probably want to set a requisite service to have started or just place the service in a later starting position.
>
> On Wed, May 10, 2017 at 9:43 AM <vida.z...@gmail.com> wrote:
>
> > System: Ubuntu Trusty 14.04
> > Release : Kraken
> >
> > Issue:
> >
> > When starting the ceph-osd daemon on boot via upstart, /var/log/upstart/ceph-osd-ceph_#.log reports 3 attempts to start the service with the error messages below:
> >
> > starting osd.12 at - osd_data /var/lib/ceph/osd/ceph-12 /var/lib/ceph/osd/ceph-12/journal
> > 2017-05-09 13:38:34.507004 7f6d46a2e980 -1 journal FileJournal::_open: disabling aio for non-block journal. Use journal_force_aio to force use of aio anyway
> > 2017-05-09 13:38:38.432333 7f6d46a2e980 -1 osd.12 2284024 PGs are upgrading
> > unable to look up group 'ceph': (34) Numerical result out of range
> > unable to look up group 'ceph': (34) Numerical result out of range
> > unable to look up group 'ceph': (34) Numerical result out of range
> >
> > Workaround:
> >
> > If I configure /etc/init/ceph-osd.conf like so
> >
> > -respawn limit 3 1800
> > +respawn limit unlimited
> >
> > I get roughly 20 attempts to start each osd daemon and then it successfully starts.
> >
> > Starting the daemons by hand works just fine after boot.
> >
> > Possible reasons:
> >
> > NSCD is being utilized and may not have started yet. However, disabling this service does not improve starting the service without the workaround in place.
> >
> > The message seems to be coming from global/global_init.cc
> >
> > ./global/global_init.cc- struct passwd *p = 0;
> > ./global/global_init.cc- getpwnam_r(g_conf->setuser.c_str(), &pa, buf, sizeof(buf), &p);
> > ./global/global_init.cc- if (!p) {
> > ./global/global_init.cc- cerr << "unable to look up user '" << g_conf->setuser << "'"
> > ./global/global_init.cc- << std::endl;
> > ./global/global_init.cc- exit(1);
> > ./global/global_init.cc- }
> > ./global/global_init.cc- uid = p->pw_uid;
> > ./global/global_init.cc- gid = p->pw_gid;
> > ./global/global_init.cc- uid_string = g_conf->setuser;
> > ./global/global_init.cc- }
> > ./global/global_init.cc- }
> > ./global/global_init.cc- if (g_conf->setgroup.length() > 0) {
> > ./global/global_init.cc- gid = atoi(g_conf->setgroup.c_str());
> > ./global/global_init.cc- if (!gid) {
> > ./global/global_init.cc- char buf[4096];
> > ./global/global_init.cc- struct group gr;
> > ./global/global_init.cc- struct group *g = 0;
> > ./global/global_init.cc- getgrnam_r(g_conf->setgroup.c_str(), &gr, buf, sizeof(buf), &g);
> > ./global/global_init.cc- if (!g) {
> > ./global/global_init.cc: cerr << "unable to look up group '" << g_conf->setgroup << "'"
> > ./global/global_init.cc- << ": " << cpp_strerror(errno) << std::endl;
> > ./global/global_init.cc- exit(1);
> > ./global/global_init.cc- }
> > ./global/global_init.cc- gid = g->gr_gid;
> > ./global/global_init.cc- gid_string = g_conf->setgroup;
> > ./global/global_init.cc- }
> > ./global/global_init.cc- }
> >
> > 34 as an error code seems to correspond to ERANGE, "Insufficient buffer space supplied". I assume this is because getgrnam_r() returns NULL if it can’t find the group.
> >
> > But as to why the group isn’t retrievable I have no idea, as
> >
> > getent group ceph
> > ceph:x:59623:ceph
> >
> > GID changed for security reasons.
> >
> > Additional Information:
> >
> > I also see this in boot.log, not sure if it is related:
> >
> > failed: 'ulimit -n 32768; /usr/bin/ceph-mds -i cephstorelx2 --pid-file /var/run/ceph/mds.cephstorelx2//mds.cephstorelx2.pid -c /etc/ceph/ceph.conf --cluster ceph --setuser ceph --setgroup ceph '
> >
> > Any pointers would be helpful.
> >
> > -Zach
> >
> > _______________________________________________
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com