I would probably just make it start last in the boot order.  Depending on
your distribution/version, that can be as simple as setting its start
priority to 99.  Which distribution/version are you running?

On Wed, May 10, 2017 at 2:36 PM <vida.z...@gmail.com> wrote:

> David,
>
> I get what you are saying. Do you have a suggestion as to which service I
> should make ceph-osd depend on so that it starts reliably?
>
> My understanding is that these daemons should all be more or less
> independent of each other.
>
> -Zach
>
>
> *From: *David Turner <drakonst...@gmail.com>
> *Sent: *Wednesday, May 10, 2017 1:18 PM
> *To: *vida.z...@gmail.com; ceph-users@lists.ceph.com
> *Subject: *Re: [ceph-users] trouble starting ceph @ boot
>
>
>
> Have you tried placing the ceph-osd startup later in the boot process?
> Which distribution/version are you running?  Each one does it slightly
> differently.  Boot ordering can be a problem for some services, very
> commonly when a network drive is mapped and used by a service like mysql
> (a terrible example, but an effective one).  If you try to start mysql
> before the network is up and the drive is mapped, mysql will fail.  Some
> workarounds are to put a sleep in the init script or to retry (similar to
> what you did), but ultimately you probably want to make the service
> depend on a prerequisite service having started, or just move it to a
> later position in the start order.
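For illustration only, the "sleep and retry" workaround amounts to a bounded retry loop like the C++ sketch below (C++ to match the Ceph source quoted later in the thread). `retry_with_delay` is a hypothetical helper, not part of Ceph or Upstart; in this setup the real knob is Upstart's `respawn limit`, and the loop only hides the start-order race rather than fixing it.

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical helper (not part of Ceph or Upstart): call `attempt` up to
// `max_attempts` times, sleeping `delay` after each failed attempt except
// the last. Returns true as soon as one attempt succeeds, false if all fail.
// Like a sleep in an init script, this only papers over a start-order race;
// it does not express the missing dependency.
bool retry_with_delay(int max_attempts, std::chrono::milliseconds delay,
                      const std::function<bool()>& attempt) {
  for (int i = 1; i <= max_attempts; ++i) {
    if (attempt()) {
      return true;  // the dependency was ready on this attempt
    }
    if (i < max_attempts) {
      std::this_thread::sleep_for(delay);  // give the dependency time to come up
    }
  }
  return false;  // still failing after max_attempts tries
}
```

For example, `retry_with_delay(20, std::chrono::seconds(1), start_osd)`, where `start_osd` is a hypothetical callable returning true on success, behaves roughly like an unlimited respawn that happens to succeed within 20 tries.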
>
>
>
> On Wed, May 10, 2017 at 9:43 AM <vida.z...@gmail.com> wrote:
>
> System: Ubuntu Trusty 14.04
>
> Release: Kraken
>
>
> Issue:
>
> When the ceph-osd daemons are started at boot via Upstart,
> /var/log/upstart/ceph-osd-ceph_#.log reports three attempts to start the
> service, each failing with the error message below:
>
>
>
> starting osd.12 at - osd_data /var/lib/ceph/osd/ceph-12
> /var/lib/ceph/osd/ceph-12/journal
>
> 2017-05-09 13:38:34.507004 7f6d46a2e980 -1 journal FileJournal::_open:
> disabling aio for non-block journal. Use journal_force_aio to force use of
> aio anyway
>
> 2017-05-09 13:38:38.432333 7f6d46a2e980 -1 osd.12 2284024 PGs are upgrading
>
> unable to look up group 'ceph': (34) Numerical result out of range
>
> unable to look up group 'ceph': (34) Numerical result out of range
>
> unable to look up group 'ceph': (34) Numerical result out of range
>
>
>
> Workaround:
>
>
>
> If I change /etc/init/ceph-osd.conf as follows:
>
>
>
> -respawn limit 3 1800
>
> +respawn limit unlimited
>
>
>
> I see roughly 20 attempts to start each OSD daemon, after which it
> starts successfully.
>
>
>
> Starting the daemons by hand works just fine after boot.
>
>
>
> Possible reasons:
>
>
>
> NSCD is in use and may not have started yet. However, disabling it does
> not let the OSDs start without the workaround in place.
>
> The message seems to be coming from global/global_init.cc:
>
>
>
>       struct passwd *p = 0;
>       getpwnam_r(g_conf->setuser.c_str(), &pa, buf, sizeof(buf), &p);
>       if (!p) {
>         cerr << "unable to look up user '" << g_conf->setuser << "'"
>              << std::endl;
>         exit(1);
>       }
>       uid = p->pw_uid;
>       gid = p->pw_gid;
>       uid_string = g_conf->setuser;
>     }
>   }
>   if (g_conf->setgroup.length() > 0) {
>     gid = atoi(g_conf->setgroup.c_str());
>     if (!gid) {
>       char buf[4096];
>       struct group gr;
>       struct group *g = 0;
>       getgrnam_r(g_conf->setgroup.c_str(), &gr, buf, sizeof(buf), &g);
>       if (!g) {
>         cerr << "unable to look up group '" << g_conf->setgroup << "'"
>              << ": " << cpp_strerror(errno) << std::endl;
>         exit(1);
>       }
>       gid = g->gr_gid;
>       gid_string = g_conf->setgroup;
>     }
>   }
>
>
>
> Error code 34 seems to correspond to ERANGE ("Insufficient buffer space
> supplied"). I assume the message is printed because getgrnam_r() leaves
> its result pointer NULL when it can't find the group.
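A minimal C++ sketch of a group lookup that keeps the two outcomes discussed here apart; `lookup_group` and `GroupLookup` are hypothetical names, not Ceph code. getgrnam_r() reports failure through its return value, and a missing group is normally signalled by a 0 return with the result pointer set to NULL (some systems return ENOENT or similar instead), so "not found" and ERANGE are different cases. ERANGE means the caller's buffer was too small, and the usual remedy is to retry with a larger one, starting from sysconf(_SC_GETGR_R_SIZE_MAX) when it reports a size. The 4096-byte fallback and the 1 MiB cap are arbitrary assumptions.

```cpp
#include <cerrno>
#include <cstddef>
#include <grp.h>
#include <string>
#include <sys/types.h>
#include <unistd.h>
#include <vector>

// Outcome of a lookup: found == false with error == 0 means "no such group";
// a non-zero error is the value getgrnam_r() itself returned.
struct GroupLookup {
  bool found = false;
  gid_t gid = 0;
  int error = 0;
};

// Hypothetical helper: look up `name`, doubling the buffer on ERANGE.
GroupLookup lookup_group(const std::string& name) {
  long hint = sysconf(_SC_GETGR_R_SIZE_MAX);  // may be -1 (no fixed limit)
  std::size_t size = hint > 0 ? static_cast<std::size_t>(hint) : 4096;
  const std::size_t max_size = std::size_t{1} << 20;  // arbitrary 1 MiB cap

  for (;;) {
    std::vector<char> buf(size);
    struct group gr;
    struct group* result = nullptr;
    int rc = getgrnam_r(name.c_str(), &gr, buf.data(), buf.size(), &result);
    if (rc == ERANGE && size < max_size) {
      size *= 2;  // buffer too small for this entry: grow and retry
      continue;
    }
    GroupLookup out;
    out.error = rc;  // keep the return value rather than relying on errno
    if (rc == 0 && result != nullptr) {
      out.found = true;
      out.gid = result->gr_gid;
    }
    return out;
  }
}
```

Logging `error` and `found` from a lookup like this at boot would show whether the failure is a genuine ERANGE or the group simply not being resolvable yet.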
>
>
>
> But I have no idea why the group can't be retrieved, since:
>
>     getent group ceph
>     ceph:x:59623:ceph
>
> (The GID shown has been changed for security reasons.)
>
>
>
> Additional Information:
>
>
>
> I also see this in boot.log; I'm not sure whether it is related:
>
> failed: 'ulimit -n 32768; /usr/bin/ceph-mds -i cephstorelx2 --pid-file
> /var/run/ceph/mds.cephstorelx2//mds.cephstorelx2.pid -c /etc/ceph/ceph.conf
> --cluster ceph --setuser ceph --setgroup ceph '
>
>
> Any pointers would be helpful.
>
>
> -Zach
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>