Are you mounting your OSDs via fstab or some other mechanism? Ceph uses udev rules and partition identifiers (GUIDs) to work out what each disk is and where to mount it, provided the GUIDs are set properly on your disks. ceph-deploy sets them by default.

On Wed, May 10, 2017 at 3:46 PM David Turner <drakonst...@gmail.com> wrote:

> `update-rc.d 'ceph' defaults 99`
>
> That should put it last in the boot order. The '99' here is a number
> 01-99 where the lower the number, the earlier in the boot sequence the
> service is started. To see what order your service is set to start and
> stop, `ls /etc/rc*.d/*{service}`. Each rc# directory represents a runlevel.
> K## is the order in which services will be stopped, S## is the order in
> which services will be started. After you run the above command, it should
> change Ceph to S99. If you want to fine-tune it, you can see which services
> are starting up after ceph and see if you can locate the specific one that
> is causing your problems.
>
> On Wed, May 10, 2017 at 3:34 PM <vida.z...@gmail.com> wrote:
>
>> David,
>>
>> ceph tell osd.12 version replies version 11.2.0
>>
>> Distro is Ubuntu 14.04.5 LTS (trusty), which utilizes upstart for ceph.
>>
>> I don’t see a good way to ensure "last" in an event-based system like
>> upstart.
>>
>> For the record, I already tried starting after networking and after
>> filesystems are mounted too, and that didn’t seem to help things.
>>
>> *From: *David Turner <drakonst...@gmail.com>
>> *Sent: *Wednesday, May 10, 2017 3:21 PM
>> *To: *vida.z...@gmail.com; ceph-users@lists.ceph.com
>> *Subject: *Re: [ceph-users] trouble starting ceph @ boot
>>
>> I would probably just make it start last in the boot order. Depending on
>> your distribution/version, that will be as simple as setting it to 99 for
>> starting up. Which distribution/version are you running?
>>
>> On Wed, May 10, 2017 at 2:36 PM <vida.z...@gmail.com> wrote:
>>
>> David,
>>
>> I get what you are saying. Do you have a suggestion as to what service I
>> should make ceph-osd depend on to start reliably?
>>
>> My understanding is that these daemons should all be sort of independent
>> of each other.
>>
>> -Zach
>>
>> *From: *David Turner <drakonst...@gmail.com>
>> *Sent: *Wednesday, May 10, 2017 1:18 PM
>> *To: *vida.z...@gmail.com; ceph-users@lists.ceph.com
>> *Subject: *Re: [ceph-users] trouble starting ceph @ boot
>>
>> Have you attempted to place the ceph-osd startup later in the boot
>> process? Which distribution/version are you running? Each does it
>> slightly differently. This can be problematic for some services, very
>> commonly in cases where a network drive is mapped and used by a service
>> like mysql (terrible example, but effective). If you try to start mysql
>> before the network is up and the drive is mapped, then mysql will fail.
>> Some workarounds are to put a sleep in the init script, or retry (similar
>> to what you did), but ultimately, you probably want to set a requisite
>> service to have started or just place the service in a later starting
>> position.
>>
>> On Wed, May 10, 2017 at 9:43 AM <vida.z...@gmail.com> wrote:
>>
>> System: Ubuntu Trusty 14.04
>> Release: Kraken
>>
>> Issue:
>>
>> When starting the ceph-osd daemon on boot via upstart,
>> /var/log/upstart/ceph-osd-ceph_#.log reports 3 attempts to start the
>> service with the error message below:
>>
>> starting osd.12 at - osd_data /var/lib/ceph/osd/ceph-12 /var/lib/ceph/osd/ceph-12/journal
>> 2017-05-09 13:38:34.507004 7f6d46a2e980 -1 journal FileJournal::_open: disabling aio for non-block journal. Use journal_force_aio to force use of aio anyway
>> 2017-05-09 13:38:38.432333 7f6d46a2e980 -1 osd.12 2284024 PGs are upgrading
>> unable to look up group 'ceph': (34) Numerical result out of range
>> unable to look up group 'ceph': (34) Numerical result out of range
>> unable to look up group 'ceph': (34) Numerical result out of range
>>
>> Workaround:
>>
>> If I configure /etc/init/ceph-osd.conf like so
>>
>> -respawn limit 3 1800
>> +respawn limit unlimited
>>
>> I get roughly 20 attempts to start each osd daemon and then it
>> successfully starts.
>>
>> Starting the daemons by hand works just fine after boot.
>>
>> Possible reasons:
>>
>> NSCD is being utilized and may not have started yet. However, disabling
>> this service doesn’t improve starting the service without the
>> workaround in place.
>>
>> The message seems to be coming from global/global_init.cc:
>>
>> ./global/global_init.cc- struct passwd *p = 0;
>> ./global/global_init.cc- getpwnam_r(g_conf->setuser.c_str(), &pa, buf, sizeof(buf), &p);
>> ./global/global_init.cc- if (!p) {
>> ./global/global_init.cc- cerr << "unable to look up user '" << g_conf->setuser << "'"
>> ./global/global_init.cc- << std::endl;
>> ./global/global_init.cc- exit(1);
>> ./global/global_init.cc- }
>> ./global/global_init.cc- uid = p->pw_uid;
>> ./global/global_init.cc- gid = p->pw_gid;
>> ./global/global_init.cc- uid_string = g_conf->setuser;
>> ./global/global_init.cc- }
>> ./global/global_init.cc- }
>> ./global/global_init.cc- if (g_conf->setgroup.length() > 0) {
>> ./global/global_init.cc- gid = atoi(g_conf->setgroup.c_str());
>> ./global/global_init.cc- if (!gid) {
>> ./global/global_init.cc- char buf[4096];
>> ./global/global_init.cc- struct group gr;
>> ./global/global_init.cc- struct group *g = 0;
>> ./global/global_init.cc- getgrnam_r(g_conf->setgroup.c_str(), &gr, buf, sizeof(buf), &g);
>> ./global/global_init.cc- if (!g) {
>> ./global/global_init.cc: cerr << "unable to look up group '" << g_conf->setgroup << "'"
>> ./global/global_init.cc- << ": " << cpp_strerror(errno) << std::endl;
>> ./global/global_init.cc- exit(1);
>> ./global/global_init.cc- }
>> ./global/global_init.cc- gid = g->gr_gid;
>> ./global/global_init.cc- gid_string = g_conf->setgroup;
>> ./global/global_init.cc- }
>> ./global/global_init.cc- }
>>
>> 34 as an error code seems to correspond to ERANGE, "Insufficient buffer
>> space supplied". I assume this is because getgrnam_r() returns NULL if it
>> can’t find the group.
>>
>> But as to why the group isn’t retrievable, I have no idea, as
>>
>> getent group ceph
>> ceph:x:59623:ceph
>>
>> GID changed for security reasons.
>>
>> Additional Information:
>>
>> I also see this in boot.log; not sure if it is related:
>>
>> failed: 'ulimit -n 32768; /usr/bin/ceph-mds -i cephstorelx2 --pid-file /var/run/ceph/mds.cephstorelx2//mds.cephstorelx2.pid -c /etc/ceph/ceph.conf --cluster ceph --setuser ceph --setgroup ceph '
>>
>> Any pointers would be helpful.
>>
>> -Zach
>>
>> _______________________________________________
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
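
On the "(34) Numerical result out of range" in the quoted log: errno 34 is ERANGE, which getgrnam_r() uses to say the caller's buffer was too small, whereas a group that simply does not exist normally comes back as a return value of 0 with a NULL result. One possible reading, not confirmed anywhere in this thread, is that the fixed 4096-byte buffer is too small for whatever the lookup walks through at boot. Below is a minimal sketch of the usual retry-on-ERANGE pattern. The helper name lookup_gid, the 64-byte floor and the 1 MiB cap are my own assumptions for illustration and are not Ceph code.

```cpp
#include <grp.h>
#include <sys/types.h>

#include <algorithm>
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of Ceph): resolve a group name to a gid.
// Returns 0 and fills *gid_out on success, ENOENT if the group does not
// exist, or the error number reported by getgrnam_r() otherwise.
int lookup_gid(const std::string& name, gid_t* gid_out,
               std::size_t initial_size = 1024) {
  assert(gid_out != nullptr);
  const std::size_t max_size = 1 << 20;  // arbitrary 1 MiB cap for this sketch
  std::vector<char> buf(std::max<std::size_t>(initial_size, 64));
  for (;;) {
    struct group gr;
    struct group* result = nullptr;
    // getgrnam_r() reports failures through its return value.
    int rc = getgrnam_r(name.c_str(), &gr, buf.data(), buf.size(), &result);
    if (rc == ERANGE && buf.size() < max_size) {
      buf.resize(buf.size() * 2);  // buffer too small: retry with more room
      continue;
    }
    if (rc != 0) return rc;                    // real lookup failure
    if (result == nullptr) return ENOENT;      // lookup worked, no such group
    *gid_out = result->gr_gid;
    return 0;
  }
}
```

Calling lookup_gid("ceph", &gid) from a small test program started at the same point in the boot sequence as ceph-osd would show whether the lookup succeeds once the buffer is allowed to grow, or whether it fails for some other reason.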