Re: [ceph-users] trouble starting ceph @ boot

David Turner Wed, 10 May 2017 13:07:57 -0700

Are you mounting your OSDs using fstab or anything else?  Ceph uses udev
rules and partition identifiers to know what a disk is and where to mount
it, assuming that you have your GUIDs set properly on your disks.
ceph-deploy does this by default.


On Wed, May 10, 2017 at 3:46 PM David Turner <[email protected]> wrote:

> `update-rc.d 'ceph' defaults 99`
> That should put it last in the boot order.  The '99' here is a number
> 01-99 where the lower the number, the earlier in the boot sequence the
> service is started.  To see what order your service is set to start and
> stop, run `ls /etc/rc*.d/*{service}`.  Each rc#.d directory corresponds to
> a runlevel.  K## is the order in which services will be stopped, S## is
> the order in which services will be started.  After you run the above
> command, it should change Ceph to S99.  If you want to fine tune it, you
> can see which services are starting up after ceph and see if you can
> locate the specific one that is causing your problems.
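
To make the K##/S## naming concrete, here is a minimal sketch that splits an
rc link name such as "S99ceph" into its action, two-digit order, and service
name.  RcLink and parse_rc_link are hypothetical names made up for this
illustration; they are not part of update-rc.d or of Ceph.

```cpp
#include <cctype>
#include <string>

// Hypothetical illustration type: one entry from /etc/rc#.d/.
struct RcLink {
  char action;          // 'S' = start, 'K' = stop (kill)
  int priority;         // 0-99, lower runs earlier
  std::string service;  // e.g. "ceph"
};

// Parse a SysV rc link name of the form [SK]NNname.
// Returns false if the name does not follow that pattern.
bool parse_rc_link(const std::string& name, RcLink* out)
{
  if (name.size() < 4)
    return false;
  if (name[0] != 'S' && name[0] != 'K')
    return false;
  if (!std::isdigit(static_cast<unsigned char>(name[1])) ||
      !std::isdigit(static_cast<unsigned char>(name[2])))
    return false;
  out->action = name[0];
  out->priority = (name[1] - '0') * 10 + (name[2] - '0');
  out->service = name.substr(3);
  return true;
}
```

Calling parse_rc_link("S99ceph", &link) fills in action 'S', priority 99 and
service "ceph"; a name like "README" is rejected.
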
>
> On Wed, May 10, 2017 at 3:34 PM <[email protected]> wrote:
>
>> David,
>>
>>
>>
>> ceph tell osd.12 version replies version 11.2.0
>>
>>
>>
>> Distro is Ubuntu 14.04.5 LTS (trusty) which utilizes upstart for ceph.
>>
>>
>>
>> I don’t see a good way to ensure it starts last in an event-based system
>> like upstart.
>>
>>
>>
>> For the record, I already tried starting it after networking is up and
>> after filesystems are mounted, and that didn’t seem to help things.
>>
>>
>>
>>
>>
>>
>>
>> *From: *David Turner <[email protected]>
>> *Sent: *Wednesday, May 10, 2017 3:21 PM
>>
>>
>> *To: *[email protected]; [email protected]
>> *Subject: *Re: [ceph-users] trouble starting ceph @ boot
>>
>>
>>
>> I would probably just make it start last in the boot order.  Depending on
>> your distribution/version, that will be as simple as setting it to 99 for
>> starting up.  Which distribution/version are you running?
>>
>>
>>
>> On Wed, May 10, 2017 at 2:36 PM <[email protected]> wrote:
>>
>> David,
>>
>>
>>
>> I get what you are saying. Do you have a suggestion as to what service I
>> should make ceph-osd depend on so that it starts reliably?
>>
>>
>>
>> My understanding is that these daemons should all be sort of independent
>> of each other.
>>
>>
>>
>> -Zach
>>
>>
>>
>>
>>
>>
>>
>> *From: *David Turner <[email protected]>
>> *Sent: *Wednesday, May 10, 2017 1:18 PM
>> *To: *[email protected]; [email protected]
>> *Subject: *Re: [ceph-users] trouble starting ceph @ boot
>>
>>
>>
>> Have you attempted to place the ceph-osd startup later in the boot
>> process?  Which distribution/version are you running?  Each does it
>> slightly differently.  Start order can be problematic for some services,
>> very commonly in cases where a network drive is mapped and used by a
>> service like mysql (terrible example, but effective).  If you try to start
>> mysql before the network is up and the drive is mapped, then mysql will
>> fail.  Some workarounds are to put a sleep in the init script, or retry
>> (similar to what you did), but ultimately, you probably want to require
>> that a prerequisite service has started or just place the service in a
>> later starting position.
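
As a rough illustration of the sleep/retry workaround, the sketch below
retries a check a bounded number of times with a pause in between.
retry_until is a hypothetical helper written for this example, not something
shipped with upstart or Ceph, and the check passed in (for instance "can the
'ceph' group be resolved yet?") is left to the caller.

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical helper: call `attempt` until it returns true or
// `max_attempts` tries have been made, sleeping `delay` between tries.
// Returns the number of the successful attempt (1-based), or 0 if
// every attempt failed.
int retry_until(const std::function<bool()>& attempt, int max_attempts,
                std::chrono::milliseconds delay)
{
  for (int i = 1; i <= max_attempts; ++i) {
    if (attempt())
      return i;
    if (i < max_attempts)
      std::this_thread::sleep_for(delay);
  }
  return 0;
}
```

A bounded retry only hides the ordering problem, which is why declaring the
real prerequisite is the better long-term fix.
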
>>
>>
>>
>> On Wed, May 10, 2017 at 9:43 AM <[email protected]> wrote:
>>
>> System: Ubuntu Trusty 14.04
>>
>> Release : Kraken
>>
>>
>> Issue:
>>
>> When starting the ceph-osd daemon on boot via upstart, the log in
>> /var/log/upstart/ceph-osd-ceph_#.log reports 3 attempts to start the
>> service, with the error messages below
>>
>>
>>
>> starting osd.12 at - osd_data /var/lib/ceph/osd/ceph-12
>> /var/lib/ceph/osd/ceph-12/journal
>>
>> 2017-05-09 13:38:34.507004 7f6d46a2e980 -1 journal FileJournal::_open:
>> disabling aio for non-block journal. Use journal_force_aio to force use of
>> aio anyway
>>
>> 2017-05-09 13:38:38.432333 7f6d46a2e980 -1 osd.12 2284024 PGs are
>> upgrading
>>
>> unable to look up group 'ceph': (34) Numerical result out of range
>>
>> unable to look up group 'ceph': (34) Numerical result out of range
>>
>> unable to look up group 'ceph': (34) Numerical result out of range
>>
>>
>>
>> Workaround:
>>
>>
>>
>> If I configure /etc/init/ceph-osd.conf like so
>>
>>
>>
>> -respawn limit 3 1800
>>
>> +respawn limit unlimited
>>
>>
>>
>> I get roughly 20 attempts to start each osd daemon and then it
>> successfully starts.
>>
>>
>>
>> Starting the daemons by hand works just fine after boot.
>>
>>
>>
>> Possible reasons:
>>
>>
>>
>> NSCD is being utilized and may not have started yet. However, disabling
>> this service does not improve starting the service without the
>> workaround in place.
>>
>>
>>
>>
>>
>> The message seems to be coming from global/global_init.cc
>>
>>
>>
>>     struct passwd *p = 0;
>>     getpwnam_r(g_conf->setuser.c_str(), &pa, buf, sizeof(buf), &p);
>>     if (!p) {
>>       cerr << "unable to look up user '" << g_conf->setuser << "'"
>>            << std::endl;
>>       exit(1);
>>     }
>>     uid = p->pw_uid;
>>     gid = p->pw_gid;
>>     uid_string = g_conf->setuser;
>>   }
>> }
>> if (g_conf->setgroup.length() > 0) {
>>   gid = atoi(g_conf->setgroup.c_str());
>>   if (!gid) {
>>     char buf[4096];
>>     struct group gr;
>>     struct group *g = 0;
>>     getgrnam_r(g_conf->setgroup.c_str(), &gr, buf, sizeof(buf), &g);
>>     if (!g) {
>>       cerr << "unable to look up group '" << g_conf->setgroup << "'"
>>            << ": " << cpp_strerror(errno) << std::endl;
>>       exit(1);
>>     }
>>     gid = g->gr_gid;
>>     gid_string = g_conf->setgroup;
>>   }
>> }
>>
>>
>>
>> 34 as an error code seems to correspond to ERANGE (insufficient buffer
>> space supplied). I assume this is because getgrnam_r() sets its result
>> pointer to NULL if it can’t find the group.
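
One thing worth noting about the excerpt: it ignores the return value of
getgrnam_r() and prints errno instead.  POSIX specifies that getgrnam_r()
reports failure through its return value, and that ERANGE means the supplied
buffer was too small for the group entry.  Whether that is what happens at
boot here is only a guess.  A minimal sketch of a lookup that checks the
return value and grows the buffer on ERANGE could look like this; lookup_gid
and the 1 MiB cap are assumptions made for the example, not Ceph code.

```cpp
#include <grp.h>
#include <sys/types.h>
#include <cerrno>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, not Ceph code: resolve a group name to a gid.
// Returns 0 on success, ENOENT if no such group exists, or the error
// number returned by getgrnam_r().
int lookup_gid(const std::string& name, gid_t* out,
               std::size_t initial_buf = 4096)
{
  const std::size_t max_buf = 1 << 20;  // arbitrary cap for this sketch
  std::vector<char> buf(initial_buf ? initial_buf : 1);
  struct group gr;
  struct group* result = nullptr;
  for (;;) {
    // The error is the return value; errno is not consulted.
    int r = getgrnam_r(name.c_str(), &gr, buf.data(), buf.size(), &result);
    if (r == ERANGE && buf.size() < max_buf) {
      buf.resize(buf.size() * 2);  // entry did not fit, retry with more room
      continue;
    }
    if (r != 0)
      return r;
    if (!result)
      return ENOENT;  // lookup succeeded but found no matching group
    *out = result->gr_gid;
    return 0;
  }
}
```

With this shape, a group entry larger than the initial buffer is retried
rather than reported as "unable to look up group", and a real lookup failure
keeps its own error number.
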
>>
>>
>>
>> But I have no idea why the group isn’t retrievable, as getent finds it:
>>
>> getent group ceph
>>
>> ceph:x:59623:ceph
>>
>>
>>
>> GID changed for security reasons.
>>
>>
>>
>> Additional Information:
>>
>>
>>
>> I also see this in boot.log; not sure if it is related:
>>
>> failed: 'ulimit -n 32768; /usr/bin/ceph-mds -i cephstorelx2 --pid-file
>> /var/run/ceph/mds.cephstorelx2//mds.cephstorelx2.pid -c /etc/ceph/ceph.conf
>> --cluster ceph --setuser ceph --setgroup ceph '
>>
>>
>> Any pointers would be helpful.
>>
>>
>> -Zach
>>
>> _______________________________________________
>> ceph-users mailing list
>> [email protected]
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>>
>>
>>
>>
>
