On Fri, 12 Sep 2014 12:05:06 -0400 Brian Rak wrote:

> That's not how ulimit works. Check the `ulimit -a` output.
>
Indeed.
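The shell's `ulimit -a` only tells you about the current session; what an already-running daemon actually got is easiest to read out of procfs. A minimal sketch (the pidof lookup simply grabs the first ceph-osd process it finds, adjust to taste):
---
# limits of the current shell session
ulimit -a

# limits actually applied to a running OSD
OSD_PID=$(pidof ceph-osd | awk '{print $1}')
grep -E 'Max processes|Max open files' /proc/${OSD_PID}/limits
---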
And to forestall the next questions, see "man initscript"; mine looks like this:
---
ulimit -Hn 131072
ulimit -Sn 65536

# Execute the program.
eval exec "$4"
---

And also an /etc/security/limits.d/tuning.conf (Debian) like this:
---
root  soft  nofile  65536
root  hard  nofile  131072
*     soft  nofile  16384
*     hard  nofile  65536
---

Adjust these to your actual needs. There might be other limits you're hitting, but that is the most likely one.
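If the per-process limits check out, the kernel-wide ceilings are worth a glance too, since 45 OSDs spawn a lot of threads between them. A rough sketch of what I'd check and, if need be, raise persistently; the file name and the numbers are only examples, not recommendations:
---
# current kernel-wide ceilings
sysctl kernel.pid_max kernel.threads-max vm.max_map_count

# raise them persistently (example values only)
cat > /etc/sysctl.d/90-ceph-osd.conf <<'EOF'
kernel.pid_max = 4194303
kernel.threads-max = 2097152
vm.max_map_count = 524288
EOF
sysctl -p /etc/sysctl.d/90-ceph-osd.conf
---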
Also, 45 OSDs with 12 (24 with HT, bleah) CPU cores is pretty ballsy.
I personally would rather do 4 RAID6 arrays (10 disks each, with OSD journals on SSD) with that kind of case and enjoy the fact that my OSDs never fail. ^o^

Christian (another one)

> On 9/12/2014 10:15 AM, Christian Eichelmann wrote:
> > Hi,
> >
> > I am running all commands as root, so there are no limits for the
> > processes.
> >
> > Regards,
> > Christian
> > _______________________________________
> > From: Mariusz Gronczewski [mariusz.gronczew...@efigence.com]
> > Sent: Friday, 12 September 2014 15:33
> > To: Christian Eichelmann
> > Cc: ceph-users@lists.ceph.com
> > Subject: Re: [ceph-users] OSDs are crashing with "Cannot fork" or
> > "cannot create thread" but plenty of memory is left
> >
> > do cat /proc/<pid>/limits
> >
> > probably you hit the max processes limit or the max FD limit
> >
> >> Hi Ceph-Users,
> >>
> >> I have absolutely no idea what is going on on my systems...
> >>
> >> Hardware:
> >> 45 x 4TB hard disks
> >> 2 x 6-core CPUs
> >> 256GB memory
> >>
> >> When initializing all disks and joining them to the cluster, other
> >> OSDs start crashing once approximately 30 OSDs are up. When I try to
> >> start them again I see different kinds of errors. For example:
> >>
> >> Starting Ceph osd.316 on ceph-osd-bs04...already running
> >> === osd.317 ===
> >> Traceback (most recent call last):
> >>   File "/usr/bin/ceph", line 830, in <module>
> >>     sys.exit(main())
> >>   File "/usr/bin/ceph", line 773, in main
> >>     sigdict, inbuf, verbose)
> >>   File "/usr/bin/ceph", line 420, in new_style_command
> >>     inbuf=inbuf)
> >>   File "/usr/lib/python2.7/dist-packages/ceph_argparse.py", line 1112, in json_command
> >>     raise RuntimeError('"{0}": exception {1}'.format(cmd, e))
> >> NameError: global name 'cmd' is not defined
> >> Exception thread.error: error("can't start new thread",) in <bound
> >> method Rados.__del__ of <rados.Rados object at 0x29ee410>> ignored
> >>
> >> or:
> >> /etc/init.d/ceph: 190: /etc/init.d/ceph: Cannot fork
> >> /etc/init.d/ceph: 191: /etc/init.d/ceph: Cannot fork
> >> /etc/init.d/ceph: 192: /etc/init.d/ceph: Cannot fork
> >>
> >> or:
> >> /usr/bin/ceph-crush-location: 72: /usr/bin/ceph-crush-location: Cannot fork
> >> /usr/bin/ceph-crush-location: 79: /usr/bin/ceph-crush-location: Cannot fork
> >> Thread::try_create(): pthread_create failed with error 11
> >> common/Thread.cc: In function 'void Thread::create(size_t)' thread 7fcf768c9760 time 2014-09-12 15:00:28.284735
> >> common/Thread.cc: 110: FAILED assert(ret == 0)
> >>  ceph version 0.80.5 (38b73c67d375a2552d8ed67843c8a65c2c0feba6)
> >>  1: /usr/bin/ceph-conf() [0x51de8f]
> >>  2: (CephContext::CephContext(unsigned int)+0xb1) [0x520fe1]
> >>  3: (common_preinit(CephInitParameters const&, code_environment_t, int)+0x48) [0x52eb78]
> >>  4: (global_pre_init(std::vector<char const*, std::allocator<char const*> >*, std::vector<char const*, std::allocator<char const*> >&, unsigned int, code_environment_t, int)+0x8d) [0x518d0d]
> >>  5: (main()+0x17a) [0x514f6a]
> >>  6: (__libc_start_main()+0xfd) [0x7fcf7522ceed]
> >>  7: /usr/bin/ceph-conf() [0x5168d1]
> >>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> >> needed to interpret this.
> >> terminate called after throwing an instance of 'ceph::FailedAssertion'
> >> Aborted (core dumped)
> >> /etc/init.d/ceph: 340: /etc/init.d/ceph: Cannot fork
> >> /etc/init.d/ceph: 1: /etc/init.d/ceph: Cannot fork
> >> Traceback (most recent call last):
> >>   File "/usr/bin/ceph", line 830, in <module>
> >>     sys.exit(main())
> >>   File "/usr/bin/ceph", line 590, in main
> >>     conffile=conffile)
> >>   File "/usr/lib/python2.7/dist-packages/rados.py", line 198, in __init__
> >>     librados_path = find_library('rados')
> >>   File "/usr/lib/python2.7/ctypes/util.py", line 224, in find_library
> >>     return _findSoname_ldconfig(name) or _get_soname(_findLib_gcc(name))
> >>   File "/usr/lib/python2.7/ctypes/util.py", line 213, in _findSoname_ldconfig
> >>     f = os.popen('/sbin/ldconfig -p 2>/dev/null')
> >> OSError: [Errno 12] Cannot allocate memory
> >>
> >> But anyway, when I look at the memory consumption of the system:
> >> # free -m
> >>              total       used       free     shared    buffers     cached
> >> Mem:        258450      25841     232609          0         18      15506
> >> -/+ buffers/cache:      10315     248135
> >> Swap:         3811          0       3811
> >>
> >> There is more than 230GB of memory available! What is going on there?
> >>
> >> System:
> >> Linux ceph-osd-bs04 3.14-0.bpo.1-amd64 #1 SMP Debian 3.14.12-1~bpo70+1
> >> (2014-07-13) x86_64 GNU/Linux
> >>
> >> Since this is happening on other hardware as well, I don't think it's
> >> hardware related. I have no idea if this is an OS issue (which would
> >> be seriously strange) or a Ceph issue.
> >>
> >> Since this is happening only AFTER we upgraded to Firefly, I guess it
> >> has something to do with Ceph.
> >>
> >> ANY idea on what is going on here would be very appreciated!
> >>
> >> Regards,
> >> Christian
> >
> > --
> > Mariusz Gronczewski, Administrator
> >
> > Efigence S. A.
> > ul. Wołoska 9a, 02-583 Warszawa
> > T: [+48] 22 380 13 13
> > F: [+48] 22 380 13 14
> > E: mariusz.gronczew...@efigence.com

--
Christian Balzer        Network/Systems Engineer
ch...@gol.com           Global OnLine Japan/Fusion Communications
http://www.gol.com/

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com