Thanks, guys. Setting kernel.pid_max=4194303 did the trick.

- Karan -
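For anyone who finds this thread later: on Linux every thread takes an ID from the same space that kernel.pid_max bounds, so a rough check is to compare the host's thread count with that limit. Below is a minimal sketch of such a check, assuming a Linux host with /proc mounted. The function names are my own and not part of Ceph or any library.

```python
from pathlib import Path

# Standard procfs location of the sysctl on Linux.
PID_MAX_PATH = Path("/proc/sys/kernel/pid_max")


def read_pid_max(path: Path = PID_MAX_PATH) -> int:
    """Return the current kernel.pid_max value."""
    return int(path.read_text().strip())


def count_threads(proc_root: Path = Path("/proc")) -> int:
    """Count all threads on the host by listing /proc/<pid>/task entries."""
    total = 0
    for entry in proc_root.iterdir():
        if not entry.name.isdigit():
            continue  # not a process directory
        try:
            total += sum(1 for _ in (entry / "task").iterdir())
        except OSError:
            # The process exited (or is inaccessible) while we were looking.
            continue
    return total


def pid_headroom(pid_max: int, threads_in_use: int) -> int:
    """IDs left under pid_max; zero or negative means the limit is too low."""
    return pid_max - threads_in_use


if __name__ == "__main__":
    limit = read_pid_max()
    used = count_threads()
    print(f"kernel.pid_max={limit} threads={used} "
          f"headroom={pid_headroom(limit, used)}")
```

With the numbers from the quoted reply, pid_headroom(65536, 66000) is negative, which matches the failed thread creation in the OSD log, while 4194303 leaves plenty of room.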
> On 09 Mar 2015, at 14:48, Christian Eichelmann <christian.eichelm...@1und1.de> wrote:
>
> Hi Karan,
>
> as you are actually writing in your own book, the problem is the sysctl
> setting "kernel.pid_max". I've seen in your bug report that you were
> setting it to 65536, which is still too low for high density hardware.
>
> In our cluster, one OSD server has about 66,000 threads when idle
> (60 OSDs per server). The number of threads increases when you
> increase the number of placement groups in the cluster, which I think
> has triggered your problem.
>
> Set the "kernel.pid_max" setting to 4194303 (the maximum) like Azad
> Aliyar suggested, and the problem should be gone.
>
> Regards,
> Christian
>
> On 09.03.2015 11:41, Karan Singh wrote:
>> Hello Community, I need help fixing a long-running Ceph problem.
>>
>> The cluster is unhealthy and multiple OSDs are DOWN. When I try to
>> restart the OSDs I get this error:
>>
>> 2015-03-09 12:22:16.312774 7f760dac9700 -1 common/Thread.cc: In function 'void Thread::create(size_t)' thread 7f760dac9700 time 2015-03-09 12:22:16.311970
>> common/Thread.cc: 129: FAILED assert(ret == 0)
>>
>> Environment: 4 nodes, OSD+Monitor, Firefly latest, CentOS 6.5,
>> 3.17.2-1.el6.elrepo.x86_64
>>
>> Tried upgrading from 0.80.7 to 0.80.8, but no luck.
>>
>> Tried the CentOS stock kernel 2.6.32, but no luck.
>>
>> Memory is not a problem; more than 150 GB is free.
>>
>> Has anyone ever faced this problem?
>>
>> Cluster status
>>
>>     cluster 2bd3283d-67ef-4316-8b7e-d8f4747eae33
>>      health HEALTH_WARN 7334 pgs degraded; 1185 pgs down; 1 pgs incomplete; 1735 pgs peering; 8938 pgs stale; 1736 pgs stuck inactive; 8938 pgs stuck stale; 10320 pgs stuck unclean; recovery 6061/31080 objects degraded (19.501%); 111/196 in osds are down; clock skew detected on mon.pouta-s02, mon.pouta-s03
>>      monmap e3: 3 mons at {pouta-s01=10.XXX.50.1:6789/0,pouta-s02=10.XXX.50.2:6789/0,pouta-s03=10.XXX.50.3:6789/0}, election epoch 1312, quorum 0,1,2 pouta-s01,pouta-s02,pouta-s03
>>      osdmap e26633: 239 osds: 85 up, 196 in
>>       pgmap v60389: 17408 pgs, 13 pools, 42345 MB data, 10360 objects
>>             4699 GB used, 707 TB / 711 TB avail
>>             6061/31080 objects degraded (19.501%)
>>                   14 down+remapped+peering
>>                   39 active
>>                 3289 active+clean
>>                  547 peering
>>                  663 stale+down+peering
>>                  705 stale+active+remapped
>>                    1 active+degraded+remapped
>>                    1 stale+down+incomplete
>>                  484 down+peering
>>                  455 active+remapped
>>                 3696 stale+active+degraded
>>                    4 remapped+peering
>>                   23 stale+down+remapped+peering
>>                   51 stale+active
>>                 3637 active+degraded
>>                 3799 stale+active+clean
>>
>> OSD logs
>>
>> 2015-03-09 12:22:16.312774 7f760dac9700 -1 common/Thread.cc: In function 'void Thread::create(size_t)' thread 7f760dac9700 time 2015-03-09 12:22:16.311970
>> common/Thread.cc: 129: FAILED assert(ret == 0)
>>
>>  ceph version 0.80.8 (69eaad7f8308f21573c604f121956e64679a52a7)
>>  1: (Thread::create(unsigned long)+0x8a) [0xaf41da]
>>  2: (SimpleMessenger::add_accept_pipe(int)+0x6a) [0xae84fa]
>>  3: (Accepter::entry()+0x265) [0xb5c635]
>>  4: /lib64/libpthread.so.0() [0x3c8a6079d1]
>>  5: (clone()+0x6d) [0x3c8a2e89dd]
>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>>
>> More information at Ceph Tracker Issue:
>> http://tracker.ceph.com/issues/10988#change-49018
>>
>> ****************************************************************
>> Karan Singh
>> Systems Specialist, Storage Platforms
>> CSC - IT Center for Science,
>> Keilaranta 14, P. O. Box 405, FIN-02101 Espoo, Finland
>> mobile: +358 503 812758
>> tel. +358 9 4572001
>> fax +358 9 4572302
>> http://www.csc.fi/
>> ****************************************************************
>>
>> _______________________________________________
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
> --
> Christian Eichelmann
> System Administrator
>
> 1&1 Internet AG - IT Operations Mail & Media Advertising & Targeting
> Brauerstraße 48 · DE-76135 Karlsruhe
> Phone: +49 721 91374-8026
> christian.eichelm...@1und1.de
>
> Montabaur District Court / HRB 6484
> Executive Board: Henning Ahlert, Ralph Dommermuth, Matthias Ehrlich, Robert
> Hoffmann, Markus Huhn, Hans-Henning Kettler, Dr. Oliver Mauss, Jan Oetjen
> Chairman of the Supervisory Board: Michael Scheeren