Mike Erdely wrote on Fri, Dec 01, 2006 at 12:07:19PM -0500:

> I'm running jabberd2 from ports on an 4.0-release+patches, P4 2GHz, 1 GB
> RAM box for ~50 users. This box is running nothing but jabber & mysql.
> Jabber is configured to use the local mysql (its only purpose is jabber)
> for storage and LDAPS for authentication.
>
> During the work week jabber seems to be working fine and then at some
> point people cannot log in. People who are already logged in do not seem
> to have a problem. A few seconds later, the c2s process is taking all of
> the CPU.

Could you check whether jabberd2 also uses large numbers of file
descriptors?

There is a file descriptor leak in jabberd2; it is triggered by SSL
handshake errors. Such errors have been seen in the wild, in particular
in s2s with the host jabjab.de. Another file descriptor leak - or
rather, probably the same one - had also been reported in the jabberd2
bug tracking system:

  http://j2.openaether.org/bugzilla/show_bug.cgi?id=23

Because jabberd2 retries these broken connections indefinitely, I could
well imagine that it will eventually overload your CPU. I didn't check,
though; I never saw a point in increasing kern.maxfiles because my
colleague Klara quickly realised that the file descriptor leak was the
cause of the problem we saw.

Klara prepared a patch fixing the file descriptor leak. She submitted
it upstream but never got any response. So I converted it to a patch
for the ports tree and sent it to the MAINTAINER. It bounced. So I
contacted the maintainer via another mail address he has been using.
I never heard back from him. So I submitted the patch here, but nobody
replied, and I am not aware of any related commits. It seems jabberd2
is rarely used.

All the same, I will now update my -current build machine, check
whether the patch is still ok and resubmit. It will take some time:
I'm now at home, my build machine is at the office, and my build
machine is not fast, so please be patient.

> Any ideas? Anyone else seeing c2s spike at 100% CPU on an almost
> daily basis?

All this was several months ago, but I dimly recall we saw jabberd2
processes eating most of the CPU, too, though it never got as bad as
locking up the system (PIII 635MHz, 256 MB RAM).

I'm still sure about the following: When killing the processes holding
the leaked file descriptors, the CPU did go to 100% for several
seconds, and during that time, the system was not responsive. But in
all cases, the processes finally died off and released all resources
they held.

We use the following settings:

  [EMAIL PROTECTED] # grep max_fds /etc/jabberd/*
  c2s.xml:  <max_fds>128</max_fds>
  router.xml:  <max_fds>128</max_fds>

Besides, we are running mysqld with --open-files-limit=256, and it is
used by SpamAssassin, Joomla, and Mediawiki, too. The rest of the
related settings are at the defaults; _jabberd and _mysql are in login
class daemon with :openfiles-cur=128:.

In any case, I cannot believe you actually need kern.maxfiles=10240 or
:openfiles-cur=4096: or anything like that to serve 50 users.
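
To make the file descriptor check asked for at the top more concrete,
here is a minimal sketch, not something I ran on the box in question.
It assumes fstat(1) output with one header line and the columns
USER CMD PID FD in that order, and that the jabberd2 processes run as
user _jabberd; the function name count_fds is made up for illustration.
A c2s count close to the max_fds limit of 128 shown above would point
at the leak.

```shell
#!/bin/sh
# Count numeric file descriptors per process from fstat(1)-style
# output on stdin.  Assumed layout: one header line, then rows with
# USER CMD PID FD ...  Rows whose FD column is not numeric ("wd",
# "text", ...) are skipped.  Prints "count cmd pid", largest first.
count_fds() {
	awk 'NR > 1 && $4 ~ /^[0-9]+/ { n[$2 " " $3]++ }
	     END { for (k in n) print n[k], k }' | sort -rn
}

# Usage on the jabber box (the user name is an assumption):
#   fstat -u _jabberd | count_fds
```

Running this a few times over the work week should show whether the
count for c2s keeps growing.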
