We've been seeing a problem with spamd that happens at random times. Occasionally, a spamd thread will spin, racking up CPU time and never finishing. This causes other spamd processes to hang, and eventually all memory and swap are used up by multiple spamd and sendmail processes. We've set limits on the number of spamd and sendmail threads, but that doesn't help. Eventually everything stops, either because there are no resources available or because there are no more threads available and they're all waiting for some to exit, which never happens.
We've determined that the cause is the Bayes DB, but we're not sure why. This happens on both a RH 6.2 system with Perl 5.6.1 and a Gentoo system (2.4.20 kernel) with Perl 5.8. Below is the (heavily shortened) output of strace and then lsof for a spamd process that exhibited this behaviour:
ps -ef | grep 19987
smmsp 19987 19682 21 12:50 ? 00:01:38 /usr/bin/spamd -d -x -u smmsp -m
strace -p 19987
pread(29, "\0\0\0\0\1\0\0\0\227\26\0\0\355\24\0\0\0\0\0\0\26\0<\17"..., 4096, 23687168) = 4096
pread(29, "\0\0\0\0\1\0\0\0\356\24\0\0\0\0\0\0\0\0\0\0r\1\5\3\0\2"..., 4096, 21946368) = 4096
pread(29, "\0\0\0\0\1\0\0\0\357\24\0\0\0\0\0\0\225\37\0\0l\1\365\2"..., 4096, 21950464) = 4096
pread(29, "\0\0\0\0\1\0\0\0\225\37\0\0\357\24\0\0\0\0\0\0\n\0\246"..., 4096, 33116160) = 4096
pread(29, "\0\0\0\0\1\0\0\0\360\24\0\0\0\0\0\0\0\0\0\0Z\1G\3\0\2\371"..., 4096, 21954560) = 4096
pread(29, "\0\0\0\0\1\0\0\0\361\24\0\0\0\0\0\0W\26\0\0t\1\n\3\0\2"..., 4096, 21958656) = 4096
pread(29, "\0\0\0\0\1\0\0\0W\26\0\0\361\24\0\0\0\0\0\0\10\0\271\17"..., 4096, 23425024) = 4096
pread(29, "\0\0\0\0\1\0\0\0\362\24\0\0\0\0\0\0G\17\0\0n\1\3\3\0\2"..., 4096, 21962752) = 4096
pread(29, "\0\0\0\0\1\0\0\0G\17\0\0\362\24\0\0\0\0\0\0\"\0\303\16"..., 4096, 16019456) = 4096
pread(29, "\0\0\0\0\1\0\0\0\363\24\0\0\0\0\0\0\22\27\0\0p\1\376\2"..., 4096, 21966848) = 4096
pread(29, "\0\0\0\0\1\0\0\0\22\27\0\0\363\24\0\0\0\0\0\0\32\0\27\17"..., 4096, 24190976) = 4096
pread(29, "\0\0\0\0\1\0\0\0\364\24\0\0\0\0\0\0\23\27\0\0`\1\340\2"..., 4096, 21970944) = 4096
pread(29, "\0\0\0\0\1\0\0\0\23\27\0\0\364\24\0\0\0\0\0\0 \0\300\16"..., 4096, 24195072) = 4096
[...]
lsof -p 19987
COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME
spamd 19987 smmsp cwd DIR 3,1 4096 2 /
spamd 19987 smmsp rtd DIR 3,1 4096 2 /
spamd 19987 smmsp txt REG 3,1 797576 274221 /usr/bin/perl
spamd 19987 smmsp mem REG 3,1 89547 354851 /lib/ld-2.2.5.so
spamd 19987 smmsp mem REG 3,1 89424 354868 /lib/libnsl-2.2.5.so
spamd 19987 smmsp mem REG 3,1 12102 354864 /lib/libdl-2.2.5.so
[...]
spamd 19987 smmsp 0r CHR 1,3 33986 /dev/null
spamd 19987 smmsp 1w CHR 1,3 33986 /dev/null
spamd 19987 smmsp 2w CHR 1,3 33986 /dev/null
spamd 19987 smmsp 3r REG 3,1 55696 274954 /usr/bin/spamd
spamd 19987 smmsp 4u unix 0xd3e100c0 2882486 socket
spamd 19987 smmsp 5u IPv4 2903575 UDP *:59067
spamd 19987 smmsp 6u unix 0xc8f1c580 2902922 /var/run/spamd.sock
spamd 19987 smmsp 7u IPv4 2903576 UDP *:59068
spamd 19987 smmsp 8u IPv4 2903577 UDP *:59069
spamd 19987 smmsp 9u IPv4 2903578 UDP *:59070
spamd 19987 smmsp 10w FIFO 0,5 2902923 pipe
spamd 19987 smmsp 11u IPv4 2903579 UDP *:59071
spamd 19987 smmsp 12u IPv4 2903580 UDP *:59072
spamd 19987 smmsp 13u IPv4 2903581 UDP *:59073
spamd 19987 smmsp 14u IPv4 2903582 UDP *:59074
spamd 19987 smmsp 15u IPv4 2903583 UDP *:59075
spamd 19987 smmsp 16u IPv4 2903584 UDP *:59076
spamd 19987 smmsp 17u IPv4 2903585 UDP *:59077
spamd 19987 smmsp 18u IPv4 2903586 UDP *:59078
spamd 19987 smmsp 19u IPv4 2903587 UDP *:59149
spamd 19987 smmsp 20u IPv4 2903588 UDP *:59150
spamd 19987 smmsp 21u IPv4 2903589 UDP *:59151
spamd 19987 smmsp 22u IPv4 2903590 UDP *:59152
spamd 19987 smmsp 23u IPv4 2903591 UDP *:59153
spamd 19987 smmsp 24u IPv4 2903592 UDP *:59154
spamd 19987 smmsp 25u IPv4 2903593 UDP *:59155
spamd 19987 smmsp 26u IPv4 2903594 UDP *:59156
spamd 19987 smmsp 27u IPv4 2903595 UDP *:59157
spamd 19987 smmsp 28u IPv4 2903596 UDP *:59158
spamd 19987 smmsp 29u REG 3,2 33841152 48106 /var/spool/spamassassin/bayesDB/bayes_toks
spamd 19987 smmsp 30u REG 3,2 10887168 48107 /var/spool/spamassassin/bayesDB/bayes_seen
Note that strace shows spamd doing preads on fd 29, which lsof identifies as the bayes_toks file. I've tried running gdb on a spamd process exhibiting the same behaviour but haven't gleaned any more information about what is happening. We also tried using the -D switch to spamd, but that seemed to make things more unstable and appeared to cause problems with the milter, although the latter could have been coincidence.
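One thing that can be checked from the strace output itself is whether the process keeps re-reading the same offsets of bayes_toks or is working its way through new parts of the file. The following is a rough Python sketch, not something we have run against the live system. The function names pread_offsets and repeated_offsets are made up for illustration, and it assumes the trace lines look like the pread lines shown above.

```python
import re
from collections import Counter

# Matches strace lines such as:
#   pread(29, "\0\0\0\0\1"..., 4096, 23687168) = 4096
# Groups: fd, requested length, file offset, return value.
PREAD_RE = re.compile(r'pread(?:64)?\((\d+), .*, (\d+), (\d+)\)\s*=\s*(-?\d+)')


def pread_offsets(lines, fd):
    """Return, in order, the file offset of every pread on descriptor fd."""
    offsets = []
    for line in lines:
        match = PREAD_RE.search(line)
        if match and int(match.group(1)) == fd:
            offsets.append(int(match.group(3)))
    return offsets


def repeated_offsets(offsets):
    """Map each offset that was read more than once to its read count."""
    counts = Counter(offsets)
    return {offset: n for offset, n in counts.items() if n > 1}
```

To use it, the trace could be saved with strace's -o option, the file's lines passed to pread_offsets with fd 29, and the result passed to repeated_offsets. If a small set of offsets comes back with ever-growing counts, that would suggest the process is going round in circles inside the file rather than making progress, though this is only a hint and not proof of a damaged DB.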
Does anyone have any ideas or suggestions for other diagnostics that we can try?
TIA,
Bob
--
Bob Amen
O'Reilly & Associates, Inc.
http://www.ora.com/ http://www.oreilly.com/