sounds like a new ticket is in order, alright. btw if it *is* load-related, an "strace -f -ttt" log will show that pretty clearly.
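Something like this rough Python sketch could flag the stalls in such a log. It assumes each line starts with a pid and an epoch timestamp, as `strace -f -ttt -o FILE` writes them (or with a `[pid N]` prefix when printed to a terminal); `find_stalls` is a hypothetical helper, not part of strace or SpamAssassin, so check the line format against your strace version first.

```python
import re
from typing import Dict, Iterable, List, Tuple

# Assumed line shapes (check against your strace version):
#   written by `strace -f -ttt -o FILE`:  4242 1143767534.000100 read(...)
#   same trace printed to a terminal:     [pid 4242] 1143767534.000100 read(...)
LINE_RE = re.compile(r"^(?:\[pid\s+)?(\d+)\]?\s+(\d+\.\d+)\s+(.*)$")


def find_stalls(lines: Iterable[str], threshold: float) -> List[Tuple[int, float, str]]:
    """Return (pid, gap_seconds, last_call_before_gap) for each gap >= threshold.

    A long gap between two consecutive lines of the same pid means nothing was
    logged for that process in between: it was blocked inside the earlier call,
    or it was busy (or starved for CPU) without making any syscalls.
    """
    last_seen: Dict[int, Tuple[float, str]] = {}
    stalls: List[Tuple[int, float, str]] = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip lines without a pid + timestamp prefix
        pid, ts, call = int(m.group(1)), float(m.group(2)), m.group(3)
        if pid in last_seen:
            prev_ts, prev_call = last_seen[pid]
            if ts - prev_ts >= threshold:
                stalls.append((pid, ts - prev_ts, prev_call))
        last_seen[pid] = (ts, call)
    return stalls
```

Feed it the open trace file, e.g. `find_stalls(open("spamd.trace"), 5.0)` (the file name is just an example). If every child shows gaps at the same wall-clock moments, that points at load rather than at one particular message.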
--j.

Daryl C. W. O'Shea writes:
> (copying Justin since this has to do with pre-forking)
>
> Dan Mahoney, System Admin wrote:
> > On Fri, 10 Mar 2006, Daryl C. W. O'Shea wrote:
> >
> >> On 3/10/2006 11:22 AM, Dan Mahoney, System Admin wrote:
> >
> > Okay,
> >
> > I'm still getting these issues. I've corrected every other issue that's
> > plagued us, and the thing still locks up. It USUALLY happens when a user
> > gets some form of dictionary spam. For the users I can identify, I've
> > been keeping copies of their stuff.
> >
> > NOTE: This is under a stock 3.1.1. If there are any other patches I
> > should be using from the previous conversations that are NOT in 3.1.1,
> > please let me know, and I'll make sure I have those too. I'm seeing
> > lots of the following:
> >
> > Mar 30 21:52:14 quark spamd[45835]: __alarm__
> > Mar 30 21:52:14 quark spamd[45835]: __alarm__
> > Mar 30 21:52:14 quark spamd[45835]: spamd: copy_config timeout (with empty $@), respawning child process after 25 messages at /usr/local/bin/spamd line 982.
> > Mar 30 21:52:16 quark spamd[52479]: __alarm__
> > Mar 30 21:52:16 quark spamd[52479]: __alarm__
> > Mar 30 21:52:16 quark spamd[52479]: spamd: copy_config timeout (with empty $@), respawning child process after 9 messages at /usr/local/bin/spamd line 982.
>
> This indicates that the patch from bug 4699 is working: spamd now
> recognizes that the alarm timed out on copy_config.
>
> > And also some of this:
> >
> > Mar 30 21:52:31 quark spamd[42292]: syswrite() on closed filehandle GEN88 at /usr/local/lib/perl5/5.8.6/mach/IO/Handle.pm line 451.
> > Mar 30 21:52:31 quark spamd[42292]: Use of uninitialized value in concatenation (.) or string at /usr/local/lib/perl5/site_perl/5.8.6/Mail/SpamAssassin/SpamdForkScaling.pm line 330.
> > Mar 30 21:52:31 quark spamd[42292]: prefork: write of ping failed to 52479 fd=: at /usr/local/lib/perl5/site_perl/5.8.6/Mail/SpamAssassin/SpamdForkScaling.pm line 330.
> > Mar 30 21:52:31 quark spamd[42292]: Use of uninitialized value in concatenation (.) or string at /usr/local/lib/perl5/site_perl/5.8.6/Mail/SpamAssassin/SpamdForkScaling.pm line 127.
> > Mar 30 21:52:31 quark spamd[42292]: prefork: killing failed child 52479 fd= at /usr/local/lib/perl5/site_perl/5.8.6/Mail/SpamAssassin/SpamdForkScaling.pm line 127.
> > Mar 30 21:52:31 quark spamd[42292]: prefork: killed child 52479 at /usr/local/lib/perl5/site_perl/5.8.6/Mail/SpamAssassin/SpamdForkScaling.pm line 141.
> > Mar 30 21:52:31 quark spamd[42292]: syswrite() on closed filehandle GEN70 at /usr/local/lib/perl5/5.8.6/mach/IO/Handle.pm line 451.
> > Mar 30 21:52:31 quark spamd[42292]: Use of uninitialized value in concatenation (.) or string at /usr/local/lib/perl5/site_perl/5.8.6/Mail/SpamAssassin/SpamdForkScaling.pm line 330.
> > Mar 30 21:52:31 quark spamd[42292]: prefork: write of ping failed to 45835 fd=: at /usr/local/lib/perl5/site_perl/5.8.6/Mail/SpamAssassin/SpamdForkScaling.pm line 330.
> > Mar 30 21:52:31 quark spamd[42292]: Use of uninitialized value in concatenation (.) or string at /usr/local/lib/perl5/site_perl/5.8.6/Mail/SpamAssassin/SpamdForkScaling.pm line 127.
> > Mar 30 21:52:31 quark spamd[42292]: prefork: killing failed child 45835 fd= at /usr/local/lib/perl5/site_perl/5.8.6/Mail/SpamAssassin/SpamdForkScaling.pm line 127.
> > Mar 30 21:52:31 quark spamd[42292]: prefork: killed child 45835 at /usr/local/lib/perl5/site_perl/5.8.6/Mail/SpamAssassin/SpamdForkScaling.pm line 141.
>
> This indicates that the child is exiting, but SpamdForkScaling doesn't
> know about it until a ping fails 150 seconds later, so a new child isn't
> spawned for a long time after one of them commits suicide.
> >
> > Example at or around Mar 30 01:48:16 in this file:
> >
> > http://www.gushi.org/maillog33106-0.txt
> >
> > And another similar lockup at Mar 30 21:49:50. SAME USER, go figure.
> >
> > I don't have archived copies of this user's mail (yet). I've set up
> > archiving for them, and we have everything from now forward, but I'm
> > convinced there's SOMETHING in the spam they're getting that causes a
> > lockup.
>
> I think it's actually load-related: under high load, copy_config takes
> longer than the alarm allows, so spamd times it out before it can
> finish. If you were to raise the alarm value from 10 to 100 or so
> (around spamd line 949), this may go away.
>
> Any idea what sort of load averages you've got when this starts to
> happen? It looks like it starts off with a couple of children timing
> out, then you become short on children, mail starts stacking up, and it
> snowballs from there.
>
> BTW, we should probably find or open a Bugzilla ticket for this. Bug
> 4699 is related. The pre-fork issue is probably another bug of its own.
>
> Daryl
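A minimal Python sketch of the failure mode Daryl describes, assuming spamd guards copy_config with an alarm that aborts the call when it fires (the `__alarm__` lines in the log suggest something of that shape). `run_with_alarm` and `fake_copy_config` are hypothetical names, not spamd code, and the time budgets are scaled down from the 10 and 100 values so the example runs quickly. It shows that the same work succeeds or times out depending only on how slow the machine is relative to the fixed budget.

```python
import signal
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class AlarmTimeout(Exception):
    """Raised when the guarded call outlives its time budget (cf. spamd's __alarm__)."""


def _on_alarm(signum, frame):
    raise AlarmTimeout("__alarm__")


def run_with_alarm(func: Callable[[], T], budget_seconds: float) -> T:
    """Run func() under a SIGALRM timer. Unix-only; must run in the main thread."""
    old_handler = signal.signal(signal.SIGALRM, _on_alarm)
    signal.setitimer(signal.ITIMER_REAL, budget_seconds)
    try:
        return func()
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)  # always disarm the timer
        signal.signal(signal.SIGALRM, old_handler)


def fake_copy_config(load_factor: float, base_seconds: float = 0.05) -> str:
    """Hypothetical stand-in for copy_config whose runtime grows with machine load."""
    time.sleep(base_seconds * load_factor)
    return "config copied"
```

With a simulated 20x slowdown, a 0.2 s budget trips while a 2.0 s budget does not. Raising the budget buys headroom in the same way; it does not reduce the load that starts the snowball.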