sounds like a new ticket is in order, alright. btw if *is* load-related,
an "strace -f -ttt" log will show that pretty clearly.

--j.

Daryl C. W. O'Shea writes:
> (copying Justin since this has to do with pre-forking)
> 
> Dan Mahoney, System Admin wrote:
> > On Fri, 10 Mar 2006, Daryl C. W. O'Shea wrote:
> > 
> >> On 3/10/2006 11:22 AM, Dan Mahoney, System Admin wrote:
> 
> > Okay,
> > 
> > I'm still getting these issues.  I've corrected every other issue that's 
> > plagued us, and the thing still locks up.  USUALLY when a user gets some 
> > form of dictionary spam.  For the users I can identify I've been keeping 
> > copies of their stuff.
> > 
> > NOTE: This is under a stock 3.1.1, if there are any other patches I 
> > should be using from the previous conversations that are NOT in 3.1.1, 
> > please let me know, and I'll make sure I have those too.  I'm seeing 
> > lots of the following:
> > 
> > Mar 30 21:52:14 quark spamd[45835]: __alarm__
> > Mar 30 21:52:14 quark spamd[45835]: __alarm__
> > Mar 30 21:52:14 quark spamd[45835]: spamd: copy_config timeout (with 
> > empty $@), respawning child process after 25 messages at 
> > /usr/local/bin/spamd line 982.
> > Mar 30 21:52:16 quark spamd[52479]: __alarm__
> > Mar 30 21:52:16 quark spamd[52479]: __alarm__
> > Mar 30 21:52:16 quark spamd[52479]: spamd: copy_config timeout (with 
> > empty $@), respawning child process after 9 messages at 
> > /usr/local/bin/spamd line 982.
> 
> This indicates that the patch from bug 4699 is working -- spamd now 
> recognizes that the alarm timed out on copy_config.
> 
> 
> > And also some of this:
> > 
> > Mar 30 21:52:31 quark spamd[42292]: syswrite() on closed filehandle 
> > GEN88 at /usr/local/lib/perl5/5.8.6/mach/IO/Handle.pm line 451.
> > Mar 30 21:52:31 quark spamd[42292]: Use of uninitialized value in 
> > concatenation (.) or string at 
> > /usr/local/lib/perl5/site_perl/5.8.6/Mail/SpamAssassin/SpamdForkScaling.pm 
> > line 330.
> > Mar 30 21:52:31 quark spamd[42292]: prefork: write of ping failed to 
> > 52479 fd=:  at 
> > /usr/local/lib/perl5/site_perl/5.8.6/Mail/SpamAssassin/SpamdForkScaling.pm 
> > line 330.
> > Mar 30 21:52:31 quark spamd[42292]: Use of uninitialized value in 
> > concatenation (.) or string at 
> > /usr/local/lib/perl5/site_perl/5.8.6/Mail/SpamAssassin/SpamdForkScaling.pm 
> > line 127.
> > Mar 30 21:52:31 quark spamd[42292]: prefork: killing failed child 52479 
> > fd= at 
> > /usr/local/lib/perl5/site_perl/5.8.6/Mail/SpamAssassin/SpamdForkScaling.pm 
> > line 127.
> > Mar 30 21:52:31 quark spamd[42292]: prefork: killed child 52479 at 
> > /usr/local/lib/perl5/site_perl/5.8.6/Mail/SpamAssassin/SpamdForkScaling.pm 
> > line 141.
> > Mar 30 21:52:31 quark spamd[42292]: syswrite() on closed filehandle 
> > GEN70 at /usr/local/lib/perl5/5.8.6/mach/IO/Handle.pm line 451.
> > Mar 30 21:52:31 quark spamd[42292]: Use of uninitialized value in 
> > concatenation (.) or string at 
> > /usr/local/lib/perl5/site_perl/5.8.6/Mail/SpamAssassin/SpamdForkScaling.pm 
> > line 330.
> > Mar 30 21:52:31 quark spamd[42292]: prefork: write of ping failed to 
> > 45835 fd=:  at 
> > /usr/local/lib/perl5/site_perl/5.8.6/Mail/SpamAssassin/SpamdForkScaling.pm 
> > line 330.
> > Mar 30 21:52:31 quark spamd[42292]: Use of uninitialized value in 
> > concatenation (.) or string at 
> > /usr/local/lib/perl5/site_perl/5.8.6/Mail/SpamAssassin/SpamdForkScaling.pm 
> > line 127.
> > Mar 30 21:52:31 quark spamd[42292]: prefork: killing failed child 45835 
> > fd= at 
> > /usr/local/lib/perl5/site_perl/5.8.6/Mail/SpamAssassin/SpamdForkScaling.pm 
> > line 127.
> > Mar 30 21:52:31 quark spamd[42292]: prefork: killed child 45835 at 
> > /usr/local/lib/perl5/site_perl/5.8.6/Mail/SpamAssassin/SpamdForkScaling.pm 
> > line 141.
> 
> This indicates that the child is exiting, but SpamdForkScaling doesn't 
> know about it until a ping fails 150 seconds later, so a new child isn't 
> spawned for a long time after one of them commits suicide.
> 
> 
> > Example at or around Mar 30 01:48:16 in this file:
> > 
> > http://www.gushi.org/maillog33106-0.txt
> > 
> > And another similar lockup at Mar 30 21:49:50 -- SAME USER, go figure.
> > 
> > I don't have archived copies of this user's mail -- yet.  I've set up 
> > archiving for them, and we have everything from now forward, but I'm 
> > convinced there's SOMETHING in the spam they're getting that causes a 
> > lockup.
> 
> I think it's actually load related... spamd is timing out the 
> copy_config sooner than it's really taking under high load.  If you were 
> to change the alarm value from 10 to 100 or so, around spamd line 949 
> this may go away.
> 
> Any idea what sort of load averages you've got when this starts to 
> happen?  It looks like it starts off with a couple children timing out, 
> then you become short on children, mail starts stacking up, and it 
> snowballs from there.
> 
> 
> BTW, we should probably find or open a bugzilla ticket for this.  Bug 
> 4699 is related.  The pre-fork issue is probably another bug of its own.
> 
> 
> Daryl

Reply via email to