[SAtalk] spamc/spamd hanging (was spamd dying)

James B. Huber Wed, 15 May 2002 10:07:22 -0700

Speaking of dying.....
   We have been seeing a problem with spamc/spamd just getting
HUNG on both Solaris 2.8 and Red Hat 7.2.


   I have already applied the patch for spamc hangs due to
large message size (although that's never been what hangs things)
so that's not what's going on here.

   In a nutshell, spamc is stuck waiting on a read that never
completes (even after DAYS). Looking over the code, it would
appear that:
1) spamc doesn't ever catch (let alone handle) a SIGPIPE,
   so if something goes wrong on the "pipe" it will be left
   "stuck".
2) Even though the "milter" (using sendmail 8.12.3) takes timeout
   parameters, they have zero effect on anything.
3) There needs to be some configurable "gross error" timeout
   on the spamc/spamd processes.
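
Point 3 asks for a configurable "gross error" timeout. As a rough
illustration only (spamc itself is C, and this is not taken from its
source), a read guarded by select() might look like the Python sketch
below. The names read_with_timeout and ReadTimeout, and any timeout
value passed in, are hypothetical.

```python
import os
import select


class ReadTimeout(Exception):
    """Raised when no data arrives on the descriptor before the deadline."""


def read_with_timeout(fd, max_bytes, timeout_seconds):
    """Read up to max_bytes from fd, giving up after timeout_seconds.

    Returns b"" at end-of-file, exactly as os.read does.
    """
    # select() returns an empty "ready" list if the timeout expires first.
    ready, _, _ = select.select([fd], [], [], timeout_seconds)
    if not ready:
        raise ReadTimeout(f"no data on fd {fd} after {timeout_seconds}s")
    return os.read(fd, max_bytes)
```

Called as read_with_timeout(0, 256240, 300), this would give up on a
stdin read after five minutes instead of sleeping indefinitely; the 300
is an arbitrary example value, not a recommendation.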

I've currently got 3 of these "stuck" on a very low-volume
Sun Solaris 2.8 machine (it's really BAD on a high-volume gateway).
Below is the output of a "truss" against one of the stuck spamc
processes. Does anyone have any suggestions or ideas?

Executing "truss -p 14538" as root.
     *** SUID: ruid/euid/suid = 0 / 25 / 25  ***
read(0, 0x000261B0, 256240)     (sleeping...)

It never comes home from Kansas.....

Thanks in Advance,
Jim

On 2002.05.15 09:12 Michael Stenner wrote:
> On Wed, May 15, 2002 at 11:38:02AM +0200, Gilles Nedostoupof wrote:
> > > Here's the patch I dreamed up.  If there are no objections, I'll
> > > submit this with a bug report.
> > > I've tested it here and it certainly solves my problem.
> 
> > I'm sorry but this is not solving my problem. I've patched
> > spamd/spamd.raw and recompiled SA.
> > When I do a /etc/rc.d/init.d/syslog restart, spamd stops working :(
> >
> > Here's a part of my /var/log/maillog :
> 
> < snip >
> 
> > May 15 11:06:45 john spamd[19871]: clean message (4/5) for (unknown):500 in 5 seconds.
> > May 15 11:06:45 john spamd[19871]: SIGPIPE received - reopening log socket
> > May 15 11:06:45 john spamd[19870]: clean message (4/5) for (unknown):500 in 5 seconds.
> > May 15 11:06:45 john spamd[19870]: SIGPIPE received - reopening log socket
> > May 15 11:08:09 john sophie[19861]: Sophie child has timed-out (no data received in 90 seconds) - process killed
> > May 15 11:08:28 john spamd[19938]: server killed by SIGTERM, shutting down
> > May 15 11:08:28 john spamd[19938]: SIGPIPE received - reopening log socket
> > May 15 11:08:28 john spamd[19869]: server killed by SIGTERM, shutting down
> > May 15 11:08:28 john spamd[19869]: SIGPIPE received - reopening log socket
> >
> > At 11:08:28 I shut down spamd after waiting some time; then the process
> > continues...
> 
> All right... brainstorming here.
> 
> 1) The SIGPIPE lines are my work.  Without the patch, spamd probably
>    would have died on the first SIGPIPE.
> 
> 2) The SIGTERM is probably you trying to kill it.  How are you killing
>    it?  Are you sure you're killing the parent?
> 
> 3) The patch that I submitted only allows spamd to stay alive after a
>    syslog-related SIGPIPE and continue to log.  It doesn't have
>    anything to do (one way or the other) with the actual processing of
>    mail, so you probably have something else going on too.
> 
>    In my case, mail processing was completely unaffected by the "bug",
>    except for that pesky detail about spamd dying a painful death.
> 
> 4) It looks like the processes logging here are all children, judging
>    by the log contents, and by the pids.  Where are the parent's logs?
> 
>    What I see is not inconsistent with what I did.  The children
>    inherit filehandles from parents.  So if a handle gets screwed up
>    after several children are forked, I'm not surprised if all of them
>    need to reset the handle.  In contrast, if they are forked _after_
>    the handle is screwed up (and parent resets it) then they should be
>    OK.
> 
> 5) It looks like you may be getting a lot of mail (given you have
>    several children running at once).  You might try backing off a bit
>    for testing, to see what happens.  Who knows... maybe I still have
>    your problem but don't see it due to low volume.
> 
>    If you must, you can try invoking spamd via spamc from a single
>    account's procmail, or just by running spamc directly on the
>    command line.  That way, you can scale up in volume exactly as you
>    like.
> 
>    (Of course, we shouldn't rule out the possibility that the way
>    you're invoking it is involved.)
> 
> 
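
Points 1 and 3 above describe a patch that lets spamd survive a
syslog-related SIGPIPE and carry on logging. spamd is Perl and the
patch itself is not shown in this thread, so the following is only a
rough Python analogue of the idea: when a write to the log connection
fails because the other end has gone away, reconnect and retry once.
ReopeningLog and its connect callable are hypothetical names. Note that
CPython ignores SIGPIPE at startup, so the broken connection surfaces
as a BrokenPipeError rather than as a signal handler call.

```python
import os


class ReopeningLog:
    """Log writer that reconnects once when its peer has gone away."""

    def __init__(self, connect):
        # connect is a callable returning a writable file descriptor;
        # it stands in for "open the log socket".
        self._connect = connect
        self._fd = connect()
        self.reopens = 0

    def write(self, line):
        data = line.encode() + b"\n"
        try:
            os.write(self._fd, data)
        except BrokenPipeError:
            # The reader (e.g. a restarted syslog daemon) closed its end.
            # Drop the dead descriptor, reconnect, and retry once.
            os.close(self._fd)
            self._fd = self._connect()
            self.reopens += 1
            os.write(self._fd, data)
```

A second failure on the retry is deliberately left to propagate, so a
log service that is really gone still shows up as an error.
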
> No answers, but this should keep you busy :)
> 
>                                       -Michael
> --
>   Michael Stenner                       Office Phone: 919-660-2513
>   Duke University, Dept. of Physics       [EMAIL PROTECTED]
>   Box 90305, Durham N.C. 27708-0305
> 
> _______________________________________________________________
> 
> Have big pipes? SourceForge.net is looking for download mirrors. We
> supply
> the hardware. You get the recognition. Email Us:
> [EMAIL PROTECTED]
> _______________________________________________
> Spamassassin-talk mailing list
> [EMAIL PROTECTED]
> https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
> 
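
Michael's point 4 (children forked before the parent resets its handle
keep the old one, children forked afterwards get the new one) can be
shown with a small sketch. This is an illustration under simplifying
assumptions, not spamd's code: two pipes stand in for the old and the
reopened log socket, and demo_inherited_handles is a hypothetical name.

```python
import os


def demo_inherited_handles():
    """Return what the 'old' and 'new' log pipes each received."""
    old_r, old_w = os.pipe()   # stands in for the original log socket
    new_r, new_w = os.pipe()   # stands in for the reopened log socket
    state = {"log_fd": old_w}  # the parent's current log handle

    def spawn(name):
        pid = os.fork()
        if pid == 0:
            # Child: it has a copy of the parent's state as of the fork,
            # so it writes to whichever handle was current at that time.
            try:
                os.write(state["log_fd"], name)
            finally:
                os._exit(0)
        return pid

    early = spawn(b"early")    # forked before the reset
    state["log_fd"] = new_w    # parent resets its handle
    late = spawn(b"late")      # forked after the reset

    for pid in (early, late):
        os.waitpid(pid, 0)
    result = (os.read(old_r, 100), os.read(new_r, 100))
    for fd in (old_r, old_w, new_r, new_w):
        os.close(fd)
    return result
```

The early child's message arrives on the old pipe and the late child's
on the new one, which matches several already-running children each
having to reset the handle for themselves.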

-- 
======================================================================
James B. Huber                                          [EMAIL PROTECTED]
======================================================================

