On Tue, 31 Jan 2012, Kris Deugau wrote:

John Hardin wrote:
 On Tue, 31 Jan 2012, John Hardin wrote:
> You posted this command line:
>
> /usr/local/bin/spamd -d -x -q -r /var/run/spamd.pid --min-children=59
> --min-spare=1 --max-spare=1 --max-conn-per-child=100 -m 60 -s local1
> -u spamd --timeout-child=60 -i 0.0.0.0 -A <IP list> --syslog-ident
> spamd/main
>
> Why don't we see something like "prefork: child states:
> BIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII"?

A good question. After a major burst of mail has bumped up the number of children, yes, I *do* see something like that in the log.

Yeah, before it starts pruning idle children.

But most of the time, there are 6-8 children running.

 ...to answer my own question, max-spare overrides it?

Hm, I wondered if that might be the case.

Can't see what else would.

 Is there some reason you're setting max-spare to 1 instead of leaving it
 at 2 or setting it to something like 5?

I think the *idea* was to prespawn all the child processes (several generations of hardware ago, and ~SA3.0.x [possibly 2.6], there was evidence that spamd wasn't (re)spawning new children fast enough to keep up with *normal* mail flow, never mind spikes). Watching the logs now is enough evidence that spamd is coping quite properly with scaling up as needed, so we're probably overspecifying its behaviour and could drop all but the -m 60.

I've bumped --max-spare to 5 on one system just because.

That was just a stab in the dark. From your log excerpt, SA seems to be getting wedged when activating the last idle child rather than when adding new children. It would be interesting to see if it wedges in the new config when idle child number 5 is needed...

OTOH... I'm having trouble seeing where this could cause the whole spamd process tree to lock up completely for ~15 minutes. (It locks hard enough that a monitoring process gets "connection timed out", even though there are only 5 spamd children running - as per an incident just after noon today.)

So there were 5 busy children and one idle, and it wedged, and when it recovered it started adding children?

--min-spare 5 would be something else to try.
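
For instance, something along these lines. This is an untested sketch: it is the posted command line with only the two spare limits changed, and the -i/-A arguments are left out because the IP list wasn't posted. SPARE and SPAMD_ARGS are just names made up for the example.

```shell
# Sketch only: same flags as the posted command line, with both spare
# limits raised to 5. SPARE and SPAMD_ARGS are hypothetical names.
SPARE=5
SPAMD_ARGS="-d -x -q -r /var/run/spamd.pid --min-children=59 \
--min-spare=${SPARE} --max-spare=${SPARE} --max-conn-per-child=100 -m 60 \
-s local1 -u spamd --timeout-child=60 --syslog-ident spamd/main"

# Print the command for review instead of running it; append your own
# -i and -A settings before using it.
echo "/usr/local/bin/spamd ${SPAMD_ARGS}"
```

If it still wedges with five spares, the "prefork: child states" line just before the stall should show whether the last idle child was being handed a connection at the time.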

On the third hand... if there *is* a subtle bug in spamd's process scaling, is it worth jumping *way* out there and trying --round-robin?

If this is a dedicated mail server, why not? Run --round-robin with (say) 16 children, and set your glue's concurrency limit to 16 as well.
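
Roughly like this. It is a sketch and not tested here. As I understand it, with --round-robin spamd starts all -m children up front, so I've dropped the --min-children and spare settings. The -i/-A arguments are omitted again, and POOL and RR_ARGS are names invented for the example.

```shell
# Sketch only: fixed pool of 16 children handed work round-robin.
# POOL and RR_ARGS are hypothetical names; 16 is the "say" figure above.
POOL=16
RR_ARGS="-d -x -q -r /var/run/spamd.pid --round-robin -m ${POOL} \
--max-conn-per-child=100 -s local1 -u spamd --timeout-child=60 \
--syslog-ident spamd/main"

# Print the command for review instead of running it; append your own
# -i and -A settings, and set the glue's concurrency limit to ${POOL} too.
echo "/usr/local/bin/spamd ${RR_ARGS}"
```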

--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
  So Microsoft's invented the ASCII equivalent to ugly ink spots that
  appear on your letter when your pen is malfunctioning.
         -- Greg Andrews, about Microsoft's way to encode apostrophes
-----------------------------------------------------------------------
 Tomorrow: the 9th anniversary of the loss of STS-107 Columbia
