On Tue, 31 Jan 2012, Kris Deugau wrote:
John Hardin wrote:
On Tue, 31 Jan 2012, John Hardin wrote:
> You posted this command line:
>
> /usr/local/bin/spamd -d -x -q -r /var/run/spamd.pid --min-children=59
> --min-spare=1 --max-spare=1 --max-conn-per-child=100 -m 60 -s local1
> -u spamd --timeout-child=60 -i 0.0.0.0 -A <IP list> --syslog-ident
> spamd/main
>
> Why don't we see something like "prefork: child states:
> BIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII"?
A good question. After a major burst of mail has bumped up the number of
children, yes, I *do* see something like that in the log.
Yeah, before it starts pruning idle children.
But most of the time, there are 6-8 children running.
...to answer my own question, max-spare overrides it?
Hm, I wondered if that might be the case.
Can't see what else would.
Is there some reason you're setting max-spare to 1 instead of leaving it
at 2 or setting it to something like 5?
I think the *idea* was to prespawn all the child processes (several
generations of hardware ago, and ~SA3.0.x [possibly 2.6], there was evidence
that spamd wasn't (re)spawning new children fast enough to keep up with
*normal* mail flow, never mind spikes). Watching the logs now is enough
evidence that spamd is coping quite properly with scaling up as needed, so
we're probably overspecifying its behaviour and could drop all but the -m 60.
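For illustration only, here is roughly what the trimmed invocation could look like. This assumes that "all but the -m 60" refers to the four child-management options (--min-children, --min-spare, --max-spare, --max-conn-per-child). Everything else is copied from the command line quoted above, and <IP list> is still the elided placeholder, so this won't run as pasted:

```shell
# Sketch, not a tested config: child-management options trimmed to -m 60,
# leaving spamd's own scaling to decide how many children run below that.
# <IP list> is a placeholder for the real -A address list.
/usr/local/bin/spamd -d -x -q -r /var/run/spamd.pid -m 60 -s local1 \
  -u spamd --timeout-child=60 -i 0.0.0.0 -A <IP list> \
  --syslog-ident spamd/main
```

Whether to keep --max-conn-per-child=100 (child recycling rather than scaling) is a separate judgment call.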
I've bumped --max-spare to 5 on one system just because.
That was just a stab in the dark. From your log excerpt, SA seems to be
getting wedged when activating the last idle child rather than when
adding new children. It would be interesting to see whether it wedges in the
new config when idle child number 5 is needed...
OTOH... I'm having trouble seeing where this could cause the whole spamd
process tree to lock up completely for ~15 minutes. (It locks hard enough
that a monitoring process gets "connection timed out", even though there are
only 5 spamd children running - as per an incident just after noon today.)
So there were 5 busy children and one idle, and it wedged, and when it
recovered it started adding children?
--min-spare 5 would be something else to try.
On the third hand... if there *is* a subtle bug in spamd's process scaling,
is it worth jumping *way* out there and trying --round-robin?
If this is a dedicated mail server, why not? Round-robin on (say) 16
children, with your glue concurrency limit set to 16 as well.
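A rough sketch of that suggestion, assuming the rest of the original command line stays as it was. --round-robin turns off spamd's adaptive child scaling in favour of its simpler scheme. My understanding is that -m 16 then sets the size of the pool, but check the spamd man page for your version. <IP list> is still the elided placeholder:

```shell
# Sketch only: fixed pool of 16 children, adaptive scaling disabled.
# The MTA-side ("glue") concurrency limit would be set to 16 to match.
# <IP list> is a placeholder for the real -A address list.
/usr/local/bin/spamd -d -x -q -r /var/run/spamd.pid --round-robin -m 16 \
  -s local1 -u spamd --timeout-child=60 -i 0.0.0.0 -A <IP list> \
  --syslog-ident spamd/main
```

The --min-spare/--max-spare options are left out here because they govern the adaptive scaling that --round-robin disables.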
--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhar...@impsec.org FALaholic #11174 pgpk -a jhar...@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79