John Hardin wrote:
That was just a stab in the dark. From your log excerpt SA seems to be
getting wedged when bringing active the last idle child rather than when
adding new children. It would be interesting to see if it wedges in the
new config when idle child number 5 is needed...

The particulars of how many active children there are, and how many are busy, vary from incident to incident; IIRC I've seen it get stuck with 6 idle children and 2 busy. :/

OTOH... I'm having trouble seeing where this could cause the whole
spamd process tree to lock up completely for ~15 minutes. (It locks
hard enough that a monitoring process gets "connection timed out",
even though there are only 5 spamd children running - as per an
incident just after noon today.)

So there were 5 busy children and one idle, and it wedged, and when it
recovered it started adding children?

... or 4 and 2, or 3 and 5, or 2 and 3, or... :/

--min-spare 5 would be something else to try.

I've bumped up min-spare and max-spare on one machine; we'll see how it goes.

On the third hand... if there *is* a subtle bug in spamd's process
scaling, is it worth jumping *way* out there and trying --round-robin?

If this is a dedicated mail server, why not? round robin on (say) 16
children with your glue concurrency limit set to 16 as well.

After a quick experiment on a test system, I'm going to leave this alone for now; with --round-robin, the master spamd does not log the "child state" lines on each connection as it does now. :( I may try to hack up a patch.

-kgd
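
For reference, the two configurations suggested in the thread might look roughly like the sketch below. It only builds the option strings and prints the resulting command lines; it does not start spamd. The variable names are hypothetical, the thread gives no --max-spare value (so none is shown), and the child count of 16 is John Hardin's "(say)" figure, not a tested recommendation.

```shell
# Sketch only: build the option strings discussed in the thread and print
# the resulting command lines. Nothing here actually starts spamd.

# First suggestion: keep more idle children available.
# (No --max-spare value is given in the thread, so it is omitted here.)
MIN_SPARE_OPTS="--min-spare=5"

# Second suggestion: a fixed pool of children with no scaling.
# CHILDREN is assumed to match the concurrency limit of the calling glue.
CHILDREN=16
ROUND_ROBIN_OPTS="--round-robin --max-children=${CHILDREN}"

echo "spamd ${MIN_SPARE_OPTS}"
echo "spamd ${ROUND_ROBIN_OPTS}"
```

Any other options already in use on the server would need to be appended to either line.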
