RW wrote:
On Wed, 5 Jun 2019 10:45:13 -0400
Kris Deugau wrote:

jim.ander...@wohosting.net wrote:
Greetings,

I've searched but haven't had any luck finding documentation about
how to determine the optimal settings for spamd children
(max-children, min-children, max-spare, min-spare, and
max-conn-per-child). I have a dedicated server for running spamd.
It has 6GB (can add more) and 6 cores. What would be the best
settings? Or how would I determine the best settings?

"Try it and see."  :/

At a minimum you'll want to make sure that you don't spawn more spamd
children than you can keep in RAM;  watch your system for a while,
take the worst-case spamd memory footprint, and divide that into your
physical RAM to find the absolute largest max-children you should
use.
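As a rough sketch of that arithmetic (the per-child footprint and OS headroom below are hypothetical examples, not measurements from anyone's system):

```python
# Rough capacity arithmetic for max-children, per the advice above.
# Inputs are hypothetical examples; substitute your own measurements.
total_ram_mb = 6 * 1024          # 6 GB box from the original question
worst_case_child_mb = 150        # worst-case spamd child footprint observed
os_and_other_mb = 1024           # headroom for the OS and other daemons

max_children = (total_ram_mb - os_and_other_mb) // worst_case_child_mb
print(max_children)  # 34
```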

You can get a more accurate feel for that limit by stress-testing and
watching for significant swap I/O, but if swapping turns out to be a
factor, more memory may be needed.

What I did was measure the CPU limited throughput without network
tests, and then calculate the number of children needed to sustain that
throughput with a scan time on the high end of those seen.
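One way to read that calculation (all numbers below are hypothetical placeholders) is as Little's law: the concurrency needed is the sustained message rate times the high-end scan time.

```python
# Children needed to sustain a target throughput at a given scan time
# (Little's law: concurrency = arrival rate * service time).
# Both inputs are hypothetical, not measurements from the thread.
import math

msgs_per_second = 5.0        # CPU-limited throughput you measured
high_end_scan_seconds = 4.0  # scan time at the high end of those seen

children_needed = math.ceil(msgs_per_second * high_end_scan_seconds)
print(children_needed)  # 20
```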

It's a good idea to check that you can actually reach full CPU usage
and aren't running into an avoidable locking bottleneck with Bayes etc.

*nod* If you're having to fine-tune any of these to keep the system fully busy without overloading it, CPU is another key factor to watch; you can allow more processes than CPU cores, but unless your DNS resolver is slow, not by much. Hyperthreading may give you a bit more slack, but I don't think a HT "core" really gives you a full CPU core of benefit for a workload like SA. On top of that, there's the growing list of security issues that come with just having it enabled. :/


We've also found that it's best to set max-children to
min-children+1, and max/min-spare to 1.  It may have been improved
since we last reviewed our settings in detail, but at the time spamd
didn't spawn new children fast enough under load spikes.
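Concretely, that recommendation amounts to an invocation along these lines (the child counts and connection limit here are made-up placeholders, not tuned values):

```shell
# Prespawn nearly the full complement of children so load spikes
# don't wait on forking. Numbers are placeholders, not recommendations.
spamd --min-children=23 --max-children=24 \
      --min-spare=1 --max-spare=1 \
      --max-conn-per-child=200
```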

Despite what it says in the documentation there isn't an actual rate
limit. What happens is that above 'min-spare' processes the number of
children only gets incremented when a child becomes idle after
completing a scan or initializing. So the worst case is a delay of the
time it takes to scan one message. Once a scan completes, the number
of children can jump to 'max-spare' instantaneously.

My memory is a little hazy on the specifics; it was ~8+ years ago IIRC when I was seeing problems and experimented with settings to avoid them. I don't recall any documentation regarding a *defined* limit on the child spawn rate, but in live testing at the time there certainly was one.

We were seeing on the order of 30-60s to scale up; think "single message with huge CC list", where SA is called on final delivery for each recipient. I don't recall offhand if it was "spawn new child process, wait, spawn, wait, etc" or if it was a burst as you say above, but the ultimate result was a lot of mail suddenly stuck in the inbound mail queue waiting for delivery. Prespawning the maximum number of child processes "fixed" the problem. In the worst cases IIRC it took up to about 30 minutes to clear the backlog.

It might not be required any more, but I haven't seen any issues from continuing to prespawn the full set.

We were also having issues at the time with pathological spam using gigantic (>200K) HTML comments causing severe slowdowns, so scan time on any given spam sometimes averaged 15-20s, if it returned at all. We added a second spamd instance relying almost solely on a subset of DNS rules plus a handful of local rules that we tuned to skim these off first.
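A second lightweight instance along those lines might be started as something like the following; the port and config directory are hypothetical, and the trimmed ruleset itself would live under the directory passed to --siteconfigpath:

```shell
# Lightweight pre-filter spamd on a separate port, loading only a
# trimmed config directory (DNS rules plus a few local rules).
# Port and path are hypothetical examples.
spamd --port=784 \
      --siteconfigpath=/etc/mail/spamassassin-light \
      --max-children=10
```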

The worst case happens when the system is completely idle when the
spike arrives, so it's something that's more likely to be seen in
testing than on a busy system.

Our two scanning nodes are nearly idle most of the time (usually up to about 5 active children of 70 in the main SA instance) but if a big burst of mail comes in, it can hit the current limits we have set. I'd raise them but I think at this point we'd hit a CPU limit instead; we do not have 70 CPU cores available on these systems, and they're also running ClamAV and a separate spamd instance for outbound mail scanning.

We've also scaled to allow for load balancing and taking a node down without impacting operations; while we're heavily overprovisioned for the average load, we're still a bit tight at peak load.

Setting them equal caused a deadlock of some kind IIRC.

Is there a bug report for that?

It was quite a while ago (possibly as far back as SA 3.2 - wasn't there a new forking pattern introduced around then?). I'll see if I can reproduce it with current release or trunk versions.

-kgd
