On Wed, 01 Feb 2012 16:56:34 -0500
Kris Deugau wrote:

> RW wrote:
> > On Wed, 1 Feb 2012 17:36:36 +0000
> > RW wrote:
> >
> >> spamd adjusts the number of children when it gets a message from
> >> one of them reporting that it's idle. If you have all the busy
> >> children locked-up, and have N idle children and then exactly N
> >> messages that trigger the buggy regex come in at the same time,
> >> you lock-up all the children and wont get any new idle events.
> >>
> >> In your case (--min-spare=1 --max-spare=1) you have N=1 which
> >> turns a rare scenario into a common one.
> >
> > Sorry that's wrong. It should be when all of the busy children are
> > locked-up and you get a consecutive run of "bad" messages that
> > lockup-up all the remaining idle processes, you wont get any new
> > idle events.
> >
> > The point's the same though if you have higher values of min-spare
> > and max-spare, it's less likely to happen.
>
> I've adjusted one machine with min-spare and max-spare at 5, the
> other min-spare 2 and max-spare 5; but I don't think that's it.
> (Although even reducing the number of incidents will help...)
>
> Under normal processing, a burst of mail will show:
>
> child states: IBIBBIII
> child states: BBIBBIII
> child states: BBBBBIII
> child states: BBBBBBII
> child states: BBBBBBBI
> child states: BBBBBBBB
> child states: BBBBBBBBB
> child states: BBBBBBBBBB
> child states: BBBBBBBBBBB
> ...
>
> etc, potentially up to max-children, within a few seconds.

I was describing the special case of how spamd could run out of
children without hitting max-children. It could also run out the
normal way. And when there are clients waiting and a child becomes
idle, you do get a chain reaction that adds children very rapidly.

> During one of these lockups, it stalls whether or not there have been
> free idle children (ie, potentially around that second or third line,
> with 3 or 4 idle children).

That's to be expected, since the "child states" logging is generated by
the same function that modifies the number of children.

I think the important question here is whether you see high CPU usage
when it locks up with more "B"s than you have cores. If you don't, then
it's not a problem regex.

> None of that explains why the master spamd stops accepting new
> connections, AFAICT.

It depends what you mean by "accepting new connections". If you mean
the TCP handshake then I've no idea, but I think anything higher needs
idle children.
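
For the CPU question, here is a rough Python sketch for picking out the
log lines where busy children outnumber cores, so they can be compared
against CPU usage at the same timestamps. It assumes the log lines
contain "child states: " followed by a run of letters, as in the
excerpts quoted in this thread, with B meaning busy and I meaning idle.
The function names are made up for illustration and are not part of
spamd.

```python
import re

# Assumption: each relevant log line contains "child states: " followed
# by one letter per child, as in the quoted excerpts (B = busy, I = idle).
STATE_RE = re.compile(r"child states: ([A-Z]+)")


def summarize(line):
    """Return (busy, idle) counts for one log line, or None if no match."""
    m = STATE_RE.search(line)
    if m is None:
        return None
    states = m.group(1)
    return states.count("B"), states.count("I")


def more_busy_than_cores(line, cores):
    """True when the line shows more busy children than CPU cores."""
    counts = summarize(line)
    if counts is None:
        return False
    busy, _idle = counts
    return busy > cores
```

If the flagged lines do not coincide with high CPU usage, that points
away from a runaway regex.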