John Horne writes:
> On Wed, 2006-09-06 at 11:38 -0400, Theo Van Dinter wrote:
>
> > My understanding (I haven't really looked at that code) is that "K" means
> > the child has been killed but it hasn't exited yet.  If a child is in that
> > state for more than, say, 5 seconds, there's likely an issue where it
> > doesn't actually die off, imo.
> >
> > You should generally see states of I or B.
> >
> I get the feeling that something is wrong here. I have restarted SA, and
> grepped the log file. It shows:
>
> =======================================================================
> prefork: child states: BI
> prefork: child states: BB
> prefork: child states: BBB
> prefork: child states: BBBB
> prefork: child states: BBBBS
> prefork: child states: BBBBII
> prefork: child states: IBBBII
> prefork: child states: IIBBIK
> prefork: child states: IIIBKK
> prefork: child states: IIKIKK
> prefork: child states: IBKKKK
> prefork: child states: IIKKKK
> prefork: child states: BBKKKK
> prefork: child states: BBKKKKB
> prefork: child states: BBKKKKBB
> prefork: server reached --max-children setting, consider raising it
> prefork: child states: BIKKKKBB
> prefork: child states: IBKKKKBB
> prefork: child states: IBKKKKIB
> prefork: child states: IIKKKKIB
> prefork: child states: BIKKKKKI
> prefork: child states: IBKKKKKB
> prefork: child states: BBKKKKKI
> prefork: child states: BIKKKKKI
> prefork: child states: IIKKKKKI
> prefork: child states: IBKKKKKK
> prefork: child states: IIKKKKKK
> prefork: child states: BBKKKKKK
> prefork: server reached --max-children setting, consider raising it
> prefork: child states: BBKKKKKK
> prefork: server reached --max-children setting, consider raising it
> prefork: child states: IBKKKKKK
> prefork: child states: BIKKKKKK
> prefork: child states: IIKKKKKK
> =======================================================================
>
> Some of the processes seem to almost immediately go in to the 'killed'
> state and stay there. 'ps auxww' shows that all 8 child processes are
> started. Running an strace (this is a Fedora Core 4 server) on some of
> the processes seems to show that they are waiting on select, and then
> get a 'resources unavailable' error. What resource I have no idea. E.g:
>
> =======================================================================
> strace -Ff -p 12805
> Process 12805 attached - interrupt to quit
> select(16, [10], NULL, NULL, {290, 888000}) = 1 (in [10], left {147, 820000})
> read(10, "P....\n", 6)                  = 6
> read(10, 0xb4515f0, 6)                  = -1 EAGAIN (Resource temporarily unavailable)
> time(NULL)                              = 1157559274
> select(16, [10], NULL, NULL, {300, 0}
> =======================================================================
>
> The process just sits there in this loop of some sort, and never seems
> to do any actual spam processing.
>
> Any ideas about this?

That looks bad :(  The strace snippet, however, is pretty normal-looking:
EAGAIN on a non-blocking read just means there is nothing more to read
right now, so the child goes back to waiting in select().

First off, are you using an up-to-date 3.1.x release?

Secondly, you need to strace both the child *and* the parent spamd
process -- the easiest way to do this is to "strace -f" the parent spamd,
then kill -15 the kids so it starts new (traced) ones.

--j.
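
P.S. If it helps to watch this without eyeballing the log, something like the following could summarize the "prefork: child states" lines. It is only a rough sketch, not anything that ships with SpamAssassin: the function name is made up, the regex assumes the log format exactly as quoted above, and since the grepped lines carry no timestamps it counts consecutive log lines containing a 'K' rather than measuring the 5 seconds Theo mentioned.

```python
import re
from collections import Counter

# Assumed format, taken from the quoted log: "prefork: child states: BBKKKKBB"
STATE_LINE = re.compile(r"prefork: child states: ([A-Z]+)")


def summarize_child_states(lines):
    """Return (per-line state counts, longest run of consecutive
    state lines that contain at least one 'K' child)."""
    counts = []
    longest_run = 0
    run = 0
    for line in lines:
        match = STATE_LINE.search(line)
        if match is None:
            continue  # skip other lines, e.g. the --max-children warnings
        states = Counter(match.group(1))
        counts.append(states)
        # Extend the run while killed children are present, else reset it.
        run = run + 1 if states["K"] else 0
        longest_run = max(longest_run, run)
    return counts, longest_run


# A few lines copied from the log quoted above.
sample = [
    "prefork: child states: BBBBII",
    "prefork: child states: IIBBIK",
    "prefork: child states: IIIBKK",
    "prefork: server reached --max-children setting, consider raising it",
    "prefork: child states: BIKKKKBB",
]
counts, longest = summarize_child_states(sample)
print("K children per line:", [c["K"] for c in counts])
print("longest run of lines with a K child:", longest)
```

A run that keeps growing while the number of 'K' children never drops would match what the log above shows: children marked as killed that never actually exit.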