On Tue, Apr 19, 2005 at 11:07:01AM +1000, Benjamin Herrenschmidt was heard to remark: > On Mon, 2005-04-18 at 14:38 -0500, Linas Vepstas wrote: > > > > Hi, > > > > The patch below appears to fix a problem where a number of dead processes > > linger on the system. On a highly loaded system, dozens of processes > > were found stuck in do_exit(), calling thier very last schedule(), and > > then being lost forever. > > > > Processes that are PF_DEAD are cleaned up *after* the context switch, > > in a routine called finish_task_switch(task_t *prev). The "prev" gets > > the value returned by _switch() in entry.S, but this value comes from > > > > __switch_to (struct task_struct *prev, > > struct task_struct *new) > > { > > old_thread = ¤t->thread; ///XXX shouldn't this be prev, not > > current? > > last = _switch(old_thread, new_thread); > > return last; > > } > > > > The way I see it, "prev" and "current" are almost always going to be > > pointing at the same thing; however, if a "need resched" happens, > > or there's a pre-emept or some-such, then prev and current won't be > > the same; in which case, finish_task_switch() will end up cleaning > > up the old current, instead of prev. This will result in dead processes > > hanging around, which will never be scheduled again, and will never > > get a chance to have put_task_struct() called on them. > > Ok, thinking moer about this ... that will need maybe some help from > Ingo so I fully understand where schedule's are allowed ... We are > basically in the middle of the scheduler here, so I wonder how much of > the scheduler itself can be preempted or so ... > > Basically, under which circumstances can prev and current be different ?
I remember finding a path through void __sched schedule(void) that took a branch through the goto need_resched; that would result in this. I takes a bit of mental gymnastics to see how this might happen. FWIW, I can send you a debug session showing all cpu's idle and 44 dead processes sitting in do_exit(). All but two of these were Java threads, so this seems to be some sort of thread-scheduling subtlty. (the two that weren't java threads were a find|grep pair that must have gotten tangled in.) Given that the patch seems to fix the problem, I didn't dig much deeper. --linas - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/