On 02/15/2012 09:40 AM, Konstantin Belousov wrote:
On Wed, Feb 15, 2012 at 09:22:10AM -0800, Dmitry Mikulin wrote:
On 02/15/2012 08:32 AM, Konstantin Belousov wrote:
On Mon, Feb 13, 2012 at 02:50:45PM -0800, Dmitry Mikulin wrote:
It seems that wait4(2) can now be called from the real (non-debugger)
parent first and result in a call to proc_reap(), can't it? We would
then just reparent the child back to the caller, still leaving the
zombie and confusing the debugger.
When either gdb or the real parent gets to proc_reap(), the process won't
get destroyed; it'll get caught by the following clause:
	if (p->p_oppid && (t = pfind(p->p_oppid)) != NULL) {
and the real parent will get the child back into its children list, while
gdb will get it into its orphan list. The second time around, when
proc_reap() is entered, p->p_oppid will be 0 and the process will get
really reaped. Does that make sense? Also, proc_reparent() attempts to keep
the orphan list clean, so that the same entry does not appear both there
and in the list of siblings.
Right, this is what I figured. But I asked about some further implication
of this change:
if the real parent spuriously calls wait4(2) on the child pid after the child
exited, but before the debugger called wait4(), then exactly the
code you noted above will be run. This results in the child being fully
returned to the original parent.
Next, the wait4() call from the debugger gets an error, and the zombie will be
kept around until the parent calls wait4() for this pid once more.
Am I missing something?
In this case the process will move from gdb's child list to gdb's orphan
list when the real parent does a wait4(). Next time around the wait loop
in gdb, it'll be caught by the orphan's proc_reap().
I do not see how the next debugger loop could find this process at all,
since the first wait4() call reparented it to the original parent.
Not the debugger loop, the kern_wait() loop. The child gets re-parented to
the original parent but moves to the orphan list of the debugger process.
Either the debugger loop which calls wait4/waitpid, or the kern_wait loop
resulting from the debugger calling wait*.
Could you please describe how the patched kernel moves the waited-for
zombie to the orphan list of the debugger?
To me, it seems that there is another bug: the child appears both on
the children list and on the orphan list of the real parent.
The first attempt to reap the child will get into the
	if (p->p_oppid && (t = pfind(p->p_oppid)) != NULL) {
clause, which will re-parent it to the real parent. The child will not be
destroyed at this point.
The following loop in proc_reparent() will make sure that the child does not
stay in both lists:
	LIST_FOREACH(p, &parent->p_orphans, p_orphan) {
		if (p == child) {
			LIST_REMOVE(child, p_orphan);
			break;
		}
	}
Since the child's parent is gdb and it's still being traced, the following will
move it to gdb's orphan list:
	if (child->p_flag & P_TRACED)
		LIST_INSERT_HEAD(&child->p_pptr->p_orphans, child, p_orphan);
After this the real parent will get the exit status.
The next pass through the kern_wait() loop called from gdb will catch the child in
its orphan list and will reap it for real this time, since p->p_oppid will have
been set to 0 by the previous attempt to reap it. Gdb gets the exit code, and the
child is destroyed.
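
To make the sequence above easier to follow, here is a small user-space
sketch of the two-pass reap. It is a toy model, not kernel code: struct
toy_proc, toy_pfind(), toy_reparent(), toy_reap() and toy_scenario() are
hypothetical names invented for illustration, the pids are made up, locking
and signals are ignored, and a single orphan_of pointer stands in for
membership in one process's p_orphans list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Toy model only; field names echo p_oppid, P_TRACED and p_pptr. */
struct toy_proc {
	int pid;
	int oppid;                  /* real parent's pid while traced, else 0 */
	bool traced;                /* stands in for P_TRACED */
	bool destroyed;
	struct toy_proc *pptr;      /* current parent */
	struct toy_proc *orphan_of; /* whose orphan list holds us, or NULL */
};

/* Hypothetical stand-in for pfind(): linear lookup in a fixed table. */
static struct toy_proc *
toy_pfind(struct toy_proc **table, size_t n, int pid)
{
	for (size_t i = 0; i < n; i++)
		if (table[i]->pid == pid && !table[i]->destroyed)
			return table[i];
	return NULL;
}

static void
toy_reparent(struct toy_proc *child, struct toy_proc *parent)
{
	if (child->pptr == parent)
		return;
	/* Keep the new parent's orphan list clean. */
	if (child->orphan_of == parent)
		child->orphan_of = NULL;
	/* Still traced: the previous parent keeps the child as an orphan. */
	if (child->traced)
		child->orphan_of = child->pptr;
	child->pptr = parent;
}

/* Returns true if the zombie was destroyed, false if only handed back. */
static bool
toy_reap(struct toy_proc *child, struct toy_proc **table, size_t n)
{
	struct toy_proc *t;

	assert(!child->destroyed);
	if (child->oppid != 0 &&
	    (t = toy_pfind(table, n, child->oppid)) != NULL) {
		child->oppid = 0;       /* the next attempt reaps for real */
		toy_reparent(child, t);
		return false;
	}
	child->orphan_of = NULL;
	child->destroyed = true;
	return true;
}

/* Replays the real-parent-first case; returns the reap attempts needed. */
int
toy_scenario(void)
{
	struct toy_proc parent = { .pid = 100 };
	struct toy_proc gdb = { .pid = 200 };
	struct toy_proc child = { .pid = 300, .pptr = &parent };
	struct toy_proc *table[] = { &parent, &gdb, &child };
	int attempts = 0;

	/* Debugger attaches: remember the real parent, reparent to gdb. */
	child.traced = true;
	child.oppid = parent.pid;
	toy_reparent(&child, &gdb);

	/* Child has exited; each iteration is one wait from its finder. */
	while (!child.destroyed) {
		attempts++;
		toy_reap(&child, table, 3);
		printf("attempt %d: parent=%d orphan_of=%d destroyed=%d\n",
		    attempts, child.pptr->pid,
		    child.orphan_of != NULL ? child.orphan_of->pid : 0,
		    (int)child.destroyed);
	}
	return attempts;
}
```

In this model the first attempt only hands the child back to pid 100 and
leaves it on the orphan list of pid 200; the second attempt, with oppid
already cleared, destroys it, so toy_scenario() returns 2 when called from
a small main().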
_______________________________________________
freebsd-current@freebsd.org mailing list