Hi, I recently wrote a program that was using ptrace() to suspend a process, and resume it later. Maybe ptrace just isn't used enough, or maybe I just don't get all the reasons behind why it's implemented the way it is, but it seems to be somewhat buggy.
(I'm bringing this up on -hackers rather than filing a PR, because I'm not 100% sure that what I'm seeing is indeed a bug.)

Firstly, I noticed that PT_DETACH wasn't working as expected: the signal argument was effectively being ignored, and a combination of ptrace(PT_ATTACH, pid, 0, 0) followed by ptrace(PT_DETACH, pid, (caddr_t)1, 0) would result in the target process getting stopped. The only way around this was to kill(pid, SIGCONT) before the PT_DETACH. This seems less than satisfactory: I would have thought ptrace() should at least be capable of attaching to and detaching from a process without causing such spurious signals to be delivered to its victim.

The apparent mishandling of the signal argument to ptrace() appears to occur in issignal(). (Line numbers are from kern_sig.c, 1.72.2.14.)

On ptrace(PT_ATTACH, ...), the target process gets to the stop() on line 1254, with p->p_xstat == SIGSTOP.

On ptrace(PT_DETACH, pid, (caddr_t)1, mySig), the parent kicks the stopped child, after clearing the P_TRACED flag. The child wakes up with the new signal, mySig, set in p->p_xstat, and the old signal still present in p->p_siglist from the "psignal()" call on line 1252. It then notices that the P_TRACED flag has been switched off, and starts all over again, discarding the new signal and re-finding the original SIGSTOP that was sent by the ptrace(PT_ATTACH) call. I.e., the process gets sent the very signal that the ptrace() call explicitly tried to replace.

So, is my patch (appended below) correct, or would a patch to the ptrace manpage be a better approach?

Secondly, the entire mechanism for delivering these signals with ptrace() seems to be somewhat unreliable.
The general idea is that issignal() is immediately followed by a call to postsig(), but that is not always what happens (e.g. tsleep() called with PCATCH). If we end up with multiple calls to issignal() for one call to postsig(), the debugger can sometimes see the signal it tried to continue the process with show up again after a wait(), and it has no idea whether this signal was raised by the child, sent by an external process, or came from its own call to ptrace().

I haven't investigated further, but I'd have thought that postsig() was a better place to do the stop/resume and signal replacement for a traced child. I know this would "hide" ignored signals, etc, from the child, but in that event I don't think a user-space debugger would really care: the signal would never arrive at the child under normal circumstances anyway, and ptrace()/wait() combinations would show much more deterministic behaviour.

Is it worthwhile trying to "fix" this, or is there an obvious (to someone else) stumbling block I'll fall over in attempting it?

--
Peter.
Index: kern_sig.c
===================================================================
RCS file: /pub/FreeBSD/development/FreeBSD-CVS/src/sys/kern/kern_sig.c,v
retrieving revision 1.72.2.14
diff -u -r1.72.2.14 kern_sig.c
--- kern_sig.c	14 Dec 2001 03:05:32 -0000	1.72.2.14
+++ kern_sig.c	12 Feb 2002 09:22:52 -0000
@@ -1257,14 +1257,6 @@
 				 && p->p_flag & P_TRACED);
 
 			/*
-			 * If the traced bit got turned off, go back up
-			 * to the top to rescan signals.  This ensures
-			 * that p_sig* and ps_sigact are consistent.
-			 */
-			if ((p->p_flag & P_TRACED) == 0)
-				continue;
-
-			/*
 			 * If parent wants us to take the signal,
 			 * then it will leave it in p->p_xstat;
 			 * otherwise we just look for signals again.
@@ -1275,10 +1267,21 @@
 				continue;
 
 			/*
-			 * Put the new signal into p_siglist.  If the
-			 * signal is being masked, look for other signals.
+			 * Put the new signal into p_siglist.
 			 */
 			SIGADDSET(p->p_siglist, sig);
+
+			/*
+			 * If the traced bit got turned off, go back up
+			 * to the top to rescan signals.  This ensures
+			 * that p_sig* and ps_sigact are consistent.
+			 */
+			if ((p->p_flag & P_TRACED) == 0)
+				continue;
+			/*
+			 * If the signal is being masked, look for other
+			 * signals.
+			 */
 			if (SIGISMEMBER(p->p_sigmask, sig))
 				continue;
 		}