[Lldb-commits] [lldb] [LLDB] Ptrace seize dead processes on Linux (PR #137041)

Pavel Labath via lldb-commits Wed, 21 May 2025 08:16:33 -0700

https://github.com/labath commented:


[I'm sorry, this is going to be long]

Most of David's comments are there because you copied the existing attach 
implementation. Even if we go down the path of different attach methods, I 
don't think it's necessary to copy all of that. Actual attaching is a small 
part of that function. The rest of the stuff, which deals with looping, 
gathering thread ids, and error messages, could be handled by common code.

I say *if* because it's not clear to me that we've given up on using the common 
code path we discussed on the rfc thread. I think it could look something like 
this:
```
  ptrace(PTRACE_SEIZE, pid, 0, (void*)(PTRACE_O_TRACEEXIT));
  syscall(SYS_tgkill, pid, pid, SIGSTOP);
  ptrace(PTRACE_INTERRUPT, pid, 0, 0);
  int status;
  waitpid(pid, &status, __WALL);
  if (status>>8 == (SIGTRAP | PTRACE_EVENT_STOP << 8)) {
    ptrace(PTRACE_CONT, pid, 0, 0);
    waitpid(pid, &status, __WALL);
  }
```

If you run this on a "regular" process you get this sequence of events:

```
ptrace(PTRACE_SEIZE, 31169, NULL, PTRACE_O_TRACEEXIT) = 0
tgkill(31169, 31169, SIGSTOP)           = 0
ptrace(PTRACE_INTERRUPT, 31169)         = 0
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_TRAPPED, si_pid=31169, si_uid=1000, 
si_status=SIGSTOP, si_utime=0, si_stime=0} ---
wait4(31169, [{WIFSTOPPED(s) && WSTOPSIG(s) == SIGSTOP}], __WALL, NULL) = 31169
```
.. or this one:
```
ptrace(PTRACE_SEIZE, 31169, NULL, PTRACE_O_TRACEEXIT) = 0
tgkill(31169, 31169, SIGSTOP)           = 0
ptrace(PTRACE_INTERRUPT, 31169)         = 0
wait4(31169, [{WIFSTOPPED(s) && WSTOPSIG(s) == SIGTRAP}|PTRACE_EVENT_STOP<<16], 
__WALL, NULL) = 31169
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_STOPPED, si_pid=31169, si_uid=1000, 
si_status=0, si_utime=0, si_stime=0} ---
ptrace(PTRACE_CONT, 31169, NULL, 0)     = 0
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_TRAPPED, si_pid=31169, si_uid=1000, 
si_status=SIGSTOP, si_utime=0, si_stime=0} ---
wait4(31169, [{WIFSTOPPED(s) && WSTOPSIG(s) == SIGSTOP}], __WALL, NULL) = 31169
```

The difference is in whether the kernel is fast enough to notice the SIGSTOP 
before it PTRACE_INTERRUPTS the process (the fact that this results in 
nondeterministic behavior might be worth filing a bug report for the kernel). 
With this simple code, I usually get the second version, but this is really a 
matter of microseconds. Even inserting `usleep(0)` between the SIGSTOP and 
PTRACE_INTERRUPT calls is enough to make the process immediately stop with 
SIGSTOP.

However, that doesn't matter. After this code is done, the process is stopped 
with a SIGSTOP, just like it was in the PTRACE_ATTACH case. For a "dead" 
process, the sequence slightly different:
```
ptrace(PTRACE_SEIZE, 31169, NULL, PTRACE_O_TRACEEXIT) = 0
tgkill(31169, 31169, SIGSTOP)           = 0
ptrace(PTRACE_INTERRUPT, 31169)         = 0
wait4(31169, [{WIFSTOPPED(s) && WSTOPSIG(s) == SIGTRAP}|PTRACE_EVENT_EXIT<<16], 
__WALL, NULL) = 31169
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_TRAPPED, si_pid=31169, si_uid=1000, 
si_status=SIGTRAP, si_utime=0, si_stime=0} ---
```
In this case you always get a PTRACE_EVENT_EXIT stop.

The nicest part about this is that could be used to distinguish between the two 
process states -- without the core dumping flag, the proc filesystem, or 
anything else. I think it's theoretically possible to catch a live thread while 
it's exiting and get a PTRACE_EVENT_EXIT, but getting that for *all* threads in 
the process is very unlikely.

Or, instead of making this a process wide-property, we could make this a 
property of a specific thread. We could say that a *thread* that is in an 
"exiting" state is not allowed to run expressions. And this dead process would 
just be a special case of a process which happens to have all threads in the 
exiting state.

And the nicest part about *that* is that this is extremely close to being 
testable. We no longer need a core-dumping process -- we just need a process 
that's stopped in a PTRACE_EVENT_EXIT. Right now, we don't have a way to do 
that, but we're really close to that. One way to do that would be to let the 
user request stopping a thread at exit. If we implement all of the rest, this 
will be mainly a UX question of how we want to expose this to the user. I think 
we could make some sort of an experimental setting for that, or even just some 
secret packet that we send with the `process plugin packet` command.

What do you say to all this? I realize I am sort of asking you my pet feature, 
but I think that by framing your feature (debugging of dead processes) as a 
special case of this (debugging exiting processes), we can avoid most of the 
problematic issues with this PR (divergence of implementation, impossibility of 
testing).

https://github.com/llvm/llvm-project/pull/137041
_______________________________________________
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits

[Lldb-commits] [lldb] [LLDB] Ptrace seize dead processes on Linux (PR #137041)

Reply via email to