https://github.com/labath commented:
[I'm sorry, this is going to be long] Most of David's comments are there because you copied the existing attach implementation. Even if we go down the path of different attach methods, I don't think it's necessary to copy all of that. Actual attaching is a small part of that function. The rest of the stuff, which deals with looping, gathering thread ids, and error messages, could be handled by common code. I say *if* because it's not clear to me that we've given up on using the common code path we discussed on the rfc thread. I think it could look something like this: ``` ptrace(PTRACE_SEIZE, pid, 0, (void*)(PTRACE_O_TRACEEXIT)); syscall(SYS_tgkill, pid, pid, SIGSTOP); ptrace(PTRACE_INTERRUPT, pid, 0, 0); int status; waitpid(pid, &status, __WALL); if (status>>8 == (SIGTRAP | PTRACE_EVENT_STOP << 8)) { ptrace(PTRACE_CONT, pid, 0, 0); waitpid(pid, &status, __WALL); } ``` If you run this on a "regular" process you get this sequence of events: ``` ptrace(PTRACE_SEIZE, 31169, NULL, PTRACE_O_TRACEEXIT) = 0 tgkill(31169, 31169, SIGSTOP) = 0 ptrace(PTRACE_INTERRUPT, 31169) = 0 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_TRAPPED, si_pid=31169, si_uid=1000, si_status=SIGSTOP, si_utime=0, si_stime=0} --- wait4(31169, [{WIFSTOPPED(s) && WSTOPSIG(s) == SIGSTOP}], __WALL, NULL) = 31169 ``` .. or this one: ``` ptrace(PTRACE_SEIZE, 31169, NULL, PTRACE_O_TRACEEXIT) = 0 tgkill(31169, 31169, SIGSTOP) = 0 ptrace(PTRACE_INTERRUPT, 31169) = 0 wait4(31169, [{WIFSTOPPED(s) && WSTOPSIG(s) == SIGTRAP}|PTRACE_EVENT_STOP<<16], __WALL, NULL) = 31169 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_STOPPED, si_pid=31169, si_uid=1000, si_status=0, si_utime=0, si_stime=0} --- ptrace(PTRACE_CONT, 31169, NULL, 0) = 0 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_TRAPPED, si_pid=31169, si_uid=1000, si_status=SIGSTOP, si_utime=0, si_stime=0} --- wait4(31169, [{WIFSTOPPED(s) && WSTOPSIG(s) == SIGSTOP}], __WALL, NULL) = 31169 ``` The difference is in whether the kernel is fast enough to notice the SIGSTOP before it PTRACE_INTERRUPTS the process (the fact that this results in nondeterministic behavior might be worth filing a bug report for the kernel). With this simple code, I usually get the second version, but this is really a matter of microseconds. Even inserting `usleep(0)` between the SIGSTOP and PTRACE_INTERRUPT calls is enough to make the process immediately stop with SIGSTOP. However, that doesn't matter. After this code is done, the process is stopped with a SIGSTOP, just like it was in the PTRACE_ATTACH case. For a "dead" process, the sequence slightly different: ``` ptrace(PTRACE_SEIZE, 31169, NULL, PTRACE_O_TRACEEXIT) = 0 tgkill(31169, 31169, SIGSTOP) = 0 ptrace(PTRACE_INTERRUPT, 31169) = 0 wait4(31169, [{WIFSTOPPED(s) && WSTOPSIG(s) == SIGTRAP}|PTRACE_EVENT_EXIT<<16], __WALL, NULL) = 31169 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_TRAPPED, si_pid=31169, si_uid=1000, si_status=SIGTRAP, si_utime=0, si_stime=0} --- ``` In this case you always get a PTRACE_EVENT_EXIT stop. The nicest part about this is that could be used to distinguish between the two process states -- without the core dumping flag, the proc filesystem, or anything else. I think it's theoretically possible to catch a live thread while it's exiting and get a PTRACE_EVENT_EXIT, but getting that for *all* threads in the process is very unlikely. Or, instead of making this a process wide-property, we could make this a property of a specific thread. We could say that a *thread* that is in an "exiting" state is not allowed to run expressions. And this dead process would just be a special case of a process which happens to have all threads in the exiting state. And the nicest part about *that* is that this is extremely close to being testable. We no longer need a core-dumping process -- we just need a process that's stopped in a PTRACE_EVENT_EXIT. Right now, we don't have a way to do that, but we're really close to that. One way to do that would be to let the user request stopping a thread at exit. If we implement all of the rest, this will be mainly a UX question of how we want to expose this to the user. I think we could make some sort of an experimental setting for that, or even just some secret packet that we send with the `process plugin packet` command. What do you say to all this? I realize I am sort of asking you my pet feature, but I think that by framing your feature (debugging of dead processes) as a special case of this (debugging exiting processes), we can avoid most of the problematic issues with this PR (divergence of implementation, impossibility of testing). https://github.com/llvm/llvm-project/pull/137041 _______________________________________________ lldb-commits mailing list lldb-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits