On Wed, Jan 29, 2020 at 02:04:06PM +0100, Martin Pieuchot wrote:
> Diff below enables a ptrace(2) regress coming from NetBSD.
>
> With usr.bin/make built since -D2020-01-14, that includes -current, it
> complains during the last test:
>
> make: Child (52049) not in table?
> FAILED
>
> That results in a failing test, however the syscall correctly reports
> EBUSY.
>
> Should I commit this first to help you look at the issue?

At first I thought forgetting to handle WIFSTOPPED might explain things.
But looking more closely, I think the changes in make just made a system
bug more apparent.

By instrumenting make a bit:

Index: job.c
===================================================================
RCS file: /cvs/src/usr.bin/make/job.c,v
retrieving revision 1.159
diff -u -p -r1.159 job.c
--- job.c 16 Jan 2020 16:07:18 -0000 1.159
+++ job.c 29 Jan 2020 13:52:41 -0000
@@ -757,11 +757,15 @@ reap_jobs(void)
 	Job *job;
 
 	while ((pid = waitpid(WAIT_ANY, &status, WNOHANG)) > 0) {
+		fprintf(stderr, "Process %ld said %d\n", (long)pid, status);
+		if (WIFSTOPPED(status) || WIFCONTINUED(status))
+			continue;
 		reaped = true;
 		job = reap_finished_job(pid);
 
 		if (job == NULL) {
-			Punt("Child (%ld) not in table?", (long)pid);
+			Punt("Child (%ld) with status %d not in table?",
+			    (long)pid, status);
 		} else {
 			handle_job_status(job, status);
 			determine_job_next_step(job);

I see the following pattern:

./t_ptrace -r 6
Mark the parent process (PID 22772) a debugger of PID 93154
Mark the parent process (PID 22772) a debugger of PID 93154 again
Process 93154 said 0
Process 93154 said 0
make: Child (93154) with status 0 not in table?

So waitpid gives me 93154 with status 0 *twice*: it reaps the same child
twice, since status == 0 corresponds to exit(0).

I fail to see how I can recover from that (or why I should)...
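
For comparison, here is a minimal sketch of what I would normally expect
from waitpid when no ptrace is involved. It is not taken from make or from
the regress test; status_kind() and reap_twice() are hypothetical helpers
made up for this illustration. The child calls _exit(0), the first waitpid
reports status 0, and a second waitpid on the same pid should fail with
ECHILD instead of handing the child back again.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: name the kind of event a wait status encodes. */
static const char *
status_kind(int status)
{
	if (WIFSTOPPED(status))
		return "stopped";
#ifdef WIFCONTINUED
	if (WIFCONTINUED(status))
		return "continued";
#endif
	if (WIFEXITED(status))
		return "exited";
	if (WIFSIGNALED(status))
		return "signaled";
	return "unknown";
}

/*
 * Hypothetical helper: fork a child that calls _exit(0), reap it, then
 * try to reap the same pid again.  Returns the errno of the second
 * waitpid(), 0 if the second waitpid() unexpectedly succeeded, or -1 if
 * fork() or the first waitpid() failed.
 */
static int
reap_twice(void)
{
	int status = -1;
	pid_t pid;

	pid = fork();
	if (pid == -1)
		return -1;
	if (pid == 0)
		_exit(0);

	/* First reap: the child is expected to be reported here. */
	if (waitpid(pid, &status, 0) != pid)
		return -1;
	printf("first reap: status %d (%s)\n", status, status_kind(status));

	/* Second reap: the child is gone, so this should not succeed. */
	if (waitpid(pid, &status, 0) == -1)
		return errno;
	return 0;
}
```

Calling reap_twice() from a small main() should print one "first reap"
line with status 0 and return ECHILD. The make trace above instead gets
the same pid back from waitpid twice.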