On 10/4/2019 10:27 AM, Ken Brown wrote:
> On 9/29/2019 4:05 PM, Ken Brown wrote:
>> On 9/27/2019 10:12 AM, Ken Brown wrote:
>>> On 9/27/2019 9:37 AM, Norton Allen wrote:
>>>> On 9/26/2019 10:50 PM, Ken Brown wrote:
>>>>>
>>>>>> As a simple test example, consider:
>>>>>>
>>>>>>     /bin/ssh-agent /bin/sleep 10
>>>>>>
>>>>>> While the sleep is still running, ps shows:
>>>>>>
>>>>>>  PID   PPID  PGID  WINPID  TTY    UID    STIME     COMMAND
>>>>>>  1694  1693  1694    1576  ?      22534  00:01:10  /usr/bin/ssh-agent
>>>>>>  1653     1  1653   11740  cons1  22534  00:00:37  /usr/bin/bash
>>>>>>  1693  1653  1693    1552  cons1  22534  00:01:10  /usr/bin/sleep
>>>>>>
>>>>>> One oddity is that ssh-agent is listed as a subprocess of sleep
>>>>> ...but this isn't a bug.  ssh-agent forks, and then the parent execs
>>>>> the command.
>>>>
>>>> With the salient difference presumably being that the exec is done in
>>>> the parent instead of the child as usual?
>>>
>>> Yes.  The idea is that 'ssh-agent command' should be more-or-less
>>> equivalent to running 'command', with ssh-agent running as a subprocess.
>>>
>>> The ssh-agent subprocess periodically checks to see if its parent is
>>> still alive, and it exits when the parent has died.  Someone should
>>> figure out why this is not working on Cygwin.
>>
>> As an aid to someone who might want to debug this (probably Corinna when
>> she returns), I've created a test program agent.c (attached) that
>> simulates the relevant part of ssh-agent:
>>
>> 1. It forks a subprocess that periodically checks to see if its parent
>>    has died, and then exits.
>>
>> 2. The parent execs "/usr/bin/sleep 1".
>>
>> As with ssh-agent, the subprocess never detects that the parent has died,
>> and so it never exits.
>>
>> Running this program under strace shows the following error in the pinfo
>> constructor:
>>
>>   pinfo::pinfo: couldn't duplicate parent rd_proc_pipe handle 0x1BC for
>>   forked child 1666 after exec, Win32 error 5
>>
>> [Win32 error 5 is ERROR_ACCESS_DENIED.]
>
> It seems that the pinfo constructor failure happens in
> cygheap_exec_info::reattach_children().  The latter is preceded by the
> following comment:
>
>   /* Reattach non-reaped subprocesses passed in from the cygwin process
>      which previously operated under this pid.  FIXME: Is there a race here
>      if the process exits during cygwin's exec handoff?  */
>
> I tried running my test program under gdb with a breakpoint at
> reattach_children, and the breakpoint was never hit.  That gives an
> affirmative answer to the question in the FIXME.
>
> As a result, the exec'd program never becomes aware that it has a
> subprocess, so it exits without resetting the subprocess's ppid to 1.
>
> Is there someone out there familiar enough with Cygwin's exec to suggest a
> fix?  It would be a nice gift to Corinna to get this fixed before her
> return.

What I said above about gdb is nonsense.  It's the exec'd process that calls
reattach_children, so I wouldn't expect gdb to see that call.  I think the
rest of my analysis is correct, but I'm not sure that the FIXME explains the
failure.

Ken
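
For readers without the attachment, the two steps of the test program
described in the quoted text can be sketched roughly as follows.  This is
not the actual agent.c; the function names (parent_is_alive, spawn_watcher,
run_agent_sketch), the one-second poll interval, and the use of a getppid()
comparison as the liveness check are all assumptions made for illustration.

```c
/* Hypothetical sketch of the fork-then-exec-in-parent pattern discussed
   in this thread.  Not the real agent.c attachment. */
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Return 1 while parent_pid is still our parent.  Assumption: when the
   parent dies, the child is reparented and getppid() stops matching. */
int parent_is_alive(pid_t parent_pid)
{
    return getppid() == parent_pid;
}

/* Step 1: fork a subprocess that polls until its parent has died, then
   exits.  Returns the subprocess pid in the parent, or -1 on error. */
pid_t spawn_watcher(unsigned poll_seconds)
{
    pid_t parent_pid = getpid();   /* recorded before the fork */
    pid_t pid = fork();

    if (pid == 0) {
        while (parent_is_alive(parent_pid))
            sleep(poll_seconds);
        _exit(0);
    }
    return pid;
}

/* Step 2: the parent (not the child) execs the command, so the watcher
   ends up as a subprocess of /usr/bin/sleep.  Returns only on failure. */
int run_agent_sketch(void)
{
    if (spawn_watcher(1) < 0) {
        perror("fork");
        return 1;
    }
    execl("/usr/bin/sleep", "sleep", "1", (char *)NULL);
    perror("execl");
    return 1;
}
```

Calling run_agent_sketch() from main() and then running ps after the sleep
has finished should show whether the watcher is still around.  If the
getppid() assumption holds, the watcher would be expected to exit within
about one poll interval once its ppid changes; the symptom reported in this
thread is that on Cygwin the ppid is never reset, so the loop never ends.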