On Sat, 18 Jun 2022, Jacob Moody wrote:
I've attempted to reproduce it, trying to remove the libthread/notify
factors. I've come up with this:

#include <u.h>
#include <libc.h>

static void
proc_udp(void*)
{
       char resp[512];
       char req[] = "request";
       int fd;
       int n;
       int pid;

       fd = dial("udp!185.157.221.201!5678", nil, nil, nil);
       if(fd < 0)
               exits("can't dial");

       if(write(fd, req, strlen(req)) != strlen(req))
               exits("can't write");

       pid = getpid();
       fprint(1, "start %d\n", pid);
       n = read(fd, resp, sizeof(resp)-1);
       fprint(1, "end %d %d\n", pid, n);
       exits(nil);
}

void
main(int, char**)
{
       int i;
       Waitmsg *wm;

       for(i = 0; i < 10; i++){
               switch(fork()){
               case -1:
                       sysfatal("fork %r");
               case 0:
                       proc_udp(nil);
                       sysfatal("ret");
               default:
                       break;
               }
       }
       for(i = 0; i < 10; i++){
               wm = wait();
               print("proc %d died with message %s\n", wm->pid, wm->msg);
       }
       exits(nil);
}

This code makes it pretty obvious that we are losing some children;
on my machine this program never exits. I see some of the readers
correctly returning -1, and the parent gets a Waitmsg for those,
but not for all ten children.

Moody, I think this old thread will interest you:

https://marc.info/?t=112730920400001&r=1&w=2

Russ Cox explained there:
 It appears that your program, at its core, is doing this:

 void
 readproc(void *v)
 {
     int fd;
     char buf[100];
     fd = (int)v;
     read(fd, buf, sizeof buf);
 }

 void
 threadmain(int argc, char **argv)
 {
     int p[2];
     pipe(p);
     proccreate(readproc, (void*)p[0], 8192);
     proccreate(readproc, (void*)p[1], 8192);
     close(p[0]);
     /* and here you expect the first readproc to be done */
     close(p[1]);
     /* and here the second */
 }

 Each read call is holding up a reference to its channel
 inside the kernel, so that even though you've closed the fd
 and removed the ref from the fd table, there is still a reference
 to each side of the pipe in the form of the process blocked
 on the read.

 I've never been sure whether the implicit ref held during
 the system call is good behavior, but it's hard to change.

 In your case, writing 0 (or anything) makes the read
 finish, releasing the last ref to the underlying pipe when
 the system call finishes, and then everything cleans up
 as expected.  So you've found your workaround, and now
 we understand why it works.
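
For anyone who wants to try the workaround ordering outside Plan 9,
here is a rough POSIX analogue, not code from the old thread. It uses
pthreads instead of libthread, and the names `struct reader` and
`wake_reader` are made up for illustration. It does not demonstrate
the kernel reference counting itself, only the order of operations:
write a byte so the blocked read returns, then close.

```c
#include <pthread.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper type: the fd to read from and the read result. */
struct reader {
	int fd;
	ssize_t n;
};

static void *
readproc(void *v)
{
	struct reader *r = v;
	char buf[100];

	/* Blocks until data arrives or every write end is closed. */
	r->n = read(r->fd, buf, sizeof buf);
	return NULL;
}

/* Returns the byte count the reader saw, or -1 if setup failed. */
static ssize_t
wake_reader(void)
{
	int p[2];
	pthread_t t;
	struct reader r;

	if (pipe(p) < 0)
		return -1;
	r.fd = p[0];
	r.n = -1;
	if (pthread_create(&t, NULL, readproc, &r) != 0) {
		close(p[0]);
		close(p[1]);
		return -1;
	}

	/* "Writing 0 (or anything)": one NUL byte releases the reader. */
	(void)write(p[1], "", 1);
	/* Closing the write end also guarantees EOF if the write failed. */
	close(p[1]);

	/* Only after the reader has returned do we close its fd. */
	pthread_join(t, NULL);
	close(p[0]);
	return r.n;
}
```

Calling `wake_reader()` from a `main` should return once the reader
has consumed the single byte. The point of the ordering is that the
reader's descriptor is closed only after the read has finished, so
nothing is left blocked holding a reference.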

------------------------------------------
9fans: 9fans
Permalink: 
https://9fans.topicbox.com/groups/9fans/Tfa6823048ad90a21-M6e48031f9e8673387c0b47b8
