> On 4/1/2020 2:34 PM, Ken Brown via Cygwin wrote:
> > On 4/1/2020 1:14 PM, sten.kristian.ivars...@gmail.com wrote:
> >>> On 4/1/2020 4:52 AM, sten.kristian.ivars...@gmail.com wrote:
> >>>>> On 3/31/2020 5:10 PM, sten.kristian.ivars...@gmail.com wrote:
> >>>>>>> On 3/28/2020 10:19 PM, Ken Brown via Cygwin wrote:
> >>>>>>>> On 3/28/2020 11:43 AM, Ken Brown via Cygwin wrote:
> >>>>>>>>> On 3/28/2020 8:10 AM, sten.kristian.ivars...@gmail.com wrote:
> >>>>>>>>>>> On 3/27/2020 10:53 AM, sten.kristian.ivars...@gmail.com wrote:
> >>>>>>>>>>>>> On 3/26/2020 7:19 PM, Ken Brown via Cygwin wrote:
> >>>>>>>>>>>>>> On 3/26/2020 6:39 PM, Ken Brown via Cygwin wrote:
> >>>>>>>>>>>>>>> On 3/26/2020 6:01 PM, sten.kristian.ivars...@gmail.com wrote:
> >>>>>>>>>>>>>>>> The ENXIO occurs when parallel child processes
> >>>>>>>>>>>>>>>> simultaneously open the descriptor using O_NONBLOCK.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> This is consistent with my guess that the error is
> >>>>>>>>>>>>>>> generated by fhandler_fifo::wait. I have a feeling that
> >>>>>>>>>>>>>>> read_ready should have been created as a manual-reset
> >>>>>>>>>>>>>>> event, and that more care is needed to make sure it's
> >>>>>>>>>>>>>>> set when it should be.
> >>
> >> [snip]
> >>
> >>>>>>>> Never mind. I was able to reproduce the problem and find the cause.
> >>>>>>>> What happens is that when the first subprocess exits,
> >>>>>>>> fhandler_fifo::close resets read_ready. That causes the second
> >>>>>>>> and subsequent subprocesses to think that there's no reader
> >>>>>>>> open, so their attempts to open a writer with O_NONBLOCK fail
> >>>>>>>> with ENXIO.
> >>
> >> [snip]
> >>
> >>>> I wrote in a previous mail in this topic that it seemed to work
> >>>> fine for me as well, but when I bumped up the number of writers
> >>>> and/or the number of messages (e.g. 25/25) it starts to fail again.
> >>
> >> [snip]
> >>
> >>> Yes, it is a resource issue. There is a limit on the number of
> >>> writers that can be open at one time, currently 64. I chose that
> >>> number arbitrarily, with no idea what might actually be needed in
> >>> practice, and it can easily be changed.
> >>
> >> Does there have to be a limit at all? We would rather see the
> >> application decide how many resources it would like to use. In our
> >> particular case there will be a process manager with an incoming pipe
> >> that possibly several thousand processes will write to.
> >
> > I agree.
> >
> >> Just for fiddling around (to figure out if this is the limit that
> >> makes other things work a bit oddly), where's this 64 limit defined now?
> >
> > It's MAX_CLIENTS, defined in fhandler.h. But there seem to be other
> > resource issues also; simply increasing MAX_CLIENTS doesn't solve the
> > problem. I think there are also problems with the number of threads,
> > for example. Each time your program forks, the subprocess inherits
> > the rfd file descriptor and its "fifo_reader_thread" starts up. This
> > is unnecessary for your application, so I tried disabling it (in
> > fhandler_fifo::fixup_after_fork), just as an experiment.
> >
> > But then I ran into some deadlocks, suggesting that one of the locks
> > I'm using isn't robust enough. So I've got a lot of things to work on.
> >
> >>> In addition, a writer isn't recognized as closed until a reader
> >>> tries to read and gets an error. In your example with 25/25, the
> >>> list of writers quickly gets to 64 before the parent ever tries
> >>> to read.
> >>
> >> That explains the behaviour, but should there be some error returned
> >> from open/write (maybe there is but I'm missing it)?
> >
> > The error is discovered in add_client_handler, called from
> > thread_func. I think you'll only see it if you run the program under
> > strace. I'll see if I can find a way to report it. Currently,
> > there's a retry loop in fhandler_fifo::open when a writer tries to
> > open, and I think I need to limit the number of retries and then
> > error out.
>
> I pushed a few improvements and bug fixes, and your 25/25 example now runs
> without a problem. I increased MAX_CLIENTS to 1024 just for the sake of this
> example, but I'll work on letting the number of writers increase dynamically
> as needed.
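
For anyone following the thread, here is a minimal application-side sketch of the behaviour discussed above: a writer that opens a FIFO with O_NONBLOCK gets ENXIO when no reader has it open, and a caller can retry a bounded number of times before giving up. This is not the Cygwin code in fhandler_fifo::open. The names open_writer, max_retries, delay and demo are hypothetical and chosen only for illustration, and the sketch assumes a POSIX system where os.mkfifo is available.

```python
import errno
import os
import tempfile
import time


def open_writer(path, max_retries=5, delay=0.05):
    """Open a FIFO write-only and non-blocking, with a bounded retry.

    POSIX specifies ENXIO for O_WRONLY | O_NONBLOCK when no reader has
    the FIFO open, so that is the only error retried here.
    (Hypothetical helper, not part of Cygwin.)
    """
    for attempt in range(max_retries + 1):
        try:
            return os.open(path, os.O_WRONLY | os.O_NONBLOCK)
        except OSError as exc:
            if exc.errno != errno.ENXIO or attempt == max_retries:
                raise  # unrelated error, or retries exhausted
            time.sleep(delay)  # give a reader time to appear


def demo():
    """Return (errno seen with no reader, bytes read once a reader exists)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "fifo")
        os.mkfifo(path)

        # No reader yet: the bounded retry runs out and the error surfaces.
        try:
            os.close(open_writer(path, max_retries=1, delay=0.01))
            no_reader_errno = None
        except OSError as exc:
            no_reader_errno = exc.errno

        # A non-blocking read-only open succeeds even with no writer.
        rfd = os.open(path, os.O_RDONLY | os.O_NONBLOCK)
        try:
            wfd = open_writer(path)
            try:
                os.write(wfd, b"hello")
                data = os.read(rfd, 16)
            finally:
                os.close(wfd)
        finally:
            os.close(rfd)
        return no_reader_errno, data


if __name__ == "__main__":
    print(demo())
```

Capping the retries means a missing reader surfaces as an error to the caller instead of looping indefinitely, which is the same idea as limiting the retries and then erroring out.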
I pulled it and tried it out, and yes, the sample test program with 25/25
worked well, and a whole bunch of our unit tests now pass.

We still have some issues, but I cannot yet tell whether they are related to
named pipes or not.

It is great that you're looking into a fully dynamic solution.

Kristian

> Ken