On 4/1/2020 4:52 AM, sten.kristian.ivars...@gmail.com wrote:
On 3/31/2020 5:10 PM, sten.kristian.ivars...@gmail.com wrote:
On 3/28/2020 10:19 PM, Ken Brown via Cygwin wrote:
On 3/28/2020 11:43 AM, Ken Brown via Cygwin wrote:
On 3/28/2020 8:10 AM, sten.kristian.ivars...@gmail.com wrote:
On 3/27/2020 10:53 AM, sten.kristian.ivars...@gmail.com wrote:
On 3/26/2020 7:19 PM, Ken Brown via Cygwin wrote:
On 3/26/2020 6:39 PM, Ken Brown via Cygwin wrote:
On 3/26/2020 6:01 PM, sten.kristian.ivars...@gmail.com wrote:
The ENXIO occurs when parallel child processes simultaneously open the
descriptor with O_NONBLOCK.

This is consistent with my guess that the error is generated
by fhandler_fifo::wait.  I have a feeling that read_ready
should have been created as a manual-reset event, and that
more care is needed to make sure it's set when it should be.

I could provide a code snippet to reproduce it, if you want.

Yes, please!

That might not be necessary.  If you're able to build the git
repo master branch, please try the attached patch.

Here's a better patch.


I finally succeeded in building the latest master (make is not my
favourite tool) and applied the patch, but my little test program (see
attachment) still fails when creating a write file descriptor with
O_NONBLOCK.

Your test program fails for me on Linux too.  Here's the output from one
run:

You're right. It was extremely careless of me not to test this on Linux
first :-)

No problem.

I can assure you that we have a use case that works on Linux but not in
Cygwin, but it seems I narrowed it down the wrong way.

I'll try to rearrange my code (which works on Linux) to mimic our
application in a simple way (I'll be back).

OK, I'll be waiting for you.  BTW, if it's not too hard to write your
test case in plain C, or at least less modern C++, that would simplify
things for me.  For example, your pipe.cpp failed to compile on one
Linux machine I wanted to test it on, presumably because that machine
had an older C++ compiler.

Never mind.  I was able to reproduce the problem and find the cause.
What happens is that when the first subprocess exits,
fhandler_fifo::close resets read_ready.  That causes the second and
subsequent subprocesses to think that there's no reader open, so
their attempts to open a writer with O_NONBLOCK fail with ENXIO.
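The POSIX rule these subprocesses tripped over can be checked with a short C function. This is only a sketch of the expected FIFO semantics, not the Cygwin code or the original test program; the function name `check_enxio_rule` and the FIFO path are made up for illustration, and it assumes a POSIX system where `mkfifo` can create a FIFO at the given path.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if a non-blocking writer open fails with
   ENXIO while no reader is open, and succeeds once a reader exists. */
static int check_enxio_rule(const char *path)
{
    assert(path != NULL);
    unlink(path);                        /* start from a clean slate */
    if (mkfifo(path, 0600) == -1)
        return 0;

    /* No reader yet: POSIX says this open fails at once with ENXIO. */
    int w = open(path, O_WRONLY | O_NONBLOCK);
    int ok = (w == -1 && errno == ENXIO);
    if (w != -1)
        close(w);

    /* Open a reader; O_NONBLOCK so this open does not wait for a writer. */
    int r = open(path, O_RDONLY | O_NONBLOCK);

    /* A reader is open now, so the same writer open must succeed. */
    w = open(path, O_WRONLY | O_NONBLOCK);
    ok = ok && r != -1 && w != -1;

    if (w != -1)
        close(w);
    if (r != -1)
        close(r);
    unlink(path);
    return ok;
}
```

On a conforming system `check_enxio_rule("/tmp/enxio_test_fifo")` should return 1. In the failure described above, writer opens got ENXIO even though the parent still had the FIFO open for reading, which is exactly what the second half of this check rules out.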

I should be able to fix this tomorrow.

I've pushed what I think is a fix to the topic/fifo branch.  I tested it
with the attached program, which is a variant of the test case you sent
last week.  Please test it in your use case.

Note: If you've previously pulled the topic/fifo branch, then you will
probably get a lot of conflicts when you pull again, because I did a
forced push a few days ago.  If that happens, just do

    git reset --hard origin/topic/fifo

It turned out that the fix required some of the ideas that I've been
working on in connection with allowing multiple readers.  Even though
the code allows a FIFO to be *explicitly* opened for reading only
once, there can still be several open file descriptors for readers
because of dup and fork.  The existing code on git master doesn't
handle those situations properly.
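A sketch of the dup/fork situation, assuming a POSIX system: the FIFO is explicitly opened for reading once, but `dup` and `fork` leave several descriptors referring to that one reader, and closing some of them (as an exiting child does) should not make the FIFO look readerless. The function name and path are hypothetical; this illustrates the expected semantics, not the code on topic/fifo.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if a non-blocking writer can still open
   after some copies of the single reader descriptor have been closed. */
static int reader_survives_extra_closes(const char *path)
{
    assert(path != NULL);
    unlink(path);
    if (mkfifo(path, 0600) == -1)
        return 0;

    int r1 = open(path, O_RDONLY | O_NONBLOCK);  /* the one explicit open */
    if (r1 == -1) {
        unlink(path);
        return 0;
    }
    int r2 = dup(r1);                            /* second fd, same open file description */

    pid_t pid = fork();
    if (pid == 0) {
        /* Child: inherited copies of r1 and r2; close them and exit,
           like a worker subprocess finishing its job. */
        close(r1);
        close(r2);
        _exit(0);
    }
    if (pid > 0)
        waitpid(pid, NULL, 0);

    close(r1);                                   /* drop one copy in the parent too */

    /* r2 is still open, so a reader exists and this must not fail with ENXIO. */
    int w = open(path, O_WRONLY | O_NONBLOCK);
    int ok = (pid > 0 && r2 != -1 && w != -1);

    if (w != -1)
        close(w);
    if (r2 != -1)
        close(r2);
    unlink(path);
    return ok;
}
```

Per POSIX, the reader only goes away when the last descriptor referring to it is closed, so this should return 1; tracking reader state per descriptor rather than per open is what breaks it.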

The code on topic/fifo doesn't completely fix that yet, but I think it
should work under the following assumptions:

1. The FIFO is opened only once for reading.

2. The file descriptor obtained from this is the only one on which a
read is attempted.

I'm working on removing both of these restrictions.

Ken

We finally took the time to write a simplified "hack" that works on
Ubuntu and BSD/OSX. With the latest newlib-cygwin master it gave "ENXIO"
now and then; with your previous patch applied there was no ENXIO, but
::read returned EAGAIN (until exhausted) on Cygwin in almost every run.
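With O_NONBLOCK, EAGAIN and end-of-file mean different things on a FIFO: EAGAIN says a writer is open but no data is available yet, while a return of 0 says no writer has the FIFO open. A minimal sketch of a reader that keeps the two apart, assuming POSIX FIFO semantics as on Linux; `read_nonblock` and `enum read_status` are hypothetical names, not part of either test program.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Possible outcomes of one non-blocking read() on a FIFO. */
enum read_status { READ_DATA, READ_WOULD_BLOCK, READ_EOF, READ_ERROR };

/* Classify a single non-blocking read; *got receives read()'s return value. */
static enum read_status read_nonblock(int fd, char *buf, size_t len, ssize_t *got)
{
    assert(buf != NULL && got != NULL);
    ssize_t n = read(fd, buf, len);
    *got = n;
    if (n > 0)
        return READ_DATA;
    if (n == 0)
        return READ_EOF;          /* no writer has the FIFO open */
    if (errno == EAGAIN || errno == EWOULDBLOCK)
        return READ_WOULD_BLOCK;  /* writer(s) open, FIFO currently empty */
    return READ_ERROR;
}
```

A reader loop would typically wait (for example with `poll`) on READ_WOULD_BLOCK and stop only on READ_EOF, so a run of EAGAIN results is not by itself an error.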

I will try your newest things tomorrow

See the latest attached test program (it's starting to get bloated, but
this time it's more C-compatible :-)

Thanks.  This runs fine with the current HEAD of topic/fifo.

I wrote in a previous mail in this thread that it seemed to work fine for me
as well, but when I bump up the number of writers and/or the number of
messages (e.g. 25/25) it starts to fail again.

My initial thought is that we're bumping into some kind of system resource
limit, but I haven't had time to dig into the details yet (sorry about
that).

Yes, it is a resource issue. There is a limit on the number of writers that can be open at one time, currently 64. I chose that number arbitrarily, with no idea what might actually be needed in practice, and it can easily be changed.

In addition, a writer isn't recognized as closed until a reader tries to read and gets an error. In your example with 25/25, the list of writers quickly gets to 64 before the parent ever tries to read.
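The resource problem can be modelled with a toy writer table. This is a hypothetical simplification, not Cygwin's actual data structure: it only assumes the two properties described above, a fixed limit of 64 writers and closed writers being reclaimed only when the reader attempts a read. All names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: fixed capacity, matching the limit of 64 mentioned above. */
#define MAX_WRITERS 64

struct writer_table {
    bool in_use[MAX_WRITERS];  /* slot holds a writer, open or not yet noticed closed */
    bool closed[MAX_WRITERS];  /* writer has closed, but the reader hasn't noticed */
};

/* Register a new writer; returns its slot index, or -1 if the table is full. */
static int add_writer(struct writer_table *t)
{
    for (int i = 0; i < MAX_WRITERS; i++) {
        if (!t->in_use[i]) {
            t->in_use[i] = true;
            t->closed[i] = false;
            return i;
        }
    }
    return -1;
}

/* A writer closing only marks its slot; nothing is freed here. */
static void close_writer(struct writer_table *t, int slot)
{
    assert(slot >= 0 && slot < MAX_WRITERS);
    t->closed[slot] = true;
}

/* Only a read attempt discovers closed writers and frees their slots.
   Returns the number of slots reclaimed. */
static int reader_poll(struct writer_table *t)
{
    int freed = 0;
    for (int i = 0; i < MAX_WRITERS; i++) {
        if (t->in_use[i] && t->closed[i]) {
            t->in_use[i] = false;
            freed++;
        }
    }
    return freed;
}
```

In this model, 64 writers that open and close before the reader's first read fill the table, and the next `add_writer` returns -1 until `reader_poll` runs. That is the pattern described for the 25/25 case, where the list of writers reached 64 before the parent ever tried to read.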

I'll see if I can find a better way to manage this.

Ken