Hi,
I've been hunting an issue for some days now, where a non-cygwin program
using Microsoft's UCRT sometimes ends up with a sticky error on stdout
when running under cygwin perl with a pipe capturing stdout and stderr.
When the problem triggers, the pipe buffer appears to be full and it
really looks like it's hitting the errno=ENOSPC/doserrno=0 situation at
the tail end of _write_nolock() in ucrt/lowio/write.cpp.
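
For reference, the decision at the tail end of _write_nolock() boils down
to roughly the following, as I read it. This is a paraphrase from memory
and not the actual UCRT source: struct write_result and finish_write() are
names I made up for illustration, and EIO stands in for the real
Win32-to-errno mapping.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for what the low-level write helper reports. */
struct write_result {
    unsigned long error_code;  /* Win32 error from WriteFile, 0 if none */
    unsigned long char_count;  /* bytes the OS reported as written */
};

/* Returns the byte count, or -1 with errno and *doserrno set. */
static long finish_write(struct write_result r, unsigned long *doserrno)
{
    assert(doserrno != NULL);

    if (r.char_count != 0)
        return (long)r.char_count;

    if (r.error_code != 0) {
        /* The real code maps the Win32 error to an errno value;
           EIO is only a placeholder in this sketch. */
        errno = EIO;
        *doserrno = r.error_code;
        return -1;
    }

    /* Zero bytes written and no OS error: reported as "no space". */
    errno = ENOSPC;
    *doserrno = 0;
    return -1;
}
```

So a WriteFile call that succeeds but accepts zero bytes is enough to land
in the errno=ENOSPC/doserrno=0 case.
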
I *think* the issue is that the write end of the pipe isn't configured
to be synchronous. In winsup/cygwin/fhandler/pipe.cc, the nt_create()
function sets FILE_SYNCHRONOUS_IO_NONALERT when creating the _read_ end
of the pipe using NtCreateNamedPipeFile, citing some C# program
compatibility need. But, the call to NtOpenFile below that opens the
_write_ end of the pipe doesn't set it. It does set the SYNCHRONIZE
access right, but doesn't set the FILE_SYNCHRONOUS_IO_NONALERT flag
(the last parameter is zero). This is akin to calling CreateFile with
FILE_FLAG_OVERLAPPED, if I understand it correctly.
This lack of symmetry in the synchronization configuration between the
two pipe ends looks like a bug to me, given the comment on the read end
saying "Set FILE_SYNCHRONOUS_IO_NONALERT flag so that native C# programs
work with cygwin pipe".
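
To spell out the asymmetry, here is a small stand-alone sketch. The
constant values are the ones from the Windows headers to the best of my
knowledge, redefined with a MY_ prefix so the snippet builds anywhere, and
opened_synchronous() is a hypothetical helper, not a cygwin or Windows
function.

```c
#include <assert.h>

/* Values as in the Windows SDK/WDK headers, as far as I know. */
#define MY_FILE_SYNCHRONOUS_IO_ALERT     0x00000010UL
#define MY_FILE_SYNCHRONOUS_IO_NONALERT  0x00000020UL
#define MY_SYNCHRONIZE                   0x00100000UL

/* Hypothetical helper: does this access/options pair request
   synchronous I/O on the resulting handle? */
static int opened_synchronous(unsigned long access, unsigned long options)
{
    unsigned long sync = MY_FILE_SYNCHRONOUS_IO_ALERT
                       | MY_FILE_SYNCHRONOUS_IO_NONALERT;

    /* The two flags are mutually exclusive, as far as I know. */
    assert((options & sync) != sync);

    /* A synchronous flag also needs the SYNCHRONIZE access right. */
    return (access & MY_SYNCHRONIZE) != 0 && (options & sync) != 0;
}
```

With the read end's options (FILE_SYNCHRONOUS_IO_NONALERT) this returns 1;
with the write end's options (zero) it returns 0, even though SYNCHRONIZE
is requested in both cases.
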
I have found no way to enable the FILE_SYNCHRONOUS_IO_NONALERT behaviour
on my end of the pipe after it has been created, unlike the ReadMode and
CompletionMode (parameters 9 & 10 of the NtCreateNamedPipeFile call).
The non-cygwin program in question has had issues with
FILE_PIPE_COMPLETE_OPERATION (PIPE_NOWAIT) under cygwin in the past, but
that was solvable in a manner similar to calling
fhandler_pipe::set_pipe_non_blocking(true). I can't come up with a good
workaround for this problem, short of switching to a non-cygwin perl
setup (painful).
The problem shows up sporadically (a few times almost every day lately)
on an up-to-date Windows Server 2022 with a slightly dated cygwin setup
(I doubt that matters much as the code in question seems to be unchanged
in git head). It's a continuous build job that triggers this, and I'm
not too keen on building my own cygwin dlls (haven't done that for at
least 15 years) and placing them on this server. I've tried to
reproduce it locally on a similarly beefy Windows 11 workstation, but
haven't had any luck at all, so it's either a Windows Server 2022 specific
kernel regression or just your regular heisenbug.
The kernel code will do very different serialization for the pipe object
when neither of the FILE_SYNCHRONOUS_IO_ALERT/NONALERT flags is set,
from what I can tell, though I haven't dug too deep into things yet. My
guess, though, is that if multiple threads/processes write to an almost
full pipe buffer at the same time, (Nt)WriteFile may sometimes return
before writing the whole buffer, or any of it, and thus upset the stupid
UCRT code. Just forcing the pipe buffer to run full doesn't trigger it by
itself, from what I can tell.
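
If that guess is right, a writer that bypasses the CRT and keeps going
until everything is accepted would not be bothered by a short or empty
write. Below is a minimal sketch of that idea, not something I have tried
on the affected server. The low-level write is abstracted as a callback
(raw_write_fn is hypothetical; in the real program it would wrap
WriteFile), and flaky_write() is just a test double that accepts nothing
on the first call and at most 3 bytes per call afterwards.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical low-level writer: returns the bytes accepted (possibly 0
   or fewer than asked for), or -1 on a real error. */
typedef long (*raw_write_fn)(void *ctx, const char *buf, size_t len);

/* Write until everything is accepted; give up after max_empty
   consecutive zero-byte writes. Returns bytes written, or -1. */
static long write_all(raw_write_fn w, void *ctx, const char *buf,
                      size_t len, int max_empty)
{
    size_t done = 0;
    int empty = 0;

    assert(w != NULL && buf != NULL);
    while (done < len) {
        long n = w(ctx, buf + done, len - done);
        if (n < 0)
            return -1;
        if (n == 0) {
            if (++empty > max_empty)
                return -1;
            continue;  /* real code would sleep or wait here */
        }
        empty = 0;
        done += (size_t)n;
    }
    return (long)done;
}

/* Test double standing in for a nearly full pipe. */
struct flaky_pipe { int calls; char data[64]; size_t used; };

static long flaky_write(void *ctx, const char *buf, size_t len)
{
    struct flaky_pipe *p = ctx;
    size_t n = len < 3 ? len : 3;

    if (p->calls++ == 0)
        return 0;                      /* first call accepts nothing */
    if (p->used + n > sizeof p->data)
        return -1;
    memcpy(p->data + p->used, buf, n);
    p->used += n;
    return (long)n;
}
```

Calling write_all(flaky_write, &pipe, "hello world", 11, 4) on a zeroed
struct flaky_pipe goes through the empty write and the short writes and
still delivers all 11 bytes.
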
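Also, in the error path of the NtOpenFile call, GetLastError() is used
instead of __seterrno_from_nt_status() or RtlNtStatusToDosError(), even
though NtOpenFile reports failure through its NTSTATUS return value and
not through the last-error code.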
Kind Regards,
bird.