On Jul 1 16:02, theodor.kaz...@gmx.de wrote: > Hi all, > > The issue was observed in cygwin versions 1.7.30-1, 2.0.2-1 and > 2.0.4-1 on Win7. Upon receiving 'HUGEFILE' form a debian system (scp > user@server:logDownload_m/TEST.txt .), scp in cygwin hangs from time > to time... > > Based on the observed occurrence behaviour of the issue and my > experience in hunting ghosts, I - based on the new aspect described > below - without much doubt assume a race condition between the cygwin > read()/write() function implementations. I started investigating the > cygwin-newlib implementation and gdb-ing it but I'm not yet that much > into the details of read()/write()-implementation to identify possible > race conditions. > So any contribution / co-working offers / investigation proposals are > highly appreciated as I can only progress slowly. > > > As a starting point I'd have, based on: "scp and it's child process > ssh share a pipe to receive the data of the downloaded file..." the > following Q: > 1) Is a child forked into a new, separate Win-process or is the fork > simulated in cygwin, e.g. by threads?
Fork is starting a new process. > 2) Is the pipe implementation in cygwin in such a setup using > Win-methods to forward the data or is the pipe-functionality simulated > in cygwin? Pipes are using Windows named pipes w/ overlapped IO. > 3) Can the action of providing data for reading from a pipe by the > consumer process (i.e. indicating data availability and the actual > copying of the data in a read()-call) be interrupted/impacted by a > write()-call from a pipe-data-producing process? I don't understand the question. Writing to the pipe on one end will trigger an event handle on the other end and the read call will check if overlapped data is available in a call to GetOverlappedResult. Writing to the pipe will not generate POSIX signals in the consumer process, if that's what you mean. > ---8<--- > size_t > atomicio6(ssize_t (*f) (int, void *, size_t), int fd, void *_s, size_t n, > int (*cb)(void *, size_t), void *cb_arg) > { > [...] > --->8--- A diff -up against the orignal code would be helpful for stuff like that. > $ cat -n scp_20150701_123824.txt > . . . > 50782 TK: atomicio.c: atomicio6: fd=8, event=1, res=49152, errno=011, > s[0]=B014565-N000001-, s[Q1-16]=B014565-N001024-B014566-N000001-, > s[Q2-16]=B014566-N001024-B014567-N000001-, > s[Q3-16]=B014567-N001024-B014568-N000001-, s[Q4-16]=B014568-N001024- > 50783 TK: scp.c: sink: amt=65536, j=65536, i=238616576, size=1553121293, > count=65536, buf=B014565-N000001- > 50784 TK: dispatch.c: ssh_dispatch_run: type=94 > 50785 TK: channels.c: channel_input_data: data_len=16384, > buf=B014569-N000001- > 50786 TK: channels.c: channel_handle_wfd(2): write: len=16384, > buf=B014569-N000001- > 50787 TK: atomicio.c: atomicio6: fd=8, event=1, res=-0001, errno=011, > s[0]=B014569-N000001-, s[Q1-16]=B014569-N001024-B014566-N000001-, > s[Q2-16]=, s[Q3-16]=, s[Q4-16]= Hmm. I inspected the read code for overlapped reads again and I don't see anything wrong. The question is why an EAGAIN is triggered. The only reason I can think of is if GetOverlappedResult returns FALSE when there actually is data in the buffer. For the underlying functionality, see https://sourceware.org/git/?p=newlib-cygwin.git;a=blob;f=winsup/cygwin/fhandler.cc;h=6f024da3288053359de36baf41fa095819d252cc;hb=HEAD#l1944 I'm now trying to reproduce this again, but I'm really not sure where to look and, as I wrote, last time I was unable to reproduce this despite running the testcase for hours. It might be interesting to see if there's a certain bordercase possible: GetOverlappedResult returning FALSE with bytes set to some non-0 value. Locally I changed the debug output to diff --git a/winsup/cygwin/fhandler.cc b/winsup/cygwin/fhandler.cc index 6f024da..556c240 100644 --- a/winsup/cygwin/fhandler.cc +++ b/winsup/cygwin/fhandler.cc @@ -1993,10 +1993,14 @@ fhandler_base_overlapped::wait_overlapped (bool inres, bool writing, DWORD *byte overridden by the return of GetOverlappedResult which could detect that I/O completion occurred. */ CancelIo (h); + *bytes = 0; wores = GetOverlappedResult (h, get_overlapped (), bytes, false); err = GetLastError (); ResetEvent (get_overlapped ()->hEvent); /* Probably not needed but CYA */ - debug_printf ("wfres %u, wores %d, bytes %u", wfres, wores, *bytes); + if (!wores && *bytes > 0) + system_printf ("wfres %u, wores %d, bytes %u", wfres, wores, *bytes); + else + debug_printf ("wfres %u, wores %d, bytes %u", wfres, wores, *bytes); if (wores) res = overlapped_success; /* operation succeeded */ else if (wfres == WAIT_OBJECT_0 + 1) I'm running the test with this change now. If you see anything else wrong in that code, please yell. Corinna -- Corinna Vinschen Please, send mails regarding Cygwin to Cygwin Maintainer cygwin AT cygwin DOT com Red Hat
pgpAGHJOoL4Eo.pgp
Description: PGP signature