Hi Rainer,

On Feb 17 20:37, Rainer Emrich via Cygwin wrote:
> Am 17.02.2025 um 18:00 schrieb Corinna Vinschen via Cygwin:
> > On Feb 17 12:51, Rainer Emrich via Cygwin wrote:
> > > I'm facing a strange major issue with scp. The issue exists in all
> > > cygwin versions later than 3.5.3, including
> > > cygwin-3.6.0-0.374.g4dd859d01c22.
> > > 
> > > If I'm copying a large file with scp I get a "connection lost" after a 
> > > random couple of seconds:
> > > 
> > > scp -v large_file foobar:
> > > .
> > > .
> > > debug1: Sending subsystem: sftp
> > > debug1: pledge: fork
> > > large_file                                             10%   71MB   4.3MB/s   02:21 ETA
> > > debug1: client_input_channel_req: channel 0 rtype exit-status reply 0
> > > debug1: client_input_channel_req: channel 0 rtype e...@openssh.com reply 0
> > > debug1: channel 0: free: client-session, nchannels 1
> > > Transferred: sent 92266460, received 35436 bytes, in 15.3 seconds
> > > Bytes per second: sent 6035219.0, received 2317.9
> > > debug1: Exit status 11
> > > lost connection
> > 
> > In fact, I can reproduce this occasionally back to 3.5.0 and back to
> > OpenSSH 9.7p1.  We can't easily try this with older Cygwin versions.
> > It's getting increasingly hard to build older Cygwin versions due to
> > compiler dependencies and missing symbols.
> At least for my file size, around 700MB, I can't reproduce this with cygwin
> 3.5.3. I noticed this issue for the first time in autumn last year.
> 
> > What that means, first of all, is that this is neither a regression
> > from 3.5.7, nor even from 3.5.1.  Obviously I can't prove whether this
> > was introduced in 3.5.0, but I'd like to point out that we haven't
> > had any noticeable change in the socket code for almost 4 years, since
> > 3.3 development.
> > 
> > Fun fact: I can NOT reproduce the problem when using the -O option,
> > i.e., when using the old scp protocol.  The old protocol isn't
> > slower either.
> > 
> > Maybe that's a workaround for you?
> 
> I'll try this, thanks.

I've been debugging this problem on and off for the last couple of days,
and have even discussed it with one of the upstream OpenSSH maintainers.

But it's still a mystery to me.  The "lost connection" message does not
really point to the cause of the problem; it's just a follow-up effect:

The server receives an EPIPE on the read socket, which in turn
results in the client-side ssh receiving an "end-of-write" packet from
the server, which in turn results in ssh closing the pipe to scp, which
in turn prints the "lost connection" message.

The only thing I can say so far is that it appears to be signal related.

The fact is that scp usually runs a SIGALRM-triggered progress meter.  If
you disable the progress meter by running scp with the -q option, you
can avoid the "lost connection" as well; you don't have to run scp -O.
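
To illustrate the pattern in question, here is a minimal Python sketch.
It is NOT scp's actual code, and the function name, chunk size and timer
interval are made up for illustration.  A periodic SIGALRM timer fires
while the process pushes data through a socket pair.  The timer is only
supposed to tick; the transfer itself should complete undisturbed.

```python
import signal
import socket


def copy_with_progress(total_bytes, chunk=4096, interval=0.01):
    """Send total_bytes through a socket pair while a SIGALRM timer ticks.

    Hypothetical helper for illustration only; returns (sent, received, ticks).
    """
    ticks = 0
    sent = 0
    received = 0

    def on_alarm(signum, frame):
        nonlocal ticks
        ticks += 1  # a real progress meter would redraw its status line here

    old_handler = signal.signal(signal.SIGALRM, on_alarm)
    # Fire SIGALRM periodically, roughly like a progress meter refresh.
    signal.setitimer(signal.ITIMER_REAL, interval, interval)
    a, b = socket.socketpair()
    try:
        payload = b"x" * chunk
        while sent < total_bytes:
            # send() may be interrupted by the signal; Python runs the
            # handler and retries the call, so n is the bytes really queued.
            n = a.send(payload[: total_bytes - sent])
            sent += n
            # Drain exactly what was just sent so the buffer never fills up.
            got = 0
            while got < n:
                got += len(b.recv(n - got))
            received += got
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)  # stop the timer
        signal.signal(signal.SIGALRM, old_handler)
        a.close()
        b.close()
    return sent, received, ticks


if __name__ == "__main__":
    print(copy_with_progress(64 * 1024 * 1024))
```

On a system where signal delivery and socket I/O interact correctly, the
sent and received byte counts match no matter how often the timer fires.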

> > > The strange thing is, if I use strace to debug this, the copy succeeds:
> > > strace -efno strace.log scp -v large_file foobar:
> > 
> > This often points to a timing issue, but beats me where that could be.
> > 
> > > I would try to debug this further, if I had an idea how to do that.
> > 
> > Same here ATM, sorry.
> 
> That's really strange.

Yeah, I know.  But it's really tricky.  All my debugging so far has only
turned up follow-up effects, not the actual cause.  Sigh.


Corinna

-- 
Problem reports:      https://cygwin.com/problems.html
FAQ:                  https://cygwin.com/faq/
Documentation:        https://cygwin.com/docs.html
Unsubscribe info:     https://cygwin.com/ml/#unsubscribe-simple
