Pascal Hambourg <pas...@plouf.fr.eu.org> writes:

> Hello,
>
> Nikolaus Rath wrote:
>>
>> I'm having trouble with an internet connection that seems to randomly
>> "freeze" arbitrary TCP connections when they have not been used for a
>> while. The connections stay established, but no data is coming through.
>
> How long is "a while", at a minimum?
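One way to measure that minimum empirically is a small probe: open a connection, stay silent for N seconds, send a few bytes and see whether a reply comes back, then bisect on N. The following is only a minimal Python sketch, assuming you control a server reachable over the same NAT path that echoes back whatever it receives; `survives_idle` and `find_idle_limit` are hypothetical names, not the program used in this thread.

```python
import socket
import time


def survives_idle(host, port, idle_seconds, payload=b"ping\n", reply_timeout=10.0):
    """Connect, stay idle for idle_seconds, then send payload.

    Returns True if any reply arrives within reply_timeout. Assumes the
    peer answers every payload (e.g. an echo service)."""
    with socket.create_connection((host, port), timeout=reply_timeout) as sock:
        time.sleep(idle_seconds)  # let NAT/firewall state age without traffic
        try:
            sock.sendall(payload)  # usually succeeds: data just goes into the send buffer
            return len(sock.recv(1024)) > 0
        except OSError:  # includes socket.timeout: data went out, nothing came back
            return False


def find_idle_limit(probe, low, high, resolution=1.0):
    """Bisect for the longest idle time that still works.

    probe(seconds) -> bool; assumes probe(low) is True and probe(high) is False."""
    while high - low > resolution:
        mid = (low + high) / 2
        if probe(mid):
            low = mid
        else:
            high = mid
    return low
```

A hypothetical run would be `find_idle_limit(lambda t: survives_idle("echo.example.org", 7, t), 60, 600)`. Each probe takes as long as the idle period it tests, so narrowing 60 to 600 seconds down to 1 second costs about ten probes, and values near the boundary are worth repeating in case the NAT timer has some jitter.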
I wrote a small test program. It seems to be exactly 302 seconds; 301
still works.

>> When this happens, netstat still shows the connection status as
>> `ESTABLISHED` on both the local computer:
>>
>> Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name  Timer
>> tcp        0     53 192.168.0.10:41129      173.255.235.238:143     ESTABLISHED 8219/gnutls-cli   on (79.31/13/0)
>>
>> ...and the remote server:
>>
>> Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name  Timer
>> tcp        0      0 173.255.235.238:143     68.5.174.98:41129       ESTABLISHED 5303/imapd        off (0.00/0/0)
>
> It appears that the client has a private address and the server has a
> public address. So I guess that there is a NAT device between them, and
> its stateful NAT engine may be the cause of the problem, by deleting
> connections from its translation table after a period of inactivity.
>
>> When I look at a packet capture of this connection on the client side,
>> there is a long (expected) period of inactivity that seems to trigger
>> the problem. Then the local end tries to transmit some data again but
>> never receives an ACK. Instead, 15 TCP retransmissions go out, with
>> intervals increasing from 0.3 seconds to 120 seconds. No activity is
>> captured after that.
>
> Can you do a packet capture on the server side as well?

Yes, I just tried it. The server does not receive anything at all when
the client starts retransmitting. I guess that is consistent with the
NAT explanation?

>> Does anyone have a suggestion of how I could debug this further to find
>> out where the problem lies and how to fix it?
>>
>> Also, is there some way to globally reduce the timeout on the client
>> and/or server to reduce the time before the local application aborts?
>
> The Linux kernel supports system-wide TCP keepalive.
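On Linux, keepalive is switched on per socket with SO_KEEPALIVE. The system-wide defaults are the sysctls net.ipv4.tcp_keepalive_time (7200 seconds, i.e. 2 hours), net.ipv4.tcp_keepalive_intvl and net.ipv4.tcp_keepalive_probes, and they only affect sockets that have SO_KEEPALIVE set. A program you control can also override the timing per socket. The sketch below assumes the roughly 302-second NAT timeout measured in this thread; the values 240/30/4 and the helper name `enable_keepalive` are illustrative assumptions, not recommendations from the thread.

```python
import socket


def enable_keepalive(sock, idle=240, interval=30, count=4):
    """Enable TCP keepalive so the first probe goes out before a ~300 s NAT timeout.

    TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT are platform-specific; each is
    set only if this Python build exposes it, otherwise OS defaults apply."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    if hasattr(socket, "TCP_KEEPIDLE"):
        # Seconds of inactivity before the first probe.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    if hasattr(socket, "TCP_KEEPINTVL"):
        # Seconds between unanswered probes.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    if hasattr(socket, "TCP_KEEPCNT"):
        # Unanswered probes before the connection is declared dead.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)
```

The probe and the peer's ACK should count as traffic for most stateful NAT devices, which keeps the translation entry alive. This only helps programs whose sockets you can configure, which is why per-application settings such as ssh's ServerAliveInterval or an LD_PRELOAD shim are the usual fallbacks.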
> However, the application must enable it on a per-socket basis, and the
> minimum recommended value of 2 hours (which is the default in Linux) is
> quite high; the inactivity timeout in your NAT device may be shorter.
> The best workaround for this is to generate traffic with some kind of
> application-level keepalive, either defined in the application protocol
> (such as in SSH) or by periodically sending dummy commands or data.

Yes, I guess your NAT theory makes sense. If I use ssh with
"ServerAliveInterval", or force the use of libkeepalive with LD_PRELOAD,
the connections survive beyond 302 seconds.

Unfortunately, however, this isn't a good solution, because I have
non-Linux devices on the same network that suffer from the same problem.
Is there a way to figure out on which device the NAT timeout happens?

I have a Cisco DPC3825 cable modem that does NAT. But it has just 4
Ethernet ports and WLAN, so I have a hard time believing that it would
need to force a 5-minute timeout. The web administration page also
doesn't mention any timeouts (which of course may mean nothing). Is it
possible that there's a second NAT at work behind the modem?

Thanks,
-Nikolaus

--
»Time flies like an arrow, fruit flies like a Banana.«

PGP fingerprint: 5B93 61F8 4EA2 E279 ABF6 02CF A9AD B7F8 AE4E 425C

Archive: http://lists.debian.org/87ip80c6hl....@vostro.rath.org
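One rough way to check for a second NAT is to run `traceroute -n 173.255.235.238` from the client and look at which hops use non-public addresses. Another is to compare the modem's WAN address on its admin page with the address the server sees (68.5.174.98 in the netstat output). The Python sketch below only classifies a list of hop addresses you have already extracted. The helper names are hypothetical, and a private hop is a hint rather than proof, since ISPs sometimes number internal router interfaces from private ranges without doing NAT.

```python
import ipaddress

# RFC 1918 private ranges, typical of home LANs behind a NAT router.
RFC1918 = [ipaddress.ip_network(n) for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]
# RFC 6598 shared address space, used by carrier-grade NAT.
SHARED_CGNAT = ipaddress.ip_network("100.64.0.0/10")


def classify_hop(addr):
    """Return 'private', 'shared' or 'other' for one hop address."""
    ip = ipaddress.ip_address(addr)
    if any(ip in net for net in RFC1918):
        return "private"
    if ip in SHARED_CGNAT:
        return "shared"
    return "other"


def non_public_hops(hops):
    """List (hop number, address, kind) for hops in private or shared space."""
    return [(i, h, classify_hop(h)) for i, h in enumerate(hops, 1)
            if classify_hop(h) != "other"]
```

If only the first hop (the modem's LAN address) is private and the next hop is already public, the modem is probably the only NAT on the path. Further private or 100.64.0.0/10 hops after it would make an upstream NAT plausible.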