Pascal Hambourg <pas...@plouf.fr.eu.org> writes:
> Hello,
>
> Nikolaus Rath a écrit :
>> 
>> I'm having trouble with an internet connection that seems to randomly
>> "freeze" arbitrary tcp connections when they have not been used for a
>> while. The connections stay established, but no data is coming through.
>
> How long is "a while", at a minimum?

I wrote a small test program. The limit seems to be exactly 302 seconds;
301 seconds still works.
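
A test program of that kind might look roughly like the sketch below. It is not Nikolaus's actual program. It opens a TCP connection, optionally discards a server greeting (IMAP servers send one immediately), stays silent for `idle` seconds, sends one line and reports whether anything comes back. The function name `probe_idle`, the `b"ping\r\n"` payload and the `expect_banner` flag are hypothetical, and it assumes the peer answers any line, even an invalid one, with some reply.

```python
import socket
import time


def probe_idle(host: str, port: int, idle: float,
               timeout: float = 10.0, expect_banner: bool = False) -> bool:
    """Return True if the peer still answers after `idle` seconds of silence.

    Hypothetical helper: assumes the server replies to an arbitrary line
    (an echo service, or a line-based protocol that answers with an error).
    """
    # create_connection also applies `timeout` to later send/recv calls.
    with socket.create_connection((host, port), timeout=timeout) as sock:
        if expect_banner:
            sock.recv(1024)          # discard a greeting such as "* OK ..."
        time.sleep(idle)             # the inactivity period under test
        sock.sendall(b"ping\r\n")    # first packet after the idle gap
        try:
            return len(sock.recv(1024)) > 0
        except OSError:              # socket.timeout is a subclass of OSError
            return False
```

Running it with `idle=301` and then `idle=302` against a server you control (with `expect_banner=True` for IMAP on port 143) is one way to bracket the limit; a bisection over a wider range works the same way.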

>> When this happens, netstat still shows the connection status as
>> `ESTABLISHED` on both the local computer:
>> 
>>     Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name Timer
>>     tcp        0     53 192.168.0.10:41129      173.255.235.238:143     ESTABLISHED 8219/gnutls-cli  on (79.31/13/0)
>> 
>> ..and the remote server:
>> 
>>     Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name Timer
>>     tcp        0      0 173.255.235.238:143     68.5.174.98:41129       ESTABLISHED 5303/imapd       off (0.00/0/0)
>
> It appears that the client has a private address and the server has a
> public address. So I guess that there is a NAT device between them, and
> its stateful NAT engine may be the cause of the problem, by deleting
> connections from its translation table after a period of inactivity.
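
The mechanism Pascal describes can be illustrated with a toy model of a stateful translation table. This is a simplified sketch, not how any particular device (such as the DPC3825) is implemented. It assumes a hypothetical idle timeout of 301 seconds, chosen to match the measurement above, and it assumes the common behaviour of dropping a mid-stream (non-SYN) packet whose mapping has expired. Under those assumptions the client's first packet after the idle gap never leaves the NAT, which is consistent with the server seeing nothing.

```python
class NatTable:
    """Toy stateful NAT: one entry per flow, forgotten after `idle_timeout`
    seconds without traffic. Hypothetical model for illustration only."""

    def __init__(self, idle_timeout: float):
        self.idle_timeout = idle_timeout
        self.last_seen = {}  # flow tuple -> time of the last forwarded packet

    def forward(self, flow, now: float, syn: bool = False) -> bool:
        """Return True if a packet on `flow` at time `now` is forwarded."""
        seen = self.last_seen.get(flow)
        alive = seen is not None and now - seen <= self.idle_timeout
        if alive or syn:
            self.last_seen[flow] = now  # create or refresh the mapping
            return True
        return False  # assumed: mid-stream packet without a mapping is dropped
```

Each retransmission hits the same missing entry, so in this model the connection stays dead until the sender gives up.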
>
>> When I look at a packet capture of this connection on the client side,
>> there is a long (expected) period of inactivity that seems to trigger
>> the problem, then the local end tries to transmit some data again but
>> never receives an ACK. Instead, 15 TCP Retransmissions go out, with
>> intervals increasing from 0.3 seconds to 120 seconds. No activity is
>> captured after that.
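
The retransmission pattern above fits exponential backoff with a cap. On Linux the retransmission timeout is capped at 120 seconds (`TCP_RTO_MAX`), and the number of retries on an established connection is governed by the `net.ipv4.tcp_retries2` sysctl, whose default is 15. The sketch below assumes plain doubling from the observed 0.3 s initial timeout; the real kernel computes the RTO from measured round-trip times, so treat the numbers as an approximation.

```python
def retransmission_intervals(initial_rto: float = 0.3, retries: int = 15,
                             rto_max: float = 120.0) -> list:
    """Seconds waited before each retransmission, assuming simple doubling
    of the timeout, capped at rto_max (an approximation of Linux behaviour)."""
    intervals = []
    rto = initial_rto
    for _ in range(retries):
        intervals.append(min(rto, rto_max))
        rto *= 2  # exponential backoff
    return intervals


schedule = retransmission_intervals()
print(len(schedule), schedule[0], schedule[-1], round(sum(schedule), 1))
```

With these assumptions the 15 retransmissions span about 873 seconds (roughly 14.5 minutes) before the last one is sent, which is why the application takes so long to notice. Lowering `net.ipv4.tcp_retries2` shortens that window system-wide.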
>
> Can you do a packet capture on the server side as well?

Yes, just tried it. The server does not receive anything at all when the
client starts retransmitting. I guess that is consistent with the NAT
explanation?


>> Does anyone have a suggestion of how I could debug this further to find
>> out where the problem lies and how to fix it?
>> 
>> Also, is there some way to globally reduce the timeout on the client
>> and/or server, so that the local application aborts sooner?
>
> The Linux kernel supports TCP keepalive with system-wide settings.
> However, the application must enable it on a per-socket basis, and the
> minimum recommended idle time of 2 hours (which is the default in Linux)
> is quite high; the inactivity timeout in your NAT device may be shorter.
> The best workaround is to generate traffic with some kind of
> application-level keepalive, either one defined in the application
> protocol, as in SSH, or by periodically sending dummy commands or data.
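
For programs whose source you control, enabling keepalive per socket looks roughly like this sketch. On Linux the system-wide defaults are the sysctls `net.ipv4.tcp_keepalive_time` (7200 s), `tcp_keepalive_intvl` and `tcp_keepalive_probes`, but they only take effect on sockets that set `SO_KEEPALIVE`. The per-socket `TCP_KEEPIDLE`, `TCP_KEEPINTVL` and `TCP_KEEPCNT` options override them. The timer values below are illustrative assumptions, chosen only to start probing well before a 302-second NAT timeout.

```python
import socket


def enable_keepalive(sock: socket.socket, idle: int = 60,
                     interval: int = 30, count: int = 4) -> None:
    """Enable TCP keepalive on one socket and shorten its timers where the
    platform exposes the options. Values are illustrative, not advice."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # These option names are Linux's; other systems may lack some of them,
    # so only set the ones the socket module actually provides.
    for name, value in (("TCP_KEEPIDLE", idle),
                        ("TCP_KEEPINTVL", interval),
                        ("TCP_KEEPCNT", count)):
        opt = getattr(socket, name, None)
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)


s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
enable_keepalive(s)  # call before connect(); the settings persist afterwards
s.close()
```

Keepalive probes are empty ACK segments, so they refresh the NAT entry without the application protocol noticing.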

Yes, I guess your NAT theory makes sense. If I use ssh with
"ServerAliveInterval", or force libkeepalive use with LD_PRELOAD, the
connections survive beyond 302 seconds.

Unfortunately, this isn't a good solution, because I also have
non-Linux devices on the same network that suffer from the same problem.

Is there a way to figure out which device is applying the NAT timeout? I
have a Cisco DPC3825 cable modem that does NAT, but it has just 4
Ethernet ports and WLAN, so I have a hard time believing that it
would need to force a 5-minute timeout. The web administration page also
doesn't mention any timeouts (which, of course, may mean nothing). Is it
possible that there's a second NAT at work behind the modem?
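
One quick check for a second NAT is to compare the WAN address on the modem's status page with the address the server sees (68.5.174.98 in the server-side netstat output). If the modem's WAN address is in a private (RFC 1918) or shared carrier-grade NAT (RFC 6598, 100.64.0.0/10) range, there is very likely another NAT upstream. The helper below is a hypothetical sketch that classifies an IPv4 address using only the standard `ipaddress` module.

```python
import ipaddress

# RFC 1918 private ranges and the RFC 6598 shared range used by carrier-grade NAT.
PRIVATE_NETS = [ipaddress.ip_network(n)
                for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]
SHARED_NET = ipaddress.ip_network("100.64.0.0/10")


def classify(addr: str) -> str:
    """Classify an IPv4 address as 'private', 'shared (carrier-grade NAT)'
    or 'public'. Hypothetical helper for the double-NAT check."""
    ip = ipaddress.ip_address(addr)
    if any(ip in net for net in PRIVATE_NETS):
        return "private"
    if ip in SHARED_NET:
        return "shared (carrier-grade NAT)"
    return "public"


print(classify("68.5.174.98"))
```

If the modem reports 68.5.174.98 itself as its WAN address, the modem is most likely the only NAT on the path, and its connection-tracking timeout is the prime suspect.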


Thanks,

   -Nikolaus

-- 
 »Time flies like an arrow, fruit flies like a Banana.«

  PGP fingerprint: 5B93 61F8 4EA2 E279 ABF6  02CF A9AD B7F8 AE4E 425C


--
Archive: http://lists.debian.org/87ip80c6hl....@vostro.rath.org
