On 14/05/2019 01:27, Wietse Venema wrote: > Wietse Venema: >> If you look at the non-VPN captures, then you will see the following: >> >> - In one trace, we see a client ACK 138, followed by a client packet >> with ".<CR><LF>" (data 443:446, ACK 138, and a timestamp field >> tht is unlike those of al other packets in the stream). >> >> - In the other trace, we see that the ACK and ".<CR><LF>" packets >> are sent as one packet, with a normal timestamp field. >> >> - After this, the TCP connection is messed up, the server keeps >> transmitting "Queued as xxx", and the client keeps transmitting >> QUIT. >> >> This looks like the VPN mucks with TCP and screws up the protocol. > I verified this another time. On the SMTP client side, there is one packet, > and on the SMTP server side there are two. > > Here is a diff -U output for client-side and server-side pcap files. > > -TIMESTAMP IP 10.110.2.9.33256 > 10.1.3.134.smtp: Flags [P.], seq 443:446, > ack 1 > 38, win 211, options [nop,nop,TS val 2968348176 ecr 687501956], length 3: > SMTP: > . > +TIMESTAMP IP 10.110.2.9.33256 > 10.1.3.134.smtp: Flags [.], ack 138, win > 211, o > ptions [nop,nop,TS val 2968348176 ecr 687501956], length 0 > +TIMESTAMP IP 10.110.2.9.33256 > 10.1.3.134.smtp: Flags [P.], seq 443:446, > ack 138, win 211, options [nop,nop,TS val 1 ecr 0], length 3: SMTP: . > > On the client side we see that the client sends one TCP packet with > three data bytes (offset 443:446, content ".<CR><LF>"), an ACK for > server offset 138, and Timestamp Value 2968348176 and Timestamp > Echo Reply 687501956. > > On the server side we see that the server receives two TCP packets. > > - One packet with the same ACK for server offset 138, and the same > Timestamp Value 2968348176 and Timestamp Echo Reply 687501956, as > the client sent, but no data bytes. this looks like a duplicate ACK that was generated on the server side VPN node. > - One packet with the same ACK for server offset 138, and the same > three data bytes as the client sent, but with a Timestamp Value of > 1 and a Timestamp Echo Reply of 0.
The TSval of 1 was generated on the server side VPN node. Seems that the packet is accepted anyway on the server because we can see that Postfix responds to that. When the next packet containing the QUIT command comes throught the TSVal has jumped back to the original clock sequence (as generated on the client) 2968348359. I can't see why the Linux kernel would reject that jump, but maybe it's happening on a lower level. I'd suggest looking into potential bugs in the tcp/ip stack of the vpn node when it generates a TSval of 1 on that packet. Another possibility would be to see if turning off tcp timestamps on the client solves it (systctl net.ipv4.tcp_timestamps=0). I'm not sure I'd leave it that way, but could be useful to test it. > > And from here on TCP appears to be stuck. The server repeatedly > sends "Ok: queued as 58CE8C0058<CR><LF>" and the client repeatedly > sends "QUIT<CR><LF>". > > Anyway, something is breaking TCP and it isn't Postfix. Maybe the > VPN is trying to be too smart, maybe something in your TCP stack > is trying to shape traffic and messing up. > > Wietse