John, Wow, good one dude! Turning off tcp timestamps on the client instantly fixed it. So I guess I bring this to Checkpoint Support and see what they say, THAT should be fun.
Thanks! On Tue, May 14, 2019 at 3:02 AM John Fawcett <j...@voipsupport.it> wrote: > On 14/05/2019 01:27, Wietse Venema wrote: > > Wietse Venema: > >> If you look at the non-VPN captures, then you will see the following: > >> > >> - In one trace, we see a client ACK 138, followed by a client packet > >> with ".<CR><LF>" (data 443:446, ACK 138, and a timestamp field > >> tht is unlike those of al other packets in the stream). > >> > >> - In the other trace, we see that the ACK and ".<CR><LF>" packets > >> are sent as one packet, with a normal timestamp field. > >> > >> - After this, the TCP connection is messed up, the server keeps > >> transmitting "Queued as xxx", and the client keeps transmitting > >> QUIT. > >> > >> This looks like the VPN mucks with TCP and screws up the protocol. > > I verified this another time. On the SMTP client side, there is one > packet, > > and on the SMTP server side there are two. > > > > Here is a diff -U output for client-side and server-side pcap files. > > > > -TIMESTAMP IP 10.110.2.9.33256 > 10.1.3.134.smtp: Flags [P.], seq > 443:446, ack 1 > > 38, win 211, options [nop,nop,TS val 2968348176 ecr 687501956], length > 3: SMTP: > > . > > +TIMESTAMP IP 10.110.2.9.33256 > 10.1.3.134.smtp: Flags [.], ack 138, > win 211, o > > ptions [nop,nop,TS val 2968348176 ecr 687501956], length 0 > > +TIMESTAMP IP 10.110.2.9.33256 > 10.1.3.134.smtp: Flags [P.], seq > 443:446, ack 138, win 211, options [nop,nop,TS val 1 ecr 0], length 3: > SMTP: . > > > > On the client side we see that the client sends one TCP packet with > > three data bytes (offset 443:446, content ".<CR><LF>"), an ACK for > > server offset 138, and Timestamp Value 2968348176 and Timestamp > > Echo Reply 687501956. > > > > On the server side we see that the server receives two TCP packets. > > > > - One packet with the same ACK for server offset 138, and the same > > Timestamp Value 2968348176 and Timestamp Echo Reply 687501956, as > > the client sent, but no data bytes. > this looks like a duplicate ACK that was generated on the server side > VPN node. > > - One packet with the same ACK for server offset 138, and the same > > three data bytes as the client sent, but with a Timestamp Value of > > 1 and a Timestamp Echo Reply of 0. > > The TSval of 1 was generated on the server side VPN node. Seems that the > packet is accepted anyway on the server because we can see that Postfix > responds to that. When the next packet containing the QUIT command comes > throught the TSVal has jumped back to the original clock sequence (as > generated on the client) 2968348359. I can't see why the Linux kernel > would reject that jump, but maybe it's happening on a lower level. > > I'd suggest looking into potential bugs in the tcp/ip stack of the vpn > node when it generates a TSval of 1 on that packet. > > Another possibility would be to see if turning off tcp timestamps on the > client solves it (systctl net.ipv4.tcp_timestamps=0). I'm not sure I'd > leave it that way, but could be useful to test it. > > > > > And from here on TCP appears to be stuck. The server repeatedly > > sends "Ok: queued as 58CE8C0058<CR><LF>" and the client repeatedly > > sends "QUIT<CR><LF>". > > > > Anyway, something is breaking TCP and it isn't Postfix. Maybe the > > VPN is trying to be too smart, maybe something in your TCP stack > > is trying to shape traffic and messing up. > > > > Wietse > > >