Greetings all, I'm trying to implement an IPsec tunnel from my LAN to a dedicated box. I've run into a common issue: some TCP packets cannot be fragmented because the DF flag is set, so they cannot pass through the tunnel. In that case an ICMP "fragmentation needed" message is sent back to the originating host; the problem is that some sites block such ICMP packets. As a result, the TCP session stalls.

Some details about my setup: the local end is an OpenBSD 4.1 WRAP box with a kernel PPPoE connection, doing NAT. The remote box is an OpenBSD 4.0 machine with a vr(4) NIC in a datacenter. My ipsec.conf is very simple and relies on the sane, secure defaults:
local ipsec.conf:

ike dynamic esp from { 10.10.10.0/29, pppoe } to any peer xx.xx.xx.xx srcid fw.xxxxxx.com
flow esp from { 10.10.10.0/29, 10.10.11.0/28 } to { 10.10.10.0/29, 10.10.11.0/28 } type bypass

remote ipsec.conf:

ike passive esp from any to any srcid vpngw.x96.org

So, some TCP sessions still stall. I've tried multiple combinations of the scrub directive, and had to decrease max-mss and such, but would still see stalling TCP sessions. So I came up with a test that checks the maximum size of a packet that can pass through the tunnel, using ping's -s option to set the size of the ICMP echo request payload. The test showed that the maximum payload is 1330 bytes (-s 1331 would not go through). Adding the 8-byte ICMP header and the 20-byte IP header makes that 1358 bytes total. Since a regular TCP header is 12 bytes larger than an ICMP header, it looks like I'd have to set max-mss to 1318 for most TCP sessions to work fine. I then ran the same test without the tunnel and got a maximum ICMP payload of 1464 bytes. The conclusion is that there is 134 bytes of overhead for the IPsec tunnel, which includes the new 20-byte outer IP header, the 8-byte ESP header, and an IV, padding and ESP trailer of unknown size. The only assumption I make for this test to work is that the ICMP echo requests are not fragmented. Please correct me if I'm wrong. I should probably try scapy to craft a TCP packet with DF set and apply the same logic to find the maximum size, to get more convincing results (rough sketches of both probes are at the end of this mail). Anyway, this overhead seems quite large, roughly 10% of the largest packet. Could anyone comment on this? I would appreciate any comments or suggestions on how to improve this setup.

My current scrub directive on the remote box is:

scrub on $ext_if no-df max-mss 1318

Like I said, some TCP sessions still stall; could that be caused by the occasional TCP packet enlarged by the Options field being set? ;-)
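For what it's worth, here is the kind of probe I have in mind for redoing the ping test with the DF bit explicitly set. It is only a sketch and I have not run it yet; it assumes scapy is installed and that it runs as root. It bisects the ICMP payload size and also reports any "fragmentation needed" messages that come back (the very messages some sites block). I've put the remote gateway's name in as a placeholder destination; substitute anything behind the tunnel that answers echo requests.

#!/usr/bin/env python
# Sketch only, not tested.  Bisect the largest DF-marked ICMP payload that
# makes it through the tunnel and gets an echo reply back.
from scapy.all import IP, ICMP, sr1

DST = "vpngw.x96.org"   # placeholder: far end of the tunnel

def fits(payload_len):
    # one echo request of the given payload size, with DF set
    pkt = IP(dst=DST, flags="DF") / ICMP() / (b"X" * payload_len)
    reply = sr1(pkt, timeout=2, verbose=0)
    if reply is None:
        return False
    if reply.haslayer(ICMP) and reply[ICMP].type == 3 and reply[ICMP].code == 4:
        # "fragmentation needed" -- exactly the message some sites drop
        print("ICMP need-frag came back from %s" % reply.src)
        return False
    # type 0 == echo reply
    return reply.haslayer(ICMP) and reply[ICMP].type == 0

# bisect between a size that works and one that is clearly too big
lo, hi = 0, 1500
while lo + 1 < hi:
    mid = (lo + hi) // 2
    if fits(mid):
        lo = mid
    else:
        hi = mid
print("largest ICMP payload that makes it through: %d bytes" % lo)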
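And the TCP variant I mentioned, equally untested: a raw segment like this is not part of an established connection, so nothing will acknowledge it; the idea is only to watch with tcpdump on the remote box whether a DF-marked segment of a given size makes it out of the tunnel at all. The destination port and the payload size are arbitrary placeholders.

# Sketch only, not tested.  Emit one DF-marked TCP segment of a chosen size
# and watch for it on the remote side with tcpdump.
from scapy.all import IP, TCP, send

DST = "vpngw.x96.org"    # placeholder: far end of the tunnel again
PAYLOAD = 1318           # the candidate max-mss derived above

seg = IP(dst=DST, flags="DF") / TCP(dport=22, flags="PA") / (b"X" * PAYLOAD)
send(seg, verbose=0)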