Chinh Nguyen wrote:
> Patrick McHardy wrote:
>
>>Netfilter recalculates the checksum when NATing it.
>
>
> The NATing is not done by netfilter but by the NAT device between the IPsec
> peers.

I see, so the TCP checksum includes the wrong IPs.

> [Linux ipsec client C] ------ [NAT device] ---------- [Linux ipsec server S]
>
> C negotiates an IPsec Transport Mode SA with S. Because of Transport
> Mode/NAT-T, two things happen to an IPsec packet:
>
> 1. It is UDP-encapsulated, typically on port 4500/udp.
> 2. Transport mode leaves the original IP header alone, whereas tunnel
>    mode wraps the entire packet in a second IP header. As a result, after
>    the packet passes through the NAT device, its source IP is N (the NAT
>    device's address), although the original unencrypted packet had
>    source IP C.
>
> S strips off the UDP encapsulation header, decrypts the payload, and
> "joins" the content back to the IP header. If the decrypted content is
> UDP or TCP, the UDP/TCP checksum is now incorrect: it was computed over
> a pseudo-header containing source IP C, but the source IP is now N.
>
> (In tunnel mode, we would ignore the NATed outer IP header, because the
> decrypted content has an entire IP header plus UDP/TCP etc.)
>
> This is a well-known problem with transport mode/NAT. One solution is to
> use NAT-OA and NAT-OR to recalculate the checksum. The Linux kernel does
> the simpler thing of ignoring the UDP/TCP checksum altogether in this
> particular case:
>
> function esp_post_input (net/ipv4/esp4.c)
> 290         /*
> 291          * 2) ignore UDP/TCP checksums in case
> 292          *    of NAT-T in Transport Mode, or
> 293          *    perform other post-processing fixes
> 294          *    as per draft-ietf-ipsec-udp-encaps-06,
> 295          *    section 3.1.2
> 296          */
> 297         if (!x->props.mode)
> 298                 skb->ip_summed = CHECKSUM_UNNECESSARY;
> 299
> 300         break;
>
> As noted, esp_post_input is called from xfrm4_policy_check. Decrypted UDP
> traffic through transport mode/NAT also has bad checksums. However, since
> it is passed through udp_queue_rcv_skb after decryption, and this function
> calls xfrm4_policy_check before checking the UDP checksum, line 298 means
> the kernel ignores the bad checksum.
>
> Decrypted TCP traffic has bad checksums too.
> But since tcp_v4_rcv checks the TCP checksum before calling
> xfrm4_policy_check, the bad checksum means the TCP packet is dropped as
> a bad segment.
>
> The end result is that UDP and other traffic (e.g., ICMP) can pass through
> transport mode/NAT, but TCP cannot.
>
> I don't know what the correct fix is. Adding an extra call to
> xfrm4_policy_check in tcp_v4_rcv before the checksum check fixes this
> problem and doesn't seem to break anything else. Alternatively, moving
> some of the code in esp_post_input into esp_input (especially line 298)
> would work, too.

So we could either move checksum validation after xfrm4_policy_check, or
set ip_summed to CHECKSUM_UNNECESSARY earlier, in esp_input. Setting
ip_summed in esp_input looks easier.

But this still leaves one problem: with netfilter and local NAT, a
decapsulated transport mode packet might be forwarded to another host. In
that case the checksum contained in the packet is invalid, and marking the
skb CHECKSUM_UNNECESSARY only skips the local check; it does not repair
the checksum the next host will verify. Does anyone have an idea how to
fix this?

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html