Handle weird ICMP "fragmentation needed" messages with a next-hop MTU equal to (or exceeding) the dropped packet size
Fixes: 46517008e116 ("ipv4: Kill ip_rt_frag_needed().")

In a large corporate network, we spotted this weird ICMP message after a long troubleshooting session. See the attached capture file.

These ICMP "destination unreachable - fragmentation needed and don't fragment bit set" messages (type 3, code 4) are sent by a router that drops 1500-byte IP packets and fills the next-hop MTU field of the ICMP message with 1500. The messages cause the TCP connection to stall, but only on newer kernels. Older kernels set the path MTU to 1492 and communicate successfully.

After checking the code and commit history, I found that commit 46517008e116 ("ipv4: Kill ip_rt_frag_needed().") from June 2012 changed ICMP message handling by removing the ip_rt_frag_needed function. The relevant part of the removed function is:

	if (new_mtu < 68 || new_mtu >= old_mtu) {
		/* BSD 4.2 derived systems incorrectly adjust
		 * tot_len by the IP header length, and report
		 * a zero MTU in the ICMP message.
		 */
		if (mtu == 0 &&
		    old_mtu >= 68 + (iph->ihl << 2))
			old_mtu -= iph->ihl << 2;
		mtu = guess_mtu(old_mtu);
	}

This condition handled the cases where the next-hop MTU was zero (or, more generally, below 68). That case is now handled by the protocol and was fixed by commit 68b7107b6298 ("ipv4: icmp: Fix pMTU handling for rare case"). But the rarest case, where the next-hop MTU (new_mtu) is equal to or greater than the dropped packet length (old_mtu), was also removed. This commit restores that check.

Instead of a table lookup like the one guess_mtu uses, it simply sets the path MTU to the dropped packet size minus 2 bytes. In our case, setting the path MTU to 1498 (one iteration) was enough. If 1498 were still too large, the router's next ICMP message would lower it by another 2 bytes, so the path MTU should converge to a working value in small steps. I don't think there's a need for a more complex solution.

The patched kernel worked perfectly, lowering the path MTU from the interface's initial default of 1500 to 1498. This time I don't have a capture file from inside the affected center, but all received packets had a maximum size of 1498.

--
cheers
vicente
Attachment: ICMP discarting and sugesting 1500 2.pcapng (application/pcapng)
From bfc9a00e6b78d8eb60e46dacd7d761669d29a573 Mon Sep 17 00:00:00 2001
From: Vicente Jimenez Aguilar <goo...@gmail.com>
Date: Mon, 31 Oct 2016 13:10:29 +0100
Subject: [PATCH] ipv4: icmp: Fix pMTU handling for rarest case

Restore network resistance to weird ICMP fragmentation needed messages
with next hop MTU equal to (or exceeding) dropped packet size

Fixes: 46517008e116 ("ipv4: Kill ip_rt_frag_needed().")
Signed-off-by: Vicente Jimenez Aguilar <goo...@gmail.com>
---
 net/ipv4/icmp.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
index 38abe70..c0af1d2 100644
--- a/net/ipv4/icmp.c
+++ b/net/ipv4/icmp.c
@@ -776,6 +776,7 @@ static bool icmp_unreach(struct sk_buff *skb)
 	struct icmphdr *icmph;
 	struct net *net;
 	u32 info = 0;
+	unsigned short old_mtu;
 
 	net = dev_net(skb_dst(skb)->dev);
 
@@ -819,6 +820,12 @@ static bool icmp_unreach(struct sk_buff *skb)
 				/* fall through */
 			case 0:
 				info = ntohs(icmph->un.frag.mtu);
+				/* Handle weird case where next hop MTU is
+				 * equal to or exceeding dropped packet size
+				 */
+				old_mtu = ntohs(iph->tot_len);
+				if ( info >= old_mtu )
+					info = old_mtu - 2;
 			}
 			break;
 		case ICMP_SR_FAILED:
-- 
2.7.3