On 07/21/2016 09:13 PM, Steffen Klassert wrote:
> Hi Matt,
>
> I've did some vti tests the last days, but I was unable to
> reproduce it.
>
> On Tue, Jul 19, 2016 at 05:49:06AM +0000, Matt Bennett wrote:
>> On 07/05/2016 03:55 PM, Matt Bennett wrote:
>>> On 07/04/2016 11:12 PM, Steffen Klassert wrote:
>>>> On Mon, Jul 04, 2016 at 03:52:50AM +0000, Matt Bennett wrote:
>>>>> *Resending as plain text so the mailing list accepts it.. Sorry Steffen
>>>>> and Herbert*
>>>>>
>>>>> Hi,
>>>>>
>>>>> During long run testing of an ipsec tunnel over a PPP link it was found
>>>>> that occasionally traffic would stop flowing over the tunnel. Eventually
>>>>> the traffic would start again, however using the command "ip route flush
>>>>> cache" causes traffic to start flowing again immediately.
>
> Do you need the ppp link to reproduce it? How often does that happen?
> It would be good to find a minimal setup with that the bug is reproducible.

Our original tests were long-run, i.e. we set traffic flowing across the
tunnel and noticed that the throughput would occasionally drop significantly.
Based on my reproduction method I believe the ppp link may be required.
To reproduce this I have 2 devices:

Device 1:
ppp0 - 203.0.113.10/32 (mtu 1492)
16778240: ppp0: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 1492 qdisc htb state UP mode DEFAULT group default qlen 3
    link/ppp

tunnel64 - 172.16.0.6/30 (mtu 1200) - note this is a VTI with IPsec protection
14: tunnel64@NONE: <POINTOPOINT,MULTICAST,UP,LOWER_UP> mtu 1200 qdisc htb state UNKNOWN mode DEFAULT group default qlen 1
    link/ipip 203.0.113.10 peer 203.0.113.5

Device 2:
ppp1 - 203.0.113.5/32 (mtu 1492)
16778241: ppp1: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 1492 qdisc pfifo_fast state UP mode DEFAULT group default qlen 3
    link/ppp

tunnel64 - 172.16.0.5/30 (mtu 1200) - note this is a VTI with IPsec protection
20: tunnel64@NONE: <POINTOPOINT,MULTICAST,UP,LOWER_UP> mtu 1200 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1
    link/ipip 203.0.113.5 peer 203.0.113.10

I send generated traffic with a packet size of 1300 bytes across the tunnel
(which obviously fragments the packets). Then I bring ppp1 on device 2 DOWN
and then back UP.

At this stage on device 1, where I have printk debug in ip_fragment(), the
unlikely block is hit:

	if (unlikely(!skb->ignore_df ||
		     (IPCB(skb)->frag_max_size &&
		      IPCB(skb)->frag_max_size > mtu))) {
		printk(KERN_ERR "mtu = %u, dev = %s, src = %u, dst = %u, tot_len = %u\n",
		       mtu, skb->dev->name, iph->saddr, iph->daddr, iph->tot_len);
		printk(KERN_ERR "!skb->ignore_df = %u, IPCB(skb)->frag_max_size = %u\n",
		       !skb->ignore_df, IPCB(skb)->frag_max_size);
		icmp_send(skb, ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED,
			  htonl(mtu));
		kfree_skb(skb);
		return -EMSGSIZE;
	}

which prints:

mtu = 1200, dev = tunnel64, src = 3405803786, dst = 3405803781, tot_len = 1244
!skb->ignore_df = 1, IPCB(skb)->frag_max_size = 0

Note the src and dst IPs of the packet are src=203.0.113.10, dst=203.0.113.5
(the tunnel is trying to send the PPP packet ???).

Interestingly I also have debug in icmp_unreach(), which actions the
ICMP_DEST_UNREACH sent from the tunnel:

		case ICMP_FRAG_NEEDED:
			/* for documentation of the ip_no_pmtu_disc
			 * values please see
			 * Documentation/networking/ip-sysctl.txt
			 */
			switch (net->ipv4.sysctl_ip_no_pmtu_disc) {
			...
			case 0:
				info = ntohs(icmph->un.frag.mtu);
				printk(KERN_ERR "mtu = %u, dev = %s, src = %u, dst = %u, tot_len = %u\n",
				       info, skb->dev->name, iph->saddr, iph->daddr, iph->tot_len);
			}

which prints:

mtu = 1200, dev = lo, src = 3405803786, dst = 3405803781, tot_len = 1244

I am confused at this stage (the packet is sent from the loopback interface
and routed out tunnel64?).

The code then eventually reaches vti4_err(), which updates the pmtu on the
ppp0 interface to 1200. Then the code in xfrm_bundle_ok() which I mentioned
in an earlier email is hit, and it continuously drops the MTU on the tunnel.
However, I believe the behaviour I outlined above is the root cause and this
is just a side effect.
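To make the feedback loop I am describing concrete, here is a minimal
user-space sketch (not kernel code) of what I believe is happening. The
ESP_OVERHEAD value and the loop structure are assumptions for illustration
only, loosely modelled on the way the bundle MTU is recomputed from the
route MTU; the 1492/1200 values are from the setup above.

/*
 * Sketch of the suspected MTU feedback loop (illustrative only).
 */
#include <stdio.h>

#define PPP_MTU      1492   /* underlying ppp0/ppp1 link */
#define TUNNEL_MTU   1200   /* tunnel64 as configured */
#define ESP_OVERHEAD 70     /* assumed ESP+IV+padding+ICV cost, illustrative */

int main(void)
{
	unsigned int route_mtu = PPP_MTU;     /* pmtu cached on the route to the peer */
	unsigned int tunnel_mtu = TUNNEL_MTU; /* MTU reported on the VTI */
	int i;

	for (i = 0; i < 5; i++) {
		/*
		 * The spurious ICMP_FRAG_NEEDED generated in ip_fragment()
		 * carries the *tunnel's* MTU, and the error handler applies
		 * it to the route towards the outer (ppp) endpoint.
		 */
		route_mtu = tunnel_mtu;

		/*
		 * On the next bundle check the tunnel MTU is recomputed from
		 * the (now smaller) route MTU minus the IPsec overhead, so it
		 * keeps shrinking instead of converging.
		 */
		tunnel_mtu = route_mtu - ESP_OVERHEAD;

		printf("pass %d: route_mtu = %u, tunnel_mtu = %u\n",
		       i, route_mtu, tunnel_mtu);
	}
	return 0;
}

After a few passes the tunnel MTU is far below anything the real path
requires, which matches the continuous MTU drop I see on the tunnel once
the bogus ICMP has been generated.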