On 07/21/2016 09:13 PM, Steffen Klassert wrote:
> Hi Matt,
>
> I did some vti tests over the last few days, but I was unable to
> reproduce it.
>
> On Tue, Jul 19, 2016 at 05:49:06AM +0000, Matt Bennett wrote:
>> On 07/05/2016 03:55 PM, Matt Bennett wrote:
>>> On 07/04/2016 11:12 PM, Steffen Klassert wrote:
>>>> On Mon, Jul 04, 2016 at 03:52:50AM +0000, Matt Bennett wrote:
>>>>> *Resending as plain text so the mailing list accepts it.. Sorry Steffen 
>>>>> and Herbert*
>>>>>
>>>>> Hi,
>>>>>
>>>>> During long run testing of an ipsec tunnel over a PPP link it was found 
>>>>> that occasionally traffic would stop flowing over the tunnel. Eventually 
>>>>> the traffic would start again; however, using the command "ip route flush
>>>>> cache" causes traffic to start flowing again immediately.
>
> Do you need the ppp link to reproduce it? How often does that happen?
> It would be good to find a minimal setup with that the bug is reproducible.
>
>
Our original tests were long-running, i.e. we set traffic flowing across the
tunnel and noticed that occasionally the throughput would drop significantly.
Based on my reproduction method below, I believe the ppp link may be required.

To reproduce this I have 2 devices:

Device 1:
ppp0 - 203.0.113.10/32 (mtu 1492)
16778240: ppp0: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 1492 qdisc htb 
state UP mode DEFAULT group default qlen 3
     link/ppp

tunnel64 - 172.16.0.6/30 (mtu 1200) - note this is a VTI with IPSEC protection
14: tunnel64@NONE: <POINTOPOINT,MULTICAST,UP,LOWER_UP> mtu 1200 qdisc htb state 
UNKNOWN mode DEFAULT group default qlen 1
     link/ipip 203.0.113.10 peer 203.0.113.5

Device 2:
ppp1 - 203.0.113.5/32 (mtu 1492)
16778241: ppp1: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 1492 qdisc 
pfifo_fast state UP mode DEFAULT group default qlen 3
     link/ppp

tunnel64 - 172.16.0.5/30 (mtu 1200) - note this is a VTI with IPSEC protection
20: tunnel64@NONE: <POINTOPOINT,MULTICAST,UP,LOWER_UP> mtu 1200 qdisc noqueue 
state UNKNOWN mode DEFAULT group default qlen 1
     link/ipip 203.0.113.5 peer 203.0.113.10

I run generated traffic with a packet size of 1300 bytes across the tunnel
(which obviously fragments the packets, since the tunnel MTU is only 1200).
Then I bring ppp1 on device 2 DOWN and then back UP.
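
(For reference, a minimal sketch of the kind of traffic generator I mean is
below; it just sends 1300-byte UDP payloads in a loop. The destination
address and port are placeholders for a host reached via the tunnel, and any
generator producing packets larger than the 1200-byte tunnel MTU should
behave the same.)

/* Minimal traffic generator sketch: 1300-byte UDP payloads in a loop.
 * The destination address and port below are placeholders. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
        char payload[1300];
        struct sockaddr_in dst = {
                .sin_family = AF_INET,
                .sin_port   = htons(5001),      /* placeholder port */
        };
        int fd = socket(AF_INET, SOCK_DGRAM, 0);

        if (fd < 0)
                return 1;

        memset(payload, 0xab, sizeof(payload));
        inet_pton(AF_INET, "172.16.1.1", &dst.sin_addr); /* placeholder dst */

        for (;;) {
                sendto(fd, payload, sizeof(payload), 0,
                       (struct sockaddr *)&dst, sizeof(dst));
                usleep(1000); /* roughly 1000 packets per second */
        }
        return 0;
}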

At this stage, with printk debug added on device 1 in the function
ip_fragment(), the following unlikely() block is hit:

if (unlikely(!skb->ignore_df ||
             (IPCB(skb)->frag_max_size &&
              IPCB(skb)->frag_max_size > mtu))) {
        printk(KERN_ERR "mtu = %u, dev = %s, src = %u, dst = %u, tot_len = %u\n",
               mtu, skb->dev->name, iph->saddr, iph->daddr, iph->tot_len);
        printk(KERN_ERR "!skb->ignore_df = %u, IPCB(skb)->frag_max_size = %u\n",
               !skb->ignore_df, IPCB(skb)->frag_max_size);
        icmp_send(skb, ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED,
                  htonl(mtu));
        kfree_skb(skb);
        return -EMSGSIZE;
}

which prints:
mtu = 1200, dev = tunnel64, src = 3405803786, dst = 3405803781, tot_len = 1244
!skb->ignore_df = 1, IPCB(skb)->frag_max_size = 0

Note the src and dst IPs of the packet are src=203.0.113.10 and
dst=203.0.113.5 (the tunnel is trying to send the PPP packet???)
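
As an aside, the raw %u values above are hard to read; if I instrument this
again I will probably use the kernel's %pI4 printk specifier (which takes a
pointer to the __be32 address and prints it as a dotted quad) and byte-swap
tot_len, along these lines, assuming the same mtu/skb/iph variables as above:

        /* same debug point, just with readable address output */
        printk(KERN_ERR "mtu = %u, dev = %s, src = %pI4, dst = %pI4, tot_len = %u\n",
               mtu, skb->dev->name, &iph->saddr, &iph->daddr,
               ntohs(iph->tot_len));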

Interestingly, I also have debug in icmp_unreach(), which handles the
ICMP_DEST_UNREACH sent from the tunnel:

case ICMP_FRAG_NEEDED:
        /* for documentation of the ip_no_pmtu_disc
         * values please see
         * Documentation/networking/ip-sysctl.txt
         */
        switch (net->ipv4.sysctl_ip_no_pmtu_disc) {
        ...
        case 0:
                info = ntohs(icmph->un.frag.mtu);
                printk(KERN_ERR "mtu = %u, dev = %s, src = %u, dst = %u, tot_len = %u\n",
                       info, skb->dev->name, iph->saddr, iph->daddr,
                       iph->tot_len);
        }

which prints:
mtu = 1200, dev = lo, src = 3405803786, dst = 3405803781, tot_len = 1244

I am confused at this stage (the packet shows dev = lo, i.e. the loopback
interface, for traffic that is routed out tunnel64? Presumably icmp_send()
addresses the error to the offending packet's source, 203.0.113.10, which is
local to device 1, so the ICMP error is delivered back over lo.)

The code then eventually reaches vti4_err(), which updates the PMTU on the
ppp0 interface to 1200.

Then the code in xfrm_bundle_ok() that I mentioned in an earlier email is
hit, which continuously drops the MTU on the tunnel. However, I believe the
behaviour I outlined above is the root cause and this is just a side effect.
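
(To illustrate what I mean by "continuously drops", here is a toy user-space
sketch of the feedback loop I suspect; it is not kernel code, and the 56-byte
ESP overhead is purely an assumption for illustration: each pass re-derives
the bundle MTU from a route MTU that the previous pass already lowered.)

#include <stdio.h>

#define ESP_OVERHEAD 56 /* assumed per-packet ESP overhead, illustrative only */

int main(void)
{
        unsigned int route_mtu = 1200; /* PMTU learned from the ICMP error above */
        int pass;

        for (pass = 0; pass < 5; pass++) {
                /* bundle MTU derived from the route MTU, minus ESP overhead */
                unsigned int bundle_mtu = route_mtu - ESP_OVERHEAD;

                printf("pass %d: route_mtu=%u -> bundle_mtu=%u\n",
                       pass, route_mtu, bundle_mtu);

                /* the lowered bundle MTU is fed back as the new route MTU */
                route_mtu = bundle_mtu;
        }
        return 0;
}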