On Fri, Jan 11, 2019 at 4:29 PM Willem de Bruijn <willemdebruijn.ker...@gmail.com> wrote: > > On Fri, Jan 11, 2019 at 9:44 AM Nicolas Dichtel > <nicolas.dich...@6wind.com> wrote: > > > > Since commit cb9f1b783850, scapy (which uses an AF_PACKET socket in > > SOCK_RAW mode) is unable to send a basic icmp packet over a sit tunnel: > > > > Here is a example of the setup: > > $ ip link set ntfp2 up > > $ ip addr add 10.125.0.1/24 dev ntfp2 > > $ ip tunnel add tun1 mode sit ttl 64 local 10.125.0.1 remote 10.125.0.2 dev > > ntfp2 > > $ ip addr add fd00:cafe:cafe::1/128 dev tun1 > > $ ip link set dev tun1 up > > $ ip route add fd00:200::/64 dev tun1 > > $ scapy > > >>> p = [] > > >>> p += IPv6(src='fd00:100::1', dst='fd00:200::1')/ICMPv6EchoRequest() > > >>> send(p, count=1, inter=0.1) > > >>> quit() > > $ ip -s link ls dev tun1 | grep -A1 "TX.*errors" > > TX: bytes packets errors dropped carrier collsns > > 0 0 1 0 0 0 > > > > The problem is that the network offset is set to the hard_header_len of the > > output device (tun1, ie 14 + 20) and in our case, because the packet is > > small (48 bytes) the pskb_inet_may_pull() fails (it tries to pull 40 bytes > > (ipv6 header) starting from the network offset). Let's reset the network > > offset so that it points to the beginning of the L3 data, ie skb->data. > > This only just landed in my inbox, sorry for the delay. So far I'm not > reproducing the issue, but I'm trying and am having a look. > > Which pskb_inet_may_pull() is failing?
Okay. I see what's going on. Commit cb9f1b783850 converted a pskb_may_pull ipv6hdr, pulling from skb->data, to pskb_network_may_pull, pulling from skb->network_header. Normally packet sockets with SOCK_RAW write a fixed size hardware header and place skb->network_header directly after that. Dropping these in sit_tunnel_xmit if they fail pskb_inet_may_pull is correct. But with devices with variable length hardware headers like sit, it is possible to cook packets that are shorter than the upper bound hhlen. In that case, we have no way of knowing where the variable length header ends (short of device specific parsing if implemented, see also header_ops.validate), so we set the network header to the same as the mac header as of commit 993675a3100b1. This is not a good thing to do in general, and not needed in the common case. The issue here is that the icmpv6 packet (48B) exceeds the sit hhlen (34), so is not caught by that workaround. We need to extend it to cases where the variable length hhlen is less than its upper bound in such a way that the packet as a whole exceeds hhlen, but by so little that the L3 header still starts before dev->hard_header_len. The simplest, to handle ip validation, is to extend the check from if (len < reserve) skb_reset_network_header(skb); to one that accounts for requiring a fully formed protocol header } else if (reserve) { + int l3_min_len = reserve; + skb_reserve(skb, -reserve); - if (len < reserve) + + if (proto == htons(ETH_P_IPV6)) + l3_min_len += sizeof(struct ipv6hdr); + else if (proto == htons(ETH_P_IP)) + l3_min_len += sizeof(struct iphdr); + + if (len < l3_min_len) skb_reset_network_header(skb); } Instead of special casing protocols, it might be cleaner to convert dev_validate_header to return instead of a pass/fail bool the hardware header length. And then update skb->network_header accordingly. That would definitely only be for net-next. For completeness, my variant of your test on top of v5.0-rc1 " ip netns add test ip netns exec test bash ip link set lo up ip addr add 10.0.0.1/24 dev lo ip tunnel add tun1 mode sit ttl 64 local 10.0.0.1 remote 10.0.0.2 dev lo ip addr add fd00:cafe:cafe::1/128 dev tun1 ip link set dev tun1 up ip route add fd00:200::/64 dev tun1 tcpdump -vvv -i tun1 -n & sleep 0.4 scapy p = [] p += IPv6(src='fd00:100::1', dst='fd00:200::1')/ICMPv6EchoRequest() send(p, count=1, inter=0.1) quit() "