On Thu, 2016-10-13 at 21:47 -0700, David Ahern wrote:
> Currently, socket lookups for l3mdev (vrf) use cases can match a socket
> that is bound to a port but not a device (ie., a global socket). If the
> sysctl tcp_l3mdev_accept is not set this leads to ack packets going out
> based on the main table even though the packet came in from an L3 domain.
> The end result is that the connection does not establish creating
> confusion for users since the service is running and a socket shows in
> ss output. Fix by requiring an exact dif to sk_bound_dev_if match if the
> skb came through an interface enslaved to an l3mdev device and the
> tcp_l3mdev_accept is not set.
> 
> skb's through an l3mdev interface are marked by setting a flag in the
> inet{6}_skb_parm. The IPv6 variant is already set. This patch adds the
> flag for IPv4. Marking the skb avoids a device lookup on the dif.
> 
> The inet_skb_parm struct currently has a 1-byte hold following the
> flags, so the flags is expanded to u16 without increasing the size of
> the struct. This is needed to add another flag.
> 
> Fixes: 193125dbd8eb ("net: Introduce VRF device driver")
> Signed-off-by: David Ahern <d...@cumulusnetworks.com>
> ---
> v2
> - reordered the checks in inet_exact_dif_match per Eric's comment
> - changed the l3mdev determination from looking up the dif to using
>   a flag set on the skb which is much faster
> 
>  drivers/net/vrf.c           |  2 ++
>  include/linux/ipv6.h        | 10 ++++++++++
>  include/net/ip.h            | 13 ++++++++++++-
>  net/ipv4/inet_hashtables.c  |  7 ++++---
>  net/ipv6/inet6_hashtables.c |  7 ++++---
>  5 files changed, 32 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c
> index 85c271c70d42..820de6a9ddde 100644
> --- a/drivers/net/vrf.c
> +++ b/drivers/net/vrf.c
> @@ -956,6 +956,7 @@ static struct sk_buff *vrf_ip6_rcv(struct net_device 
> *vrf_dev,
>       if (skb->pkt_type == PACKET_LOOPBACK) {
>               skb->dev = vrf_dev;
>               skb->skb_iif = vrf_dev->ifindex;
> +             IP6CB(skb)->flags |= IP6SKB_L3SLAVE;
>               skb->pkt_type = PACKET_HOST;
>               goto out;
>       }
> @@ -996,6 +997,7 @@ static struct sk_buff *vrf_ip_rcv(struct net_device 
> *vrf_dev,
>  {
>       skb->dev = vrf_dev;
>       skb->skb_iif = vrf_dev->ifindex;
> +     IPCB(skb)->flags |= IPSKB_L3SLAVE;
>  
>       /* loopback traffic; do not push through packet taps again.
>        * Reset pkt_type for upper layers to process skb
> diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
> index 7e9a789be5e0..dd3bb34aac8b 100644
> --- a/include/linux/ipv6.h
> +++ b/include/linux/ipv6.h
> @@ -144,6 +144,16 @@ static inline int inet6_iif(const struct sk_buff *skb)
>       return l3_slave ? skb->skb_iif : IP6CB(skb)->iif;
>  }
>  
> +static inline bool inet6_exact_dif_match(struct net *net, struct sk_buff 
> *skb)
> +{
> +#ifdef CONFIG_NET_L3_MASTER_DEV
> +     if (!net->ipv4.sysctl_tcp_l3mdev_accept &&
> +         IP6CB(skb)->flags & IP6SKB_L3SLAVE)

There is a catch here.
TCP moves IP6CB() in a different location.

Reference :

971f10eca186 ("tcp: better TCP_SKB_CB layout to reduce cache line misses")


Problem is that the lookup can happen from IP early demux, before TCP
moved IP{6}CB around.

So you might need to let the caller pass IP6CB(skb)->flags (or
TCP_SKB_CB(skb)->header.h6.flags ) instead of skb since
inet6_exact_dif_match() does not know where to fetch the flags.

Same issue for IPv4.




Reply via email to