On 11/20/18 7:23 AM, Alexis Bauvin wrote:
> Creating a VXLAN device with is underlay in the non-default VRF makes
> egress route lookup fail or incorrect since it will resolve in the
> default VRF, and ingress fail because the socket listens in the default
> VRF.
>
> This patch binds the underlying UDP tunnel socket to the l3mdev of the
> lower device of the VXLAN device. This will listen in the proper VRF and
> output traffic from said l3mdev, matching l3mdev routing rules and
> looking up the correct routing table.
>
> When the VXLAN device does not have a lower device, or the lower device
> is in the default VRF, the socket will not be bound to any interface,
> keeping the previous behaviour.
>
> The underlay l3mdev is deduced from the VXLAN lower device
> (IFLA_VXLAN_LINK).
>
> The l3mdev_master_upper_ifindex_by_index function has been added to
> l3mdev. Its goal is to fetch the effective l3mdev of an interface which
> is not a direct slave of said l3mdev. It handles the following example,
> properly resolving the l3mdev of eth0 to vrf-blue:
>
> +----------+                         +---------+
> |          |                         |         |
> | vrf-blue |                         | vrf-red |
> |          |                         |         |
> +----+-----+                         +----+----+
>      |                                    |
>      |                                    |
> +----+-----+                         +----+----+
> |          |                         |         |
> | br-blue  |                         | br-red  |
> |          |                         |         |
> +----+-----+                         +---+-+---+
>      |                                   | |
>      |                             +-----+ +-----+
>      |                             |             |
> +----+-----+                +------+----+   +----+----+
> |          |  lower device  |           |   |         |
> |   eth0   | <- - - - - - - | vxlan-red |   | tap-red | (... more taps)
> |          |                |           |   |         |
> +----------+                +-----------+   +---------+

Same here. Very helpful diagram.

> diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
> index 27bd586b94b0..a3de08122269 100644
> --- a/drivers/net/vxlan.c
> +++ b/drivers/net/vxlan.c

The vxlan changes look ok to me. It would be good for someone who knows
that code better than I do to review it.

Please move the following l3mdev changes to a separate patch: introduce
infra changes separately from their use.

> diff --git a/include/net/l3mdev.h b/include/net/l3mdev.h
> index 3832099289c5..78fa0ac4613c 100644
> --- a/include/net/l3mdev.h
> +++ b/include/net/l3mdev.h
> @@ -101,6 +101,17 @@ struct net_device *l3mdev_master_dev_rcu(const struct net_device *_dev)
>  	return master;
>  }
>
> +int l3mdev_master_upper_ifindex_by_index_rcu(struct net *net, int ifindex);
> +static inline
> +int l3mdev_master_upper_ifindex_by_index(struct net *net, int ifindex)
> +{
> +	rcu_read_lock();
> +	ifindex = l3mdev_master_upper_ifindex_by_index_rcu(net, ifindex);
> +	rcu_read_unlock();
> +
> +	return ifindex;
> +}
> +
>  u32 l3mdev_fib_table_rcu(const struct net_device *dev);
>  u32 l3mdev_fib_table_by_index(struct net *net, int ifindex);
>  static inline u32 l3mdev_fib_table(const struct net_device *dev)
> @@ -207,6 +218,17 @@ static inline int l3mdev_master_ifindex_by_index(struct net *net, int ifindex)
>  	return 0;
>  }
>
> +static inline
> +int l3mdev_master_upper_ifindex_by_index_rcu(struct net *net, int ifindex)
> +{
> +	return 0;
> +}
> +static inline
> +int l3mdev_master_upper_ifindex_by_index(struct net *net, int ifindex)
> +{
> +	return 0;
> +}
> +
>  static inline
>  struct net_device *l3mdev_master_dev_rcu(const struct net_device *dev)
>  {
> diff --git a/net/l3mdev/l3mdev.c b/net/l3mdev/l3mdev.c
> index 8da86ceca33d..309dee76724e 100644
> --- a/net/l3mdev/l3mdev.c
> +++ b/net/l3mdev/l3mdev.c
> @@ -46,6 +46,24 @@ int l3mdev_master_ifindex_rcu(const struct net_device *dev)
>  }
>  EXPORT_SYMBOL_GPL(l3mdev_master_ifindex_rcu);
>
> +/**
> + *	l3mdev_master_upper_ifindex_by_index - get index of upper l3 master
> + *					       device
> + *	@net: network namespace for device index lookup
> + *	@ifindex: targeted interface
> + */
> +int l3mdev_master_upper_ifindex_by_index_rcu(struct net *net, int ifindex)
> +{
> +	struct net_device *dev;
> +
> +	dev = dev_get_by_index_rcu(net, ifindex);
> +	while (dev && !netif_is_l3_master(dev))
> +		dev = netdev_master_upper_dev_get(dev);
> +
> +	return dev ? dev->ifindex : 0;
> +}
> +EXPORT_SYMBOL_GPL(l3mdev_master_upper_ifindex_by_index_rcu);
> +
>  /**
>   *	l3mdev_fib_table - get FIB table id associated with an L3
>   *			   master interface
>
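
For anyone following along, the walk done by the quoted
l3mdev_master_upper_ifindex_by_index_rcu() can be modelled in user space
roughly as below. This is only a sketch, not kernel code: struct model_dev,
its fields, the model_* function names and the ifindex values are all
hypothetical stand-ins, and the master pointer simply plays the role of the
master upper device. It reproduces the eth0 -> br-blue -> vrf-blue chain from
the diagram.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct net_device: only what the walk needs. */
struct model_dev {
	int ifindex;
	int is_l3_master;          /* models netif_is_l3_master() */
	struct model_dev *master;  /* models the master upper device, NULL if none */
};

/*
 * Walk up the master chain until an L3 master is found.
 * Returns its ifindex, or 0 when the chain ends without one.
 */
static int model_master_upper_ifindex(const struct model_dev *dev)
{
	while (dev && !dev->is_l3_master)
		dev = dev->master;

	return dev ? dev->ifindex : 0;
}

/* Topology from the diagram; the ifindex values are arbitrary. */
static int model_example_eth0(void)
{
	struct model_dev vrf_blue = { .ifindex = 10, .is_l3_master = 1, .master = NULL };
	struct model_dev br_blue  = { .ifindex = 20, .is_l3_master = 0, .master = &vrf_blue };
	struct model_dev eth0     = { .ifindex = 30, .is_l3_master = 0, .master = &br_blue };

	return model_master_upper_ifindex(&eth0);
}
```

In this model eth0 resolves to the ifindex given to vrf-blue even though it
is enslaved to the bridge rather than to the VRF directly, and a device with
no L3 master anywhere above it yields 0, which corresponds to the "socket not
bound to any interface" case in the commit message.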