Hi, On Tue, 2019-06-25 at 14:42 +0800, Zhiyuan Hou wrote: > In ipvlan l3s mode, ingress packet is switched to slave interface and > delivers to l4 stack. This may cause two problems: > > 1. When slave is in an ns different from master, the behavior of stack > in slave ns may cause confusion for users. For example, iptables, tc, > and other l2/l3 functions are not available for ingress packet. > > 2. l3s mode is not used for tap device, and cannot support ipvtap. But > in VM or container based VM cases, tap device is a very common device. > > In l3s mode's input nf_hook, this patch calles the skb_forward_dev() to > forward ingress packet to slave and uses nf_conntrack_confirm() to make > conntrack work with new mode. > > Signed-off-by: Zha Bin <zha...@linux.alibaba.com> > Signed-off-by: Zhiyuan Hou <zhiyuan2...@linux.alibaba.com> > --- > drivers/net/ipvlan/ipvlan.h | 9 ++++++++- > drivers/net/ipvlan/ipvlan_l3s.c | 16 ++++++++++++++-- > 2 files changed, 22 insertions(+), 3 deletions(-) > > diff --git a/drivers/net/ipvlan/ipvlan.h b/drivers/net/ipvlan/ipvlan.h > index 3837c897832e..48c814e24c3f 100644 > --- a/drivers/net/ipvlan/ipvlan.h > +++ b/drivers/net/ipvlan/ipvlan.h > @@ -172,6 +172,14 @@ void ipvlan_link_delete(struct net_device *dev, struct > list_head *head); > void ipvlan_link_setup(struct net_device *dev); > int ipvlan_link_register(struct rtnl_link_ops *ops); > #ifdef CONFIG_IPVLAN_L3S > + > +#include <net/netfilter/nf_conntrack_core.h> > + > +static inline int ipvlan_confirm_conntrack(struct sk_buff *skb) > +{ > + return nf_conntrack_confirm(skb); > +} > + > int ipvlan_l3s_register(struct ipvl_port *port); > void ipvlan_l3s_unregister(struct ipvl_port *port); > void ipvlan_migrate_l3s_hook(struct net *oldnet, struct net *newnet); > @@ -206,5 +214,4 @@ static inline bool netif_is_ipvlan_port(const struct > net_device *dev) > { > return rcu_access_pointer(dev->rx_handler) == ipvlan_handle_frame; > } > - > #endif /* __IPVLAN_H */ > diff --git a/drivers/net/ipvlan/ipvlan_l3s.c b/drivers/net/ipvlan/ipvlan_l3s.c > index 943d26cbf39f..ed210002f593 100644 > --- a/drivers/net/ipvlan/ipvlan_l3s.c > +++ b/drivers/net/ipvlan/ipvlan_l3s.c > @@ -95,14 +95,26 @@ static unsigned int ipvlan_nf_input(void *priv, struct > sk_buff *skb, > { > struct ipvl_addr *addr; > unsigned int len; > + int ret = NF_ACCEPT; > + bool success; > > addr = ipvlan_skb_to_addr(skb, skb->dev); > if (!addr) > goto out; > > - skb->dev = addr->master->dev; > len = skb->len + ETH_HLEN; > - ipvlan_count_rx(addr->master, len, true, false); > + > + ret = ipvlan_confirm_conntrack(skb); > + if (ret != NF_ACCEPT) { > + ipvlan_count_rx(addr->master, len, false, false); > + goto out; > + } > + > + skb_push_rcsum(skb, ETH_HLEN); > + success = dev_forward_skb(addr->master->dev, skb) == NET_RX_SUCCESS;
This looks weird to me: if I read the code correctly, the skb will traverse twice NF_INET_LOCAL_IN, once due to the l3s hooking and another one due to dev_forward_skb(). Also, tc ingreess, etc will run after the first traversing of NF_INET_LOCAL_IN. All in all I think that if full l2 processing is required, a different mode or a different virtual device should be used. Cheers, Paolo