On Tue, Jan 9, 2018 at 7:12 PM, liuqifa <liuq...@huawei.com> wrote: > >> On Mon, Jan 8, 2018 at 10:48 PM, <liuq...@huawei.com> wrote: >> > From: Keefe Liu <liuq...@huawei.com> >> > >> > The MTU of ipvlan interface should not bigger than the phy device, >> > When we run following scripts, we will find there are some problems. >> > Step1: >> > ip link add link eth0 name ipv1 type ipvlan mode l2 >> > ip netns add net1 >> > ip link set dev ipv1 netns net1 >> > Step2: >> > ip netns exec net1 ip link set dev ipv1 mtu 1501 >> > RTNETLINK answers: Invalid argument >> > dmesg info: "ipv1: Invalid MTU 1501 requested, hw max 1500" >> > Step3: >> > ip link set dev eth0 mtu 1600 >> > ip netns exec net1 ip link set dev ipv1 mtu 1501 >> > RTNETLINK answers: Invalid argument >> > dmesg info: "ipv1: Invalid MTU 1501 requested, hw max 1500" >> > Step4: >> > ip link set dev eth0 mtu 1400 >> > ip netns exec net1 ip link set dev ipv1 mtu 1500 The result of >> > Step2 is we expected, but the result of Step3 and Step4 are not. >> > >> > This patch set ipvlan's maximum MTU to ETH_MAX_MTU, and when we >> change >> > the ipvlan device's MTU, ipvlan_change_mtu() will make sure the new >> > MTU no larger than the phy device's MTU. >> > >> > Signed-off-by: Keefe Liu <liuq...@huawei.com> >> > --- >> > drivers/net/ipvlan/ipvlan_main.c | 14 ++++++++++++++ >> > 1 file changed, 14 insertions(+) >> > >> > diff --git a/drivers/net/ipvlan/ipvlan_main.c >> > b/drivers/net/ipvlan/ipvlan_main.c >> > index 30cb803..84c007d 100644 >> > --- a/drivers/net/ipvlan/ipvlan_main.c >> > +++ b/drivers/net/ipvlan/ipvlan_main.c >> > @@ -380,12 +380,24 @@ static int ipvlan_get_iflink(const struct net_device >> *dev) >> > return ipvlan->phy_dev->ifindex; } >> > >> > +static int ipvlan_change_mtu(struct net_device *dev, int new_mtu) { >> > + struct ipvl_dev *ipvlan = netdev_priv(dev); >> > + >> > + if (ipvlan->phy_dev->mtu < new_mtu) >> > + return -EINVAL; >> > + >> > + dev->mtu = new_mtu; >> > + return 0; >> > +} >> > + >> > static const struct net_device_ops ipvlan_netdev_ops = { >> > .ndo_init = ipvlan_init, >> > .ndo_uninit = ipvlan_uninit, >> > .ndo_open = ipvlan_open, >> > .ndo_stop = ipvlan_stop, >> > .ndo_start_xmit = ipvlan_start_xmit, >> > + .ndo_change_mtu = ipvlan_change_mtu, >> > .ndo_fix_features = ipvlan_fix_features, >> > .ndo_change_rx_flags = ipvlan_change_rx_flags, >> > .ndo_set_rx_mode = ipvlan_set_multicast_mac_filter, >> > @@ -680,6 +692,8 @@ void ipvlan_link_setup(struct net_device *dev) { >> > ether_setup(dev); >> > >> > + dev->min_mtu = 0; >> should be ETH_MIN_MTU since we expect the underlying device to be >> ETH_ARPHDR and IPvlan deals with IPv4/IPv6. >> > [liuqifa] Yes, I agree with you. But I have another question, I found in > commit 9157208, macvlan's min_mtu has been changed from ETH_MIN_MTU to 0, > I'd like to know > what's the difference between macvlan and ipvlan.
IPvlan uses L3 address for its demux logic and min-L3-mtu set to be ETH_MIN_MTU (68) for IPv4 packets with ip-options which no one uses them now. Practically for IPvlan to operate, the header need be = ether_hdr (14) + max(IPv4 hdr, IPv6 hdr) 40 = 54. So MTU which includes HDRs + payload need to be more than 54. Same principle applies to Macvlan except that its demux logic uses L2 instead of L3 (as in case of IPvlan). I don't know the motive behind setting it to 0 in macvlan. >> > + dev->max_mtu = ETH_MAX_MTU; >> > dev->priv_flags &= ~(IFF_XMIT_DST_RELEASE | IFF_TX_SKB_SHARING); >> > dev->priv_flags |= IFF_UNICAST_FLT | IFF_NO_QUEUE; >> > dev->netdev_ops = &ipvlan_netdev_ops; >> > -- >> > 1.8.3.1 >> > >> > >> These changes are not sufficient if you want to have different per-slave mtu >> settings. One can always change the MTU of the master device and all per- >> slave settings will get wiped. I don't think that's a desired outcome. > [liuqifa] I agree with you, I will reprogram the code to realize this goal. I suggest you use the 'mtu_adj' per slave device that I had in the code for the exact same reasons that you are implementing. I never got around completing it (it was deleted in later updates), but you can re-add that field. So when when ndo_change_mtu is invoked for a slave, you recalculate 'mtu_adj' field and when ndo_change_mtu is called for master, you adjust this field for all of the slaves. This will make ndo_change_mtu work on both slave as well master without causing issues.