Under relatively recent kernels (v4.4+), creating a VLAN subport on an LRO-capable parent NIC may turn LRO off on the parent port and then make its LRO setting practically impossible to change.
This is easy to reproduce on different distros and is independent of the NIC vendor. Hopefully this is not a repost of a known issue. The example below is on Ubuntu 18.04 LTS. (CentOS 7.6 behaves slightly differently, but the end result is the same; that example is attached at the end.)

===========================================================================
# Ubuntu 18.04 LTS
root@server1:# uname -a
Linux server1 4.15.0-20-generic #21-Ubuntu SMP Tue Apr 24 06:16:15 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

# Mellanox NIC
root@server1:# /sbin/ethtool -i ens4f0
driver: mlx5_core
version: 5.0-2.1.8

# enable LRO on the NIC
root@server1:# /sbin/ethtool -k ens4f0 | grep large
large-receive-offload: off
root@server1:# /sbin/ethtool -K ens4f0 lro on
root@server1:# /sbin/ethtool -k ens4f0 | grep large
large-receive-offload: on

# create a VLAN subport; once the subport is up, parent port LRO is disabled
root@server1:# ip link add link ens4f0 name ens4f0.50 type vlan id 50
root@server1:# ifconfig ens4f0.50 up
root@server1:# ethtool -k ens4f0.50 | grep large
large-receive-offload: off [fixed]
root@server1:# ethtool -k ens4f0 | grep large
large-receive-offload: off

# manually enabling LRO on the parent port no longer works
root@server1:# /sbin/ethtool -K ens4f0 lro on
Could not change any device features
root@server1:# /sbin/ethtool -K ens4f0.50 lro on
Cannot change large-receive-offload
Could not change any device features
root@server1:# /sbin/ethtool -K ens4f0 lro on
Could not change any device features
root@server1:# ethtool -k ens4f0 | grep large
large-receive-offload: off [requested on]

# now the only way to re-enable LRO on the parent port is to remove the subport
root@server1:# ip link del ens4f0.50
root@server1:# /sbin/ethtool -k ens4f0 | grep large
large-receive-offload: off [requested on]
root@server1:# /sbin/ethtool -K ens4f0 lro on
root@server1:# ethtool -k ens4f0 | grep large
large-receive-offload: on
===========================================================================

Although
LRO may have different implications or issues in practice, this seems like a simple use case that is expected to work: enabling LRO on the physical NIC while also having VLANs on the same NIC port. Note that neither the parent port nor the VLAN subport is attached to any bridge, bond, team or OVS device; both are standalone.

This issue does not seem to be driver- or distro-related; it lies in the kernel network stack. When netdev features are changed (via userspace ethtool or other in-kernel processing), the work is ultimately done by __netdev_update_features(), which calls netdev_sync_upper_features() and netdev_sync_lower_features(). Both sync functions basically do one thing: make sure NETIF_F_UPPER_DISABLES is enforced consistently among upper and lower net devices. Currently NETIF_F_UPPER_DISABLES includes only NETIF_F_LRO.

A lot of thought must have gone into this logic, and many situations are considered for upper devices such as bond, team and bridge. However, a possible oversight is vlan_dev, which is an upper_dev of its parent real_dev. A vlan_dev is created with LRO unsupported by default (the NETIF_F_LRO bit is not set in its hw_features), as the "[fixed]" flag shows:

root@server1:# ethtool -k ens4f0.50 | grep large
large-receive-offload: off [fixed]

Therefore, following the upper-sync and lower-sync code paths above, once a vlan_dev is created the parent real_dev can no longer turn LRO on: the lower sync clears LRO from real_dev's wanted features because the vlan_dev has it off, and the upper sync keeps dropping LRO from real_dev as long as the vlan_dev does not want it.

Honestly, treating vlan_dev as an upper_dev of real_dev is a bit counter-intuitive at first, since people call these devices VLAN subports. From the perspective that vlan_dev is a virtual device created on top of real_dev, it does have an "upper_dev" flavor, similar to bond/team devices. The kernel also associates upper_dev with a "master" role, which makes perfect sense for bond/team/bridge/OVS. For vlan_dev, however, the relationship sounds more like a slave device of real_dev (some people call real_dev the parent port).
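To make the interaction concrete, here is a simplified userspace model in C of the two sync steps as described above. It is only a sketch under stated assumptions, not kernel code: the struct mdev, the helpers update_features(), sync_upper(), sync_lower(), set_lro(), add_vlan() and del_vlan(), and the feature bit values are all hypothetical; each device has at most one upper and one lower; and the real __netdev_update_features() does much more (driver fix-ups, notifiers and so on). The model also simply assumes that creating and bringing up the VLAN runs one feature update on it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical feature bits; the kernel's real NETIF_F_* values differ. */
typedef uint64_t feat_t;
#define F_LRO            ((feat_t)1 << 0)
#define F_UPPER_DISABLES (F_LRO) /* models NETIF_F_UPPER_DISABLES == NETIF_F_LRO */

/* Hypothetical, heavily simplified stand-in for struct net_device. */
struct mdev {
	const char *name;
	feat_t hw_features;     /* bits the device can toggle at all */
	feat_t wanted_features; /* bits requested, e.g. by "ethtool -K" */
	feat_t features;        /* bits actually active */
	struct mdev *upper;     /* at most one upper device in this model */
	struct mdev *lower;     /* at most one lower device in this model */
};

static void update_features(struct mdev *dev);

/* Models netdev_sync_upper_features(): a lower device may not keep an
 * UPPER_DISABLES bit that its upper device does not want. */
static feat_t sync_upper(const struct mdev *upper, feat_t features)
{
	for (int bit = 0; bit < 64; bit++) {
		feat_t f = (feat_t)1 << bit;
		if (!(F_UPPER_DISABLES & f))
			continue;
		if (!(upper->wanted_features & f) && (features & f))
			features &= ~f;
	}
	return features;
}

/* Models netdev_sync_lower_features(): when the upper device has an
 * UPPER_DISABLES bit off, clear it from the lower device's wanted set. */
static void sync_lower(struct mdev *lower, feat_t upper_features)
{
	for (int bit = 0; bit < 64; bit++) {
		feat_t f = (feat_t)1 << bit;
		if (!(F_UPPER_DISABLES & f))
			continue;
		if (!(upper_features & f) && (lower->features & f)) {
			lower->wanted_features &= ~f;
			update_features(lower);
		}
	}
}

/* Simplified __netdev_update_features(): recompute, then sync both ways. */
static void update_features(struct mdev *dev)
{
	assert(dev != NULL);
	feat_t features = dev->wanted_features & dev->hw_features;

	if (dev->upper)
		features = sync_upper(dev->upper, features);
	dev->features = features;
	if (dev->lower)
		sync_lower(dev->lower, features);
}

/* Hypothetical stand-in for "ethtool -K <dev> lro on|off". */
static void set_lro(struct mdev *dev, bool on)
{
	if (on)
		dev->wanted_features |= F_LRO;
	else
		dev->wanted_features &= ~F_LRO;
	update_features(dev);
}

/* Hypothetical stand-in for creating a VLAN on @lower and bringing it up:
 * the VLAN has LRO neither in hw_features nor in wanted_features. */
static void add_vlan(struct mdev *vlan, struct mdev *lower)
{
	vlan->hw_features = 0;     /* shows up as "off [fixed]" */
	vlan->wanted_features = 0;
	vlan->features = 0;
	vlan->upper = NULL;
	vlan->lower = lower;
	lower->upper = vlan;
	update_features(vlan);     /* the lower sync clears LRO on the parent */
}

/* Hypothetical stand-in for "ip link del": unlink without recomputing. */
static void del_vlan(struct mdev *vlan)
{
	vlan->lower->upper = NULL;
	vlan->lower = NULL;
}
```

Replaying the Ubuntu transcript against this model (set_lro(&nic, true), add_vlan(&vlan, &nic), set_lro(&nic, true), del_vlan(&vlan), set_lro(&nic, true)) gives the same sequence of states: LRO on, then off, then off with LRO still in wanted_features (the "off [requested on]" state), and on again only after the VLAN is gone and LRO is requested once more.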
A secondary point: an upper_dev (bond/team/bridge) typically has more than one lower_dev, so upper:lower is normally a 1:N relationship. A vlan_dev has only one lower_dev, and upper:lower is often N:1. The upper/lower sync logic above probably stems from the "master" role of upper_dev; vlan_dev may just not be a good fit for it. That is probably where the confusion lies.

Maybe I missed something, but this logic has been there for quite some time (since v4.4; I didn't try the latest kernel, but I did try pre-v4.4 kernels, and they do not have this issue). Feel free to correct me.

Now, two possible solutions to fix this (if it is considered an issue):

1. When creating/initializing a vlan_dev, set the NETIF_F_LRO bit in its hw_features based on the NETIF_F_LRO bit in the underlying real_dev's hw_features. (Maybe not just hw_features; set wanted_features as well?)
2. In netdev_sync_upper_features() and netdev_sync_lower_features(), exclude upper_devs that are vlan_devs.

Thanks for the attention.
Limin

p.s.
Another example, on CentOS 7.6 with a VMXNET3 port:

===========================================================================
# CentOS Linux release 7.6.1810 (Core)
[root@esxi-server]# uname -a
Linux esxi-server 3.10.0-957.27.2.el7.x86_64 #1 SMP Mon Jul 29 17:46:05 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

# VMXNET3 NIC
[root@esxi-server]# ethtool -i ens224
driver: vmxnet3
version: 1.4.14.0-k-NAPI

# LRO enabled on the NIC
[root@esxi-server]# ethtool -k ens224 | grep large
large-receive-offload: on

# create a VLAN subport; NIC LRO is still on
[root@esxi-server]# ip link add link ens224 name ens224.50 type vlan id 50
[root@esxi-server]# ifconfig ens224.50 up
[root@esxi-server]# ethtool -k ens224 | grep large
large-receive-offload: on
[root@esxi-server]# ethtool -k ens224.50 | grep large
large-receive-offload: off [fixed]

# now turn LRO off; after that, LRO cannot be turned on any longer
[root@esxi-server]# ethtool -K ens224 lro off
[root@esxi-server]# ethtool -k ens224 | grep large
large-receive-offload: off
[root@esxi-server]# ethtool -k ens224.50 | grep large
large-receive-offload: off [fixed]
[root@esxi-server]# ethtool -K ens224 lro on
Could not change any device features
[root@esxi-server]# ethtool -k ens224 | grep large
large-receive-offload: off [requested on]
[root@esxi-server]# ethtool -k ens224.50 | grep large
large-receive-offload: off [fixed]

# now the only way to re-enable LRO on the parent port is to remove the subport
[root@esxi-server]# ip link del ens224.50
[root@esxi-server]# ethtool -k ens224 | grep large
large-receive-offload: off [requested on]
[root@esxi-server]# ethtool -K ens224 lro on
[root@esxi-server]# ethtool -k ens224 | grep large
large-receive-offload: on
===========================================================================