30/10/2019 17:09, Ilya Maximets: > On 30.10.2019 16:49, Thomas Monjalon wrote: > > 30/10/2019 16:07, Ilya Maximets: > >> On 29.10.2019 19:50, Thomas Monjalon wrote: > >>> In a virtual environment, the network controller may have to configure > >>> some SR-IOV VF parameters for security reasons. > >>> > >>> When the PF (host port) is driven by DPDK (OVS-DPDK case), > >>> we face two different cases: > >>> - driver is bifurcated (Mellanox case), > >>> so the VF can be configured via the kernel. > >>> - driver is on top of UIO or VFIO, so DPDK API is required, > >>> and PMD-specific APIs were used. > >> > >> So, what is wrong with setting VF mac via the representor port as > >> it done now? From our previous discussion, I thought that we concluded > >> that is enough to have current API, i.e. just call set_default_mac() > >> for a representor port and this will lead to setting mac for VF. > >> This is how it's implemented in Linux kernel and this is how it's > >> implemented in current DPDK drivers that supports setting mac for > >> the representor. > > > > I don't know what is done in the Linux kernel. > > In DPDK, setting the MAC of the representor is really setting > > the MAC of the representor. Is it crazy? > > Kind of. And no, it doesn't work this way in DPDK. > Just look at the i40e driver: > > 325 static int > 326 i40e_vf_representor_mac_addr_set(struct rte_eth_dev *ethdev, > 327 struct ether_addr *mac_addr) > 328 { > 329 struct i40e_vf_representor *representor = ethdev->data->dev_private; > 330 > 331 return rte_pmd_i40e_set_vf_mac_addr( > 332 representor->adapter->eth_dev->data->port_id, > 333 representor->vf_id, mac_addr); > 334 } > .... > 423 static const struct eth_dev_ops i40e_representor_dev_ops = { > <...> > 445 .mac_addr_set = i40e_vf_representor_mac_addr_set, > > > It really calls the function to set VF mac address.
Indeed, I missed that i40e_vf_representor_mac_addr_set() is calling rte_pmd_i40e_set_vf_mac_addr(). > And for MLX drivers it's the same. > MLX driver call netlink to set representor MAC, but netlink is in > kernel and this call inside the kernel translates to the setting > of mac address of the VF. > > Am I missing something? I am not sure about this kernel translation. At least not in mlx5. Setting MAC address of a VF representor seems not supported on mlx5. But there is a specific netlink request to set a VF MAC address: ip link set <PF> vf <VF> mac <MAC> > >> The only use case for this new API is to be able to control mac of > >> the representor itself, which doesn't make much sense. At least there > >> are only hypothetical use cases. And once again, Linux kernel doesn't > >> support this behavior. > > > > I think it is sane to be able to set different MAC addresses > > for the representor and the VF. Let me explain better my thoughts. In the switchdev design, VF and uplink ports are all connected together via a switch. The representors are mirrors of those switch ports. So a VF representor port is supposed to mirror the switch port where a VF is connected to. It is different of the VF itself. This is a drawing of my understanding of switchdev design: VF1 ------ VF1 rep /--------\ | switch | uplink rep ------ uplink ------ wire VF2 ------ VF2 rep \--------/ (PF) Of course, there can be more VF and uplink ports. With some switch-aware protocols, it might be interesting to have different MAC addresses on a VF and its representor. And more generally, I imagine that the config of a VF representor could be different of the config of a VF. > >>> This new generic API will avoid vendors fragmentation. > >> > >> I don't see the fragmentation. Right now you can set VF mac from DPDK > >> by calling set_default_mac() for representor. This API exists for all > >> vendors. Not implemented for Intel, but new API is not implemented too. > > > > No, the current situation is the following: > > - for mlx5, VF MAC can be configured with iproute2 or netlink > > - for ixgbe, rte_pmd_ixgbe_set_vf_mac_addr() > > - for i40e, rte_pmd_i40e_set_vf_mac_addr() > > - for bnxt, rte_pmd_bnxt_set_vf_mac_addr() Thanks to Ilya's comment, this is an update of the DPDK situation: - for mlx5, VF MAC can be configured with iproute2 or netlink and rte_eth_dev_default_mac_addr_set(rep) is not supported - for ixgbe, rte_pmd_ixgbe_set_vf_mac_addr() and rte_eth_dev_default_mac_addr_set(rep) does the same - for i40e, rte_pmd_i40e_set_vf_mac_addr() and rte_eth_dev_default_mac_addr_set(rep) does the same - for bnxt, rte_pmd_bnxt_set_vf_mac_addr() and no representor If we consider what Intel did, i.e. configure VF in place of representor for some operations, there are two drawbacks: - confusing that some ops apply to representor, others apply to VF - some ops are not possible on representor (because targetted to VF) I still feel that the addition of one single bit in the port ID is an elegant solution to target either the VF or its representor. > >> The this is that this new API will produce conceptual fragmentation > >> between DPDK and the Linux kernel, because to do the same thing you'll > >> have to use different ways. I mean, to change mac of VF in kernel you > >> need to set mac to the representor, but in DPDK changing setting mac to > >> representor will lead to changing the mac of the representor itself, > >> not the VF. This will be really confusing for users. > > > > I am not responsible of the choices in Linux. > > But I agree it would be interesting to check the reason of such decision. > > Rony, please could you explain? I looked at few Linux drivers: bnxt_vf_rep_netdev_ops has no op to set MAC bnxt_netdev_ops.ndo_set_vf_mac = set VF MAC from PF lio_vf_rep_ndev_ops has no op to set MAC lionetdevops.ndo_set_vf_mac = set VF MAC from PF mlx5e_netdev_ops_rep has no op to set MAC mlx5e_netdev_ops.ndo_set_vf_mac = set VF MAC from PF mlx5e_netdev_ops_uplink_rep.ndo_set_vf_mac = set VF MAC from PF nfp_repr_netdev_ops.ndo_set_mac_address = set representor MAC nfp_repr_netdev_ops.ndo_set_vf_mac = set VF MAC from representor nfp_net_netdev_ops.ndo_set_vf_mac = set VF MAC from PF There is a big chance that the behaviour is not standardized in Linux (as usual). So it is already confusing for users of Linux.