On 30.10.2019 22:42, Thomas Monjalon wrote:
30/10/2019 17:09, Ilya Maximets:
On 30.10.2019 16:49, Thomas Monjalon wrote:
30/10/2019 16:07, Ilya Maximets:
On 29.10.2019 19:50, Thomas Monjalon wrote:
In a virtual environment, the network controller may have to configure
some SR-IOV VF parameters for security reasons.

When the PF (host port) is driven by DPDK (OVS-DPDK case),
we face two different cases:
       - driver is bifurcated (Mellanox case),
         so the VF can be configured via the kernel.
       - driver is on top of UIO or VFIO, so DPDK API is required,
         and PMD-specific APIs were used.

So, what is wrong with setting VF mac via the representor port as
it done now?  From our previous discussion, I thought that we concluded
that is enough to have current API, i.e. just call set_default_mac()
for a representor port and this will lead to setting mac for VF.
This is how it's implemented in Linux kernel and this is how it's
implemented in current DPDK drivers that supports setting mac for
the representor.

I don't know what is done in the Linux kernel.
In DPDK, setting the MAC of the representor is really setting
the MAC of the representor. Is it crazy?

Kind of. And no, it doesn't work this way in DPDK.
Just look at the i40e driver:

325 static int
326 i40e_vf_representor_mac_addr_set(struct rte_eth_dev *ethdev,
327         struct ether_addr *mac_addr)
328 {
329     struct i40e_vf_representor *representor = ethdev->data->dev_private;
330
331     return rte_pmd_i40e_set_vf_mac_addr(
332         representor->adapter->eth_dev->data->port_id,
333         representor->vf_id, mac_addr);
334 }
....
423 static const struct eth_dev_ops i40e_representor_dev_ops = {
      <...>
445     .mac_addr_set         = i40e_vf_representor_mac_addr_set,


It really calls the function to set VF mac address.

Indeed, I missed that i40e_vf_representor_mac_addr_set() is calling
rte_pmd_i40e_set_vf_mac_addr().

And for MLX drivers it's the same.
MLX driver call netlink to set representor MAC, but netlink is in
kernel and this call inside the kernel translates to the setting
of mac address of the VF.

Am I missing something?

I am not sure about this kernel translation. At least not in mlx5.
Setting MAC address of a VF representor seems not supported on mlx5.
But there is a specific netlink request to set a VF MAC address:
        ip link set <PF> vf <VF> mac <MAC>


The only use case for this new API is to be able to control mac of
the representor itself, which doesn't make much sense. At least there
are only hypothetical use cases. And once again, Linux kernel doesn't
support this behavior.

I think it is sane to be able to set different MAC addresses
for the representor and the VF.

Let me explain better my thoughts.
In the switchdev design, VF and uplink ports are all connected
together via a switch.
The representors are mirrors of those switch ports.
So a VF representor port is supposed to mirror the switch port
where a VF is connected to. It is different of the VF itself.

This is a drawing of my understanding of switchdev design:

VF1 ------ VF1 rep /--------\
                    | switch | uplink rep ------ uplink ------ wire
VF2 ------ VF2 rep \--------/    (PF)

Of course, there can be more VF and uplink ports.

With some switch-aware protocols, it might be interesting to have
different MAC addresses on a VF and its representor.
And more generally, I imagine that the config of a VF representor
could be different of the config of a VF.


This new generic API will avoid vendors fragmentation.

I don't see the fragmentation. Right now you can set VF mac from DPDK
by calling set_default_mac() for representor.  This API exists for all
vendors. Not implemented for Intel, but new API is not implemented too.

No, the current situation is the following:
        - for mlx5, VF MAC can be configured with iproute2 or netlink
        - for ixgbe, rte_pmd_ixgbe_set_vf_mac_addr()
        - for i40e, rte_pmd_i40e_set_vf_mac_addr()
        - for bnxt, rte_pmd_bnxt_set_vf_mac_addr()

Thanks to Ilya's comment, this is an update of the DPDK situation:
        - for mlx5, VF MAC can be configured with iproute2 or netlink
          and rte_eth_dev_default_mac_addr_set(rep) is not supported
        - for ixgbe, rte_pmd_ixgbe_set_vf_mac_addr()
          and rte_eth_dev_default_mac_addr_set(rep) does the same
        - for i40e, rte_pmd_i40e_set_vf_mac_addr()
          and rte_eth_dev_default_mac_addr_set(rep) does the same
        - for bnxt, rte_pmd_bnxt_set_vf_mac_addr()
          and no representor

If we consider what Intel did, i.e. configure VF in place of representor
for some operations, there are two drawbacks:
- confusing that some ops apply to representor, others apply to VF
- some ops are not possible on representor (because targetted to VF)

I still feel that the addition of one single bit in the port ID
is an elegant solution to target either the VF or its representor.

Since we already have a confusion about what is configured when
operations are performed on a representor port we have 2 options:
1. Have this proposed API to configure representor itself while
   setting config to representor and configuring VF if special
   bit enabled.
2. Reverse the logic of current proposal, i.e. always apply
   configuration to VF while working with representor and apply
   configuration to representor itself if special bit is set.

I'd probably prefer option #2, because:
- From the OVS and OpenStack point of view, I think, we don't
  really need to configure representor itself in most cases.
  And OVS really should not know if it works with representor
  or some real port.
- It seems that most of the existing code in DPDK already works
  like this, i.e. applying configs to VF itself.  Intel drivers
  works like this and  Mellanox drivers, as Thomas said, doesn't
  have this functionality at all.



The this is that this new API will produce conceptual fragmentation
between DPDK and the Linux kernel, because to do the same thing you'll
have to use different ways. I mean, to change mac of VF in kernel you
need to set mac to the representor, but in DPDK changing setting mac to
representor will lead to changing the mac of the representor itself,
not the VF. This will be really confusing for users.

I am not responsible of the choices in Linux.
But I agree it would be interesting to check the reason of such decision.
Rony, please could you explain?

I looked at few Linux drivers:

        bnxt_vf_rep_netdev_ops has no op to set MAC
        bnxt_netdev_ops.ndo_set_vf_mac = set VF MAC from PF

        lio_vf_rep_ndev_ops has no op to set MAC
        lionetdevops.ndo_set_vf_mac = set VF MAC from PF

        mlx5e_netdev_ops_rep has no op to set MAC
        mlx5e_netdev_ops.ndo_set_vf_mac = set VF MAC from PF
        mlx5e_netdev_ops_uplink_rep.ndo_set_vf_mac = set VF MAC from PF

        nfp_repr_netdev_ops.ndo_set_mac_address = set representor MAC
        nfp_repr_netdev_ops.ndo_set_vf_mac = set VF MAC from representor
        nfp_net_netdev_ops.ndo_set_vf_mac = set VF MAC from PF

There is a big chance that the behaviour is not standardized in Linux
(as usual). So it is already confusing for users of Linux.


Reply via email to