Tuesday, July 10, 2018 12:37 PM, Adrien Mazarguil: > Subject: Re: [PATCH v4 07/10] net/mlx5: probe all port representors > > On Mon, Jul 09, 2018 at 11:57:29AM +0000, Shahaf Shuler wrote: > > Hi Adrien, > > > > > > Thursday, July 5, 2018 11:46 AM, Adrien Mazarguil: > > > Subject: [PATCH v4 07/10] net/mlx5: probe all port representors > > > > > > Probe existing port representors in addition to their master device > > > and associate them automatically. > > > > > > To avoid collision between Ethernet devices, they are named as follows: > > > > > > - "{DBDF}" for master/switch devices. > > > - "{DBDF}_representor_{rep}" with "rep" starting from 0 for port > > > representors. > > > > > > (Patch based on prior work from Yuanhan Liu) > > > > > > Signed-off-by: Adrien Mazarguil <adrien.mazarg...@6wind.com> > > > Signed-off-by: Nelio Laranjeiro <nelio.laranje...@6wind.com> > > > Reviewed-by: Xueming Li <xuemi...@mellanox.com> > > > Cc: Xueming Li <xuemi...@mellanox.com> > > > Cc: Shahaf Shuler <shah...@mellanox.com> > > > -- > > > v4 changes: > > > > > > - Fixed domain ID release once the last port using it is closed. Closed > > > devices are not necessarily detached, their presence is not a good > > > indicator. Code was modified to check if they still use their domain IDs > > > before deciding to release it. > <snip> > > > @@ -883,6 +915,41 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev, > > > priv->nl_socket_rdma = mlx5_nl_init(0, NETLINK_RDMA); > > > priv->nl_socket_route = mlx5_nl_init(RTMGRP_LINK, > > > NETLINK_ROUTE); > > > priv->nl_sn = 0; > > > + priv->representor = !!switch_info->representor; > > > + priv->domain_id = RTE_ETH_DEV_SWITCH_DOMAIN_ID_INVALID; > > > + priv->representor_id = > > > + switch_info->representor ? switch_info->port_name : -1; > > > + /* > > > + * Look for sibling devices in order to reuse their switch domain > > > + * if any, otherwise allocate one. > > > + */ > > > + i = mlx5_dev_to_port_id(dpdk_dev, NULL, 0); > > > + if (i > 0) { > > > + uint16_t port_id[i]; > > > + > > > + i = RTE_MIN(mlx5_dev_to_port_id(dpdk_dev, port_id, i), i); > > > + while (i--) { > > > + const struct priv *opriv = > > > + rte_eth_devices[port_id[i]].data- > > > >dev_private; > > > + > > > + if (!opriv || > > > + opriv->domain_id == > > > + RTE_ETH_DEV_SWITCH_DOMAIN_ID_INVALID) > > > + continue; > > > + priv->domain_id = opriv->domain_id; > > > > It looks like for the second port it will use the domain_id of the first > > port. Is > that what you intent? > > Yes, it's on purpose. Master and representors of a given device must share > the same domain ID to let applications know they can create flow rules to > forward traffic between them all.
But this is not the case in Mellanox devices. On Mellanox devices each PF along w/ its representors has a separate eswitch, and traffic cannot be routed between the switches using flow rules. For example if we have PF0 along w/ its representor REP0_0 and PF1 along w/ its representor REP1_0 . PF0 and REP0_0 will belong to switch X and PF1 and REP1_0 will belong to switch domain Y. it is also being reflected on the phys_switch_id. We should have switch domain per PF. > > > Note - I couldn't test it due to compilation errors: > > > > > /.autodirect/swgwork/shahafs/workspace/dpdk.org/drivers/net/mlx5/mlx5 > _nl.c: In function 'mlx5_nl_switch_info_cb': > > > /.autodirect/swgwork/shahafs/workspace/dpdk.org/drivers/net/mlx5/mlx5 > _ > > nl.c:843:8: error: 'IFLA_PHYS_PORT_NAME' undecl ared (first use in this > function) > > case IFLA_PHYS_PORT_NAME: > > ^ > > > /.autodirect/swgwork/shahafs/workspace/dpdk.org/drivers/net/mlx5/mlx5 > _ > > nl.c:843:8: note: each undeclared identifier is reported only once > > for each function it appears in > > > /.autodirect/swgwork/shahafs/workspace/dpdk.org/drivers/net/mlx5/mlx5 > _ > > nl.c:851:8: error: 'IFLA_PHYS_SWITCH_ID' undecl ared (first use in this > function) > > case IFLA_PHYS_SWITCH_ID: > > ^ > > > > My system info: > > NAME="Red Hat Enterprise Linux Server" > > VERSION="7.3 (Maipo)" > > ID="rhel" > > ID_LIKE="fedora" > > VERSION_ID="7.3" > > PRETTY_NAME="Red Hat Enterprise Linux Server 7.3 (Maipo)" > > ANSI_COLOR="0;31" > > CPE_NAME="cpe:/o:redhat:enterprise_linux:7.3:GA:server" > > > HOME_URL="https://emea01.safelinks.protection.outlook.com/?url=https% > 3A%2F%2Fwww.redhat.com%2F&data=02%7C01%7Cshahafs%40mellan > ox.com%7C661e7b51087b460817c008d5e648bf1e%7Ca652971c7d2e4d9ba6a4 > d149256f461b%7C0%7C0%7C636668122474445351&sdata=Lg8arhiYLvH5L > 2hef8DVhS8A3fVJ%2B5IZkLIHmqCd%2FmY%3D&reserved=0" > > > BUG_REPORT_URL="https://emea01.safelinks.protection.outlook.com/?url= > https%3A%2F%2Fbugzilla.redhat.com%2F&data=02%7C01%7Cshahafs% > 40mellanox.com%7C661e7b51087b460817c008d5e648bf1e%7Ca652971c7d2e > 4d9ba6a4d149256f461b%7C0%7C0%7C636668122474445351&sdata=3Do > RKjxovM8tOgKLssC1mq2wwfhjpVUZSExXV4ywBEQ%3D&reserved=0" > > > > REDHAT_BUGZILLA_PRODUCT="Red Hat Enterprise Linux 7" > > REDHAT_BUGZILLA_PRODUCT_VERSION=7.3 > > REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux" > > REDHAT_SUPPORT_PRODUCT_VERSION="7.3" > > Red Hat Enterprise Linux Server release 7.3 (Maipo) Red Hat Enterprise > > Linux Server release 7.3 (Maipo) > > OK, I'll redefine in v5 in case they are missing on the host system. > > <snip> > > > diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h index > > > 704046270..cc01310e0 100644 > > > --- a/drivers/net/mlx5/mlx5.h > > > +++ b/drivers/net/mlx5/mlx5.h > > > @@ -159,6 +159,7 @@ struct priv { > > > struct ibv_context *ctx; /* Verbs context. */ > > > struct ibv_device_attr_ex device_attr; /* Device properties. */ > > > struct ibv_pd *pd; /* Protection Domain. */ > > > + char ibdev_name[IBV_SYSFS_NAME_MAX]; /* IB device name. */ > > > > > > Why we need a dedicated entry for the ibdev_name? it is already part of > priv->ctx->device->name. > > Heh, same reason as the next line below, don't forget those damn > secondaries which can't dereference local pointers from the primary process > :) Right 😊. > > > > char ibdev_path[IBV_SYSFS_PATH_MAX]; /* IB device path for > > > secondary */ > <snip> > > > struct rte_eth_dev_info *info) > > > info->speed_capa = priv->link_speed_capa; > > > info->flow_type_rss_offloads = ~MLX5_RSS_HF_MASK; > > > mlx5_set_default_params(dev, info); > > > + info->switch_info.name = dev->data->name; > > > + info->switch_info.domain_id = priv->domain_id; > > > + info->switch_info.port_id = priv->representor_id; > > > + if (priv->representor) { > > > + unsigned int i = mlx5_dev_to_port_id(dev->device, NULL, 0); > > > + uint16_t port_id[i]; > > > + > > > + i = RTE_MIN(mlx5_dev_to_port_id(dev->device, port_id, i), > > > i); > > > + while (i--) { > > > + struct priv *opriv = > > > + rte_eth_devices[port_id[i]].data- > > > >dev_private; > > > + > > > + if (!opriv || > > > + opriv->representor || > > > + opriv->domain_id != priv->domain_id) > > > + continue; > > > + /* > > > + * Override switch name with that of the master > > > + * device. > > > + */ > > > + info->switch_info.name = opriv->dev_data->name; > > > + break; > > > > According to this logic it means once the master device is closed, all the > representors are no longer belong to the same switch (switch name of each > is different) which is not correct. > > They still share the same domain ID, which is what actually matters. The > switch name is only provided to let applications identify the master > (control) device in case it's needed. > > > According to your notes it is possible to close master w/o closing the > representor. > > This allows devices to be probed in any order on a needed basis, not all at > once. It's done on purpose to pave the way for hotplug support. > > > Why not just storing the master switch name when probing the > representor and to use it as is on the dev_info? > > The switch name *must* be that of the master device. If the master is not > probed, there can't be a switch name. However there's no real provision for > this in the API, so I chose the most acceptable unique name, which is the > name of the local device. Would you prefer an empty name instead? The current approach is OK. I was just suggesting to skip the loop iteration by saving the switch name on the private structure. > > Thing is, on mlx5 flow rules can be created directly between representors > without involving the master device. An empty switch name may be > misleading in this respect. > > What do you suggest? > > -- > Adrien Mazarguil > 6WIND