> Subject: Re: [Patch v6 01/18] net/mana: add basic driver, build environment
> and doc
> 
> 
> 在 2022/9/7 9:36, Long Li 写道:
> >> Subject: Re: [Patch v6 01/18] net/mana: add basic driver, build
> >> environment and doc
> >>
> >>
> >> 在 2022/9/1 2:05, Long Li 写道:
> >>>> Subject: Re: [Patch v6 01/18] net/mana: add basic driver, build
> >>>> environment and doc
> >>>>
> >>>>
> >>>> 在 2022/8/31 6:51, lon...@linuxonhyperv.com 写道:
> >>>>> From: Long Li <lon...@microsoft.com>
> >>>>>
> >>>>> MANA is a PCI device. It uses IB verbs to access hardware through
> >>>>> the kernel RDMA layer. This patch introduces build environment and
> >>>>> basic device probe functions.
> >>>>>
> >>>>> Signed-off-by: Long Li <lon...@microsoft.com>
> >>>>> ---
> >>>>> Change log:
> >>>>> v2:
> >>>>> Fix typos.
> >>>>> Make the driver build only on x86-64 and Linux.
> >>>>> Remove unused header files.
> >>>>> Change port definition to uint16_t or uint8_t (for IB).
> >>>>> Use getline() in place of fgets() to read and truncate a line.
> >>>>> v3:
> >>>>> Add meson build check for required functions from RDMA direct verb
> >>>>> header file
> >>>>> v4:
> >>>>> Remove extra "\n" in logging code.
> >>>>> Use "r" in place of "rb" in fopen() to read text files.
> >>>>>
> >>>>> [snip]
> >>>>> +
> >>>>> +static int mana_pci_probe_mac(struct rte_pci_driver *pci_drv
> >>>> __rte_unused,
> >>>>> +                             struct rte_pci_device *pci_dev,
> >>>>> +                             struct rte_ether_addr *mac_addr) {
> >>>>> +       struct ibv_device **ibv_list;
> >>>>> +       int ibv_idx;
> >>>>> +       struct ibv_context *ctx;
> >>>>> +       struct ibv_device_attr_ex dev_attr;
> >>>>> +       int num_devices;
> >>>>> +       int ret = 0;
> >>>>> +       uint8_t port;
> >>>>> +       struct mana_priv *priv = NULL;
> >>>>> +       struct rte_eth_dev *eth_dev = NULL;
> >>>>> +       bool found_port;
> >>>>> +
> >>>>> +       ibv_list = ibv_get_device_list(&num_devices);
> >>>>> +       for (ibv_idx = 0; ibv_idx < num_devices; ibv_idx++) {
> >>>>> +               struct ibv_device *ibdev = ibv_list[ibv_idx];
> >>>>> +               struct rte_pci_addr pci_addr;
> >>>>> +
> >>>>> +               DRV_LOG(INFO, "Probe device name %s
> dev_name %s
> >>>> ibdev_path %s",
> >>>>> +                       ibdev->name, ibdev->dev_name, ibdev-
> >>>>> ibdev_path);
> >>>>> +
> >>>>> +               if (mana_ibv_device_to_pci_addr(ibdev, &pci_addr))
> >>>>> +                       continue;
> >>>>> +
> >>>>> +               /* Ignore if this IB device is not this PCI device */
> >>>>> +               if (pci_dev->addr.domain != pci_addr.domain ||
> >>>>> +                   pci_dev->addr.bus != pci_addr.bus ||
> >>>>> +                   pci_dev->addr.devid != pci_addr.devid ||
> >>>>> +                   pci_dev->addr.function != pci_addr.function)
> >>>>> +                       continue;
> >>>>> +
> >>>>> +               ctx = ibv_open_device(ibdev);
> >>>>> +               if (!ctx) {
> >>>>> +                       DRV_LOG(ERR, "Failed to open IB device %s",
> >>>>> +                               ibdev->name);
> >>>>> +                       continue;
> >>>>> +               }
> >>>>> +
> >>>>> +               ret = ibv_query_device_ex(ctx, NULL, &dev_attr);
> >>>>> +               DRV_LOG(INFO,
> "dev_attr.orig_attr.phys_port_cnt %u",
> >>>>> +                       dev_attr.orig_attr.phys_port_cnt);
> >>>>> +               found_port = false;
> >>>>> +
> >>>>> +               for (port = 1; port <=
> dev_attr.orig_attr.phys_port_cnt;
> >>>>> +                    port++) {
> >>>>> +                       struct ibv_parent_domain_init_attr attr = {};
> >>>>> +                       struct rte_ether_addr addr;
> >>>>> +                       char address[64];
> >>>>> +                       char name[RTE_ETH_NAME_MAX_LEN];
> >>>>> +
> >>>>> +                       ret = get_port_mac(ibdev, port, &addr);
> >>>>> +                       if (ret)
> >>>>> +                               continue;
> >>>>> +
> >>>>> +                       if (mac_addr
> && !rte_is_same_ether_addr(&addr,
> >>>> mac_addr))
> >>>>> +                               continue;
> >>>>> +
> >>>>> +                       rte_ether_format_addr(address,
> sizeof(address),
> >>>> &addr);
> >>>>> +                       DRV_LOG(INFO, "device located port %u
> address
> >>>> %s",
> >>>>> +                               port, address);
> >>>>> +                       found_port = true;
> >>>>> +
> >>>>> +                       priv = rte_zmalloc_socket(NULL, sizeof(*priv),
> >>>>> +
> RTE_CACHE_LINE_SIZE,
> >>>>> +                                                 SOCKET_ID_ANY);
> >>>>> +                       if (!priv) {
> >>>>> +                               ret = -ENOMEM;
> >>>>> +                               goto failed;
> >>>>> +                       }
> >>>>> +
> >>>>> +                       snprintf(name, sizeof(name), "%s_port%d",
> >>>>> +                                pci_dev->device.name, port);
> >>>>> +
> >>>>> +                       if (rte_eal_process_type() ==
> >>>> RTE_PROC_SECONDARY) {
> >>>>> +                               int fd;
> >>>>> +
> >>>>> +                               eth_dev =
> >>>> rte_eth_dev_attach_secondary(name);
> >>>>> +                               if (!eth_dev) {
> >>>>> +                                       DRV_LOG(ERR, "Can't attach
> to dev
> >>>> %s",
> >>>>> +                                               name);
> >>>>> +                                       ret = -ENOMEM;
> >>>>> +                                       goto failed;
> >>>>> +                               }
> >>>>> +
> >>>>> +                               eth_dev->device = &pci_dev->device;
> >>>>> +                               eth_dev->dev_ops =
> &mana_dev_sec_ops;
> >>>>> +                               ret = mana_proc_priv_init(eth_dev);
> >>>>> +                               if (ret)
> >>>>> +                                       goto failed;
> >>>>> +                               priv->process_priv = eth_dev-
> >>>>> process_private;
> >>>>> +
> >>>>> +                               /* Get the IB FD from the primary
> process */
> >>>>> +                               fd =
> >>>> mana_mp_req_verbs_cmd_fd(eth_dev);
> >>>>> +                               if (fd < 0) {
> >>>>> +                                       DRV_LOG(ERR, "Failed to get
> FD %d",
> >>>> fd);
> >>>>> +                                       ret = -ENODEV;
> >>>>> +                                       goto failed;
> >>>>> +                               }
> >>>>> +
> >>>>> +                               ret =
> >>>> mana_map_doorbell_secondary(eth_dev, fd);
> >>>>> +                               if (ret) {
> >>>>> +                                       DRV_LOG(ERR, "Failed
> secondary
> >>>> map %d",
> >>>>> +                                               fd);
> >>>>> +                                       goto failed;
> >>>>> +                               }
> >>>>> +
> >>>>> +                               /* fd is no not used after mapping
> doorbell */
> >>>>> +                               close(fd);
> >>>>> +
> >>>>> +
>       rte_spinlock_lock(&mana_shared_data-
> >>>>> lock);
> >>>>> +                               mana_shared_data-
> >secondary_cnt++;
> >>>>> +                               mana_local_data.secondary_cnt++;
> >>>>> +
>       rte_spinlock_unlock(&mana_shared_data-
> >>>>> lock);
> >>>>> +
> >>>>> +                               rte_eth_copy_pci_info(eth_dev,
> pci_dev);
> >>>>> +
>       rte_eth_dev_probing_finish(eth_dev);
> >>>>> +
> >>>>> +                               /* Impossible to have more than one
> port
> >>>>> +                                * matching a MAC address
> >>>>> +                                */
> >>>>> +                               continue;
> >>>>> +                       }
> >>>>> +
> >>>>> +                       eth_dev = rte_eth_dev_allocate(name);
> >>>>> +                       if (!eth_dev) {
> >>>>> +                               ret = -ENOMEM;
> >>>>> +                               goto failed;
> >>>>> +                       }
> >>>>> +
> >>>>> +                       eth_dev->data->mac_addrs =
> >>>>> +                               rte_calloc("mana_mac", 1,
> >>>>> +                                          sizeof(struct
> rte_ether_addr), 0);
> >>>>> +                       if (!eth_dev->data->mac_addrs) {
> >>>>> +                               ret = -ENOMEM;
> >>>>> +                               goto failed;
> >>>>> +                       }
> >>>>> +
> >>>>> +                       rte_ether_addr_copy(&addr, eth_dev-
> >data-
> >>>>> mac_addrs);
> >>>>> +
> >>>>> +                       priv->ib_pd = ibv_alloc_pd(ctx);
> >>>>> +                       if (!priv->ib_pd) {
> >>>>> +                               DRV_LOG(ERR, "ibv_alloc_pd failed
> port %d",
> >>>> port);
> >>>>> +                               ret = -ENOMEM;
> >>>>> +                               goto failed;
> >>>>> +                       }
> >>>>> +
> >>>>> +                       /* Create a parent domain with the port
> number */
> >>>>> +                       attr.pd = priv->ib_pd;
> >>>>> +                       attr.comp_mask =
> >>>> IBV_PARENT_DOMAIN_INIT_ATTR_PD_CONTEXT;
> >>>>> +                       attr.pd_context = (void *)(uint64_t)port;
> >>>>> +                       priv->ib_parent_pd =
> ibv_alloc_parent_domain(ctx,
> >>>> &attr);
> >>>>> +                       if (!priv->ib_parent_pd) {
> >>>>> +                               DRV_LOG(ERR,
> >>>>> +                                       "ibv_alloc_parent_domain
> failed port
> >>>> %d",
> >>>>> +                                       port);
> >>>>> +                               ret = -ENOMEM;
> >>>>> +                               goto failed;
> >>>>> +                       }
> >>>>> +
> >>>>> +                       priv->ib_ctx = ctx;
> >>>>> +                       priv->port_id = eth_dev->data->port_id;
> >>>>> +                       priv->dev_port = port;
> >>>>> +                       eth_dev->data->dev_private = priv;
> >>>>> +                       priv->dev_data = eth_dev->data;
> >>>>> +
> >>>>> +                       priv->max_rx_queues =
> dev_attr.orig_attr.max_qp;
> >>>>> +                       priv->max_tx_queues =
> dev_attr.orig_attr.max_qp;
> >>>>> +
> >>>>> +                       priv->max_rx_desc =
> >>>>> +
>       RTE_MIN(dev_attr.orig_attr.max_qp_wr,
> >>>>> +                                       dev_attr.orig_attr.max_cqe);
> >>>>> +                       priv->max_tx_desc =
> >>>>> +
>       RTE_MIN(dev_attr.orig_attr.max_qp_wr,
> >>>>> +                                       dev_attr.orig_attr.max_cqe);
> >>>>> +
> >>>>> +                       priv->max_send_sge =
> dev_attr.orig_attr.max_sge;
> >>>>> +                       priv->max_recv_sge =
> dev_attr.orig_attr.max_sge;
> >>>>> +
> >>>>> +                       priv->max_mr = dev_attr.orig_attr.max_mr;
> >>>>> +                       priv->max_mr_size =
> >>>> dev_attr.orig_attr.max_mr_size;
> >>>>> +
> >>>>> +                       DRV_LOG(INFO, "dev %s max queues %d
> desc %d
> >>>> sge %d",
> >>>>> +                               name, priv->max_rx_queues, priv-
> >>>>> max_rx_desc,
> >>>>> +                               priv->max_send_sge);
> >>>>> +
> >>>>> +                       rte_spinlock_lock(&mana_shared_data-
> >lock);
> >>>>> +                       mana_shared_data->primary_cnt++;
> >>>>> +                       rte_spinlock_unlock(&mana_shared_data-
> >lock);
> >>>>> +
> >>>>> +                       eth_dev->data->dev_flags |=
> >>>> RTE_ETH_DEV_INTR_RMV;
> >>>>> +
> >>>>> +                       eth_dev->device = &pci_dev->device;
> >>>>> +                       eth_dev->data->dev_flags |=
> >>>>> +
>       RTE_ETH_DEV_AUTOFILL_QUEUE_XSTATS;
> >>>>> +
> >>>> Please do not use the temporary macro. Please review this patch:
> >>>>
> >>>> f30e69b41f94 ("ethdev: add device flag to bypass auto-filled queue
> >>>> xstats")
> >>>>
> >>>> This patch requires that per queue statistics are filled in
> >>>> .xstats_get() by PMD.
> >>> Thanks for pointing this out.
> >>>
> >>> It seems some PMDs are still depending on this flag for xstats.
> >>>
> >>> MANA doesn't implement xstats_get() currently, this flag is useful.
> >>> Is it
> >> okay to keep using this flag before it's finally the time to remove
> >> it from all PMDs, or when MANA implements xstats?
> >> Yes, your xstats doesn't implement now. Per queue stats should be
> >> filled in xstats API, and the stats API cannot see per queue stats,
> >> so stats API in driver shouldn't fill it(suggest that delete it from patch
> 17/18).
> >>
> >> I guess this flag can be removed if PMD does not support xstats.
> >>>
> > I don't understand your suggestion. An application can call
> rte_eth_stats_get() to get port stats, and this will call into stats_get() in 
> the
> driver, as implemented in patch 17/18.
> >
> > When flag RTE_ETH_DEV_AUTOFILL_QUEUE_XSTATS  is set, an application
> can also use rte_eth_xstats_get() to get port stats even the driver doesn't
> implement xstats_get().
> 
> I think new PMD should follow the announced switch Ferruh mentioned,
> otherwise, the switch will never be completed.

I see now I understand more. I prefer to keep patch 17/18, or an application 
calling rte_eth_stats_get() will fail. (before the switch happens)

> 
> Suggest that mana driver can implement a simple xstats_get() to fill per
> queue stats if you want to support per queue stats.
> 
> @Ferruh, what do you think?

Reply via email to