> Subject: Re: [Patch v6 01/18] net/mana: add basic driver, build environment
> and doc
> 
> On 9/7/2022 3:16 AM, lihuisong (C) wrote:
> > CAUTION: This message has originated from an External Source. Please
> > use proper judgment and caution when opening attachments, clicking
> > links, or responding to this email.
> >
> >
> > 在 2022/9/7 9:36, Long Li 写道:
> >>> Subject: Re: [Patch v6 01/18] net/mana: add basic driver, build
> >>> environment and doc
> >>>
> >>>
> >>> 在 2022/9/1 2:05, Long Li 写道:
> >>>>> Subject: Re: [Patch v6 01/18] net/mana: add basic driver, build
> >>>>> environment and doc
> >>>>>
> >>>>>
> >>>>> 在 2022/8/31 6:51, lon...@linuxonhyperv.com 写道:
> >>>>>> From: Long Li <lon...@microsoft.com>
> >>>>>>
> >>>>>> MANA is a PCI device. It uses IB verbs to access hardware through
> >>>>>> the kernel RDMA layer. This patch introduces build environment and
> >>>>>> basic device probe functions.
> >>>>>>
> >>>>>> Signed-off-by: Long Li <lon...@microsoft.com>
> >>>>>> ---
> >>>>>> Change log:
> >>>>>> v2:
> >>>>>> Fix typos.
> >>>>>> Make the driver build only on x86-64 and Linux.
> >>>>>> Remove unused header files.
> >>>>>> Change port definition to uint16_t or uint8_t (for IB).
> >>>>>> Use getline() in place of fgets() to read and truncate a line.
> >>>>>> v3:
> >>>>>> Add meson build check for required functions from RDMA direct
> verb
> >>>>>> header file
> >>>>>> v4:
> >>>>>> Remove extra "\n" in logging code.
> >>>>>> Use "r" in place of "rb" in fopen() to read text files.
> >>>>>>
> >>>>>> [snip]
> >>>>>> +
> >>>>>> +static int mana_pci_probe_mac(struct rte_pci_driver *pci_drv
> >>>>> __rte_unused,
> >>>>>> +                       struct rte_pci_device *pci_dev,
> >>>>>> +                       struct rte_ether_addr *mac_addr) {
> >>>>>> + struct ibv_device **ibv_list;
> >>>>>> + int ibv_idx;
> >>>>>> + struct ibv_context *ctx;
> >>>>>> + struct ibv_device_attr_ex dev_attr;
> >>>>>> + int num_devices;
> >>>>>> + int ret = 0;
> >>>>>> + uint8_t port;
> >>>>>> + struct mana_priv *priv = NULL;
> >>>>>> + struct rte_eth_dev *eth_dev = NULL;
> >>>>>> + bool found_port;
> >>>>>> +
> >>>>>> + ibv_list = ibv_get_device_list(&num_devices);
> >>>>>> + for (ibv_idx = 0; ibv_idx < num_devices; ibv_idx++) {
> >>>>>> +         struct ibv_device *ibdev = ibv_list[ibv_idx];
> >>>>>> +         struct rte_pci_addr pci_addr;
> >>>>>> +
> >>>>>> +         DRV_LOG(INFO, "Probe device name %s dev_name %s
> >>>>> ibdev_path %s",
> >>>>>> +                 ibdev->name, ibdev->dev_name, ibdev-
> >>>>>> ibdev_path);
> >>>>>> +
> >>>>>> +         if (mana_ibv_device_to_pci_addr(ibdev, &pci_addr))
> >>>>>> +                 continue;
> >>>>>> +
> >>>>>> +         /* Ignore if this IB device is not this PCI device */
> >>>>>> +         if (pci_dev->addr.domain != pci_addr.domain ||
> >>>>>> +             pci_dev->addr.bus != pci_addr.bus ||
> >>>>>> +             pci_dev->addr.devid != pci_addr.devid ||
> >>>>>> +             pci_dev->addr.function != pci_addr.function)
> >>>>>> +                 continue;
> >>>>>> +
> >>>>>> +         ctx = ibv_open_device(ibdev);
> >>>>>> +         if (!ctx) {
> >>>>>> +                 DRV_LOG(ERR, "Failed to open IB device %s",
> >>>>>> +                         ibdev->name);
> >>>>>> +                 continue;
> >>>>>> +         }
> >>>>>> +
> >>>>>> +         ret = ibv_query_device_ex(ctx, NULL, &dev_attr);
> >>>>>> +         DRV_LOG(INFO, "dev_attr.orig_attr.phys_port_cnt %u",
> >>>>>> +                 dev_attr.orig_attr.phys_port_cnt);
> >>>>>> +         found_port = false;
> >>>>>> +
> >>>>>> +         for (port = 1; port <= dev_attr.orig_attr.phys_port_cnt;
> >>>>>> +              port++) {
> >>>>>> +                 struct ibv_parent_domain_init_attr attr = {};
> >>>>>> +                 struct rte_ether_addr addr;
> >>>>>> +                 char address[64];
> >>>>>> +                 char name[RTE_ETH_NAME_MAX_LEN];
> >>>>>> +
> >>>>>> +                 ret = get_port_mac(ibdev, port, &addr);
> >>>>>> +                 if (ret)
> >>>>>> +                         continue;
> >>>>>> +
> >>>>>> +                 if (mac_addr && !rte_is_same_ether_addr(&addr,
> >>>>> mac_addr))
> >>>>>> +                         continue;
> >>>>>> +
> >>>>>> +                 rte_ether_format_addr(address, sizeof(address),
> >>>>> &addr);
> >>>>>> +                 DRV_LOG(INFO, "device located port %u address
> >>>>> %s",
> >>>>>> +                         port, address);
> >>>>>> +                 found_port = true;
> >>>>>> +
> >>>>>> +                 priv = rte_zmalloc_socket(NULL, sizeof(*priv),
> >>>>>> +                                           RTE_CACHE_LINE_SIZE,
> >>>>>> +                                           SOCKET_ID_ANY);
> >>>>>> +                 if (!priv) {
> >>>>>> +                         ret = -ENOMEM;
> >>>>>> +                         goto failed;
> >>>>>> +                 }
> >>>>>> +
> >>>>>> +                 snprintf(name, sizeof(name), "%s_port%d",
> >>>>>> +                          pci_dev->device.name, port);
> >>>>>> +
> >>>>>> +                 if (rte_eal_process_type() ==
> >>>>> RTE_PROC_SECONDARY) {
> >>>>>> +                         int fd;
> >>>>>> +
> >>>>>> +                         eth_dev =
> >>>>> rte_eth_dev_attach_secondary(name);
> >>>>>> +                         if (!eth_dev) {
> >>>>>> +                                 DRV_LOG(ERR, "Can't attach to dev
> >>>>> %s",
> >>>>>> +                                         name);
> >>>>>> +                                 ret = -ENOMEM;
> >>>>>> +                                 goto failed;
> >>>>>> +                         }
> >>>>>> +
> >>>>>> +                         eth_dev->device = &pci_dev->device;
> >>>>>> +                         eth_dev->dev_ops = &mana_dev_sec_ops;
> >>>>>> +                         ret = mana_proc_priv_init(eth_dev);
> >>>>>> +                         if (ret)
> >>>>>> +                                 goto failed;
> >>>>>> +                         priv->process_priv = eth_dev-
> >>>>>> process_private;
> >>>>>> +
> >>>>>> +                         /* Get the IB FD from the primary
> >>>>>> process */
> >>>>>> +                         fd =
> >>>>> mana_mp_req_verbs_cmd_fd(eth_dev);
> >>>>>> +                         if (fd < 0) {
> >>>>>> +                                 DRV_LOG(ERR, "Failed to get FD %d",
> >>>>> fd);
> >>>>>> +                                 ret = -ENODEV;
> >>>>>> +                                 goto failed;
> >>>>>> +                         }
> >>>>>> +
> >>>>>> +                         ret =
> >>>>> mana_map_doorbell_secondary(eth_dev, fd);
> >>>>>> +                         if (ret) {
> >>>>>> +                                 DRV_LOG(ERR, "Failed secondary
> >>>>> map %d",
> >>>>>> +                                         fd);
> >>>>>> +                                 goto failed;
> >>>>>> +                         }
> >>>>>> +
> >>>>>> +                         /* fd is no not used after mapping
> >>>>>> doorbell */
> >>>>>> +                         close(fd);
> >>>>>> +
> >>>>>> +                         rte_spinlock_lock(&mana_shared_data-
> >>>>>> lock);
> >>>>>> +                         mana_shared_data->secondary_cnt++;
> >>>>>> +                         mana_local_data.secondary_cnt++;
> >>>>>> +                         rte_spinlock_unlock(&mana_shared_data-
> >>>>>> lock);
> >>>>>> +
> >>>>>> +                         rte_eth_copy_pci_info(eth_dev, pci_dev);
> >>>>>> +                         rte_eth_dev_probing_finish(eth_dev);
> >>>>>> +
> >>>>>> +                         /* Impossible to have more than one port
> >>>>>> +                          * matching a MAC address
> >>>>>> +                          */
> >>>>>> +                         continue;
> >>>>>> +                 }
> >>>>>> +
> >>>>>> +                 eth_dev = rte_eth_dev_allocate(name);
> >>>>>> +                 if (!eth_dev) {
> >>>>>> +                         ret = -ENOMEM;
> >>>>>> +                         goto failed;
> >>>>>> +                 }
> >>>>>> +
> >>>>>> +                 eth_dev->data->mac_addrs =
> >>>>>> +                         rte_calloc("mana_mac", 1,
> >>>>>> +                                    sizeof(struct
> >>>>>> rte_ether_addr), 0);
> >>>>>> +                 if (!eth_dev->data->mac_addrs) {
> >>>>>> +                         ret = -ENOMEM;
> >>>>>> +                         goto failed;
> >>>>>> +                 }
> >>>>>> +
> >>>>>> +                 rte_ether_addr_copy(&addr, eth_dev->data-
> >>>>>> mac_addrs);
> >>>>>> +
> >>>>>> +                 priv->ib_pd = ibv_alloc_pd(ctx);
> >>>>>> +                 if (!priv->ib_pd) {
> >>>>>> +                         DRV_LOG(ERR, "ibv_alloc_pd failed port %d",
> >>>>> port);
> >>>>>> +                         ret = -ENOMEM;
> >>>>>> +                         goto failed;
> >>>>>> +                 }
> >>>>>> +
> >>>>>> +                 /* Create a parent domain with the port number */
> >>>>>> +                 attr.pd = priv->ib_pd;
> >>>>>> +                 attr.comp_mask =
> >>>>> IBV_PARENT_DOMAIN_INIT_ATTR_PD_CONTEXT;
> >>>>>> +                 attr.pd_context = (void *)(uint64_t)port;
> >>>>>> +                 priv->ib_parent_pd = ibv_alloc_parent_domain(ctx,
> >>>>> &attr);
> >>>>>> +                 if (!priv->ib_parent_pd) {
> >>>>>> +                         DRV_LOG(ERR,
> >>>>>> +                                 "ibv_alloc_parent_domain failed
> >>>>>> port
> >>>>> %d",
> >>>>>> +                                 port);
> >>>>>> +                         ret = -ENOMEM;
> >>>>>> +                         goto failed;
> >>>>>> +                 }
> >>>>>> +
> >>>>>> +                 priv->ib_ctx = ctx;
> >>>>>> +                 priv->port_id = eth_dev->data->port_id;
> >>>>>> +                 priv->dev_port = port;
> >>>>>> +                 eth_dev->data->dev_private = priv;
> >>>>>> +                 priv->dev_data = eth_dev->data;
> >>>>>> +
> >>>>>> +                 priv->max_rx_queues = dev_attr.orig_attr.max_qp;
> >>>>>> +                 priv->max_tx_queues = dev_attr.orig_attr.max_qp;
> >>>>>> +
> >>>>>> +                 priv->max_rx_desc =
> >>>>>> +                         RTE_MIN(dev_attr.orig_attr.max_qp_wr,
> >>>>>> +                                 dev_attr.orig_attr.max_cqe);
> >>>>>> +                 priv->max_tx_desc =
> >>>>>> +                         RTE_MIN(dev_attr.orig_attr.max_qp_wr,
> >>>>>> +                                 dev_attr.orig_attr.max_cqe);
> >>>>>> +
> >>>>>> +                 priv->max_send_sge = dev_attr.orig_attr.max_sge;
> >>>>>> +                 priv->max_recv_sge = dev_attr.orig_attr.max_sge;
> >>>>>> +
> >>>>>> +                 priv->max_mr = dev_attr.orig_attr.max_mr;
> >>>>>> +                 priv->max_mr_size =
> >>>>> dev_attr.orig_attr.max_mr_size;
> >>>>>> +
> >>>>>> +                 DRV_LOG(INFO, "dev %s max queues %d desc %d
> >>>>> sge %d",
> >>>>>> +                         name, priv->max_rx_queues, priv-
> >>>>>> max_rx_desc,
> >>>>>> +                         priv->max_send_sge);
> >>>>>> +
> >>>>>> +                 rte_spinlock_lock(&mana_shared_data->lock);
> >>>>>> +                 mana_shared_data->primary_cnt++;
> >>>>>> +                 rte_spinlock_unlock(&mana_shared_data->lock);
> >>>>>> +
> >>>>>> +                 eth_dev->data->dev_flags |=
> >>>>> RTE_ETH_DEV_INTR_RMV;
> >>>>>> +
> >>>>>> +                 eth_dev->device = &pci_dev->device;
> >>>>>> +                 eth_dev->data->dev_flags |=
> >>>>>> +                         RTE_ETH_DEV_AUTOFILL_QUEUE_XSTATS;
> >>>>>> +
> >>>>> Please do not use the temporary macro. Please review this patch:
> >>>>>
> >>>>> f30e69b41f94 ("ethdev: add device flag to bypass auto-filled queue
> >>>>> xstats")
> >>>>>
> >>>>> This patch requires that per queue statistics are filled in
> >>>>> .xstats_get() by PMD.
> >>>> Thanks for pointing this out.
> >>>>
> >>>> It seems some PMDs are still depending on this flag for xstats.
> >>>>
> >>>> MANA doesn't implement xstats_get() currently, this flag is useful.
> >>>> Is it
> >>> okay to keep using this flag before it's finally the time to remove
> >>> it from all
> >>> PMDs, or when MANA implements xstats?
> >>> Yes, your xstats doesn't implement now. Per queue stats should be
> >>> filled in
> >>> xstats API, and the stats API cannot see per queue stats, so stats
> >>> API in driver
> >>> shouldn't fill it(suggest that delete it from patch 17/18).
> >>>
> >>> I guess this flag can be removed if PMD does not support xstats.
> >>>>
> >> I don't understand your suggestion. An application can call
> >> rte_eth_stats_get() to get port stats, and this will call into
> >> stats_get() in the driver, as implemented in patch 17/18.
> >>
> >> When flag RTE_ETH_DEV_AUTOFILL_QUEUE_XSTATS  is set, an application
> >> can also use rte_eth_xstats_get() to get port stats even the driver
> >> doesn't implement xstats_get().
> >
> > I think new PMD should follow the announced switch Ferruh mentioned,
> > otherwise, the switch will never be completed.
> >
> > Suggest that mana driver can implement a simple xstats_get() to fill per
> > queue stats if you want to support per queue stats.
> >
> > @Ferruh, what do you think?
> >
> 
> Hi Huisong,
> 
> Thanks for reminding it, yes it makes sense to implement new method in
> new drivers.
> 
> 
> Long,
> 
> There is a long term plan to move queue stats from basic stats structure
> to xstats. The reason behind is increasing number of queues makes basis
> stats struct too big, on the other hand xstats is more flexible and no
> fixes size array is required.
> 
> You can remove the 'RTE_ETH_DEV_AUTOFILL_QUEUE_XSTATS' flag and
> driver
> won't support queue stats in xstats by default. Instead to have that
> support you will need to implement xstats, later or in this set.
> When queue stats implemented in xstats, please remember to remove
> updating 'stats->q_*' in basic stats.
> 
> Also in 17/18, the feature 'Stats per queue' seems added, but that is
> not correct, feature name is misleading here. But it is about queue
> stats mapping, please check 'doc/guides/nics/features.rst'.
> So can you please drop that feature.

Thank you for clearing the confusion. I'm removing "Stats per queue" in 
doc/guides/nics/features/mana.ini. The other parts of patch 17/18 are still 
relevant so not changing them.

Also removing 'RTE_ETH_DEV_AUTOFILL_QUEUE_XSTATS' as suggested.

Reply via email to