Hi Anatoly,

> -----Original Message-----
> From: Burakov, Anatoly
> Sent: Thursday, April 12, 2018 10:04 PM
> To: Wang, Xiao W <xiao.w.w...@intel.com>; Yigit, Ferruh <ferruh.yi...@intel.com>
> Cc: dev@dpdk.org; maxime.coque...@redhat.com; Wang, Zhihong <zhihong.w...@intel.com>; Bie, Tiwei <tiwei....@intel.com>; Tan, Jianfeng <jianfeng....@intel.com>; Liang, Cunming <cunming.li...@intel.com>; Daly, Dan <dan.d...@intel.com>; tho...@monjalon.net; gaetan.ri...@6wind.com; hemant.agra...@nxp.com; Chen, Junjie J <junjie.j.c...@intel.com>
> Subject: Re: [PATCH v6 1/4] eal/vfio: add multiple container support
>
> On 12-Apr-18 8:19 AM, Xiao Wang wrote:
> > Currently eal vfio framework binds vfio group fd to the default
> > container fd during rte_vfio_setup_device, while in some cases,
> > e.g. vDPA (vhost data path acceleration), we want to put vfio group
> > to a separate container and program IOMMU via this container.
> >
> > This patch adds some APIs to support container creating and device
> > binding with a container.
> >
> > A driver could use "rte_vfio_create_container" helper to create a
> > new container from eal, use "rte_vfio_bind_group" to bind a device
> > to the newly created container.
> >
> > During rte_vfio_setup_device, the container bound with the device
> > will be used for IOMMU setup.
> >
> > Signed-off-by: Junjie Chen <junjie.j.c...@intel.com>
> > Signed-off-by: Xiao Wang <xiao.w.w...@intel.com>
> > Reviewed-by: Maxime Coquelin <maxime.coque...@redhat.com>
> > Reviewed-by: Ferruh Yigit <ferruh.yi...@intel.com>
> > ---
>
> Apologies for late review. Some comments below.
>
> <...>
>
> > +struct rte_memseg;
> > +
> >  /**
> >   * Setup vfio_cfg for the device identified by its address.
> >   * It discovers the configured I/O MMU groups or sets a new one for the device.
> > @@ -131,6 +133,117 @@ rte_vfio_clear_group(int vfio_group_fd);
> >  }
> >  #endif
> >
>
> <...>
>
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
> > + *
> > + * Perform dma mapping for devices in a container.
> > + *
> > + * @param container_fd
> > + *   the specified container fd
> > + *
> > + * @param dma_type
> > + *   the dma map type
> > + *
> > + * @param ms
> > + *   the dma address region to map
> > + *
> > + * @return
> > + *    0 if successful
> > + *   <0 if failed
> > + */
> > +int __rte_experimental
> > +rte_vfio_dma_map(int container_fd, int dma_type, const struct rte_memseg *ms);
> > +
>
> First of all, why memseg, instead of va/iova/len? This seems like
> unnecessary attachment to internals of DPDK memory representation. Not
> all memory comes in memsegs, this makes the API unnecessarily specific
> to DPDK memory.
Agree, will use va/iova/len.

> Also, why providing DMA type? There's already a VFIO type pointer in
> vfio_config - you can set this pointer for every new created container,
> so the user wouldn't have to care about IOMMU type. Is it not possible
> to figure out DMA type from within EAL VFIO? If not, maybe provide an
> API to do so, e.g. rte_vfio_container_set_dma_type()?

It's possible, EAL VFIO should be able to figure out a container's DMA type.

> This will also need to be rebased on top of latest HEAD because there
> already is a similar DMA map/unmap API added, only without the container
> parameter. Perhaps rename these new functions to
> rte_vfio_container_(create|destroy|dma_map|dma_unmap)?

OK, will check the latest HEAD and rebase on that.

> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
> > + *
> > + * Perform dma unmapping for devices in a container.
> > + *
> > + * @param container_fd
> > + *   the specified container fd
> > + *
> > + * @param dma_type
> > + *   the dma map type
> > + *
> > + * @param ms
> > + *   the dma address region to unmap
> > + *
> > + * @return
> > + *    0 if successful
> > + *   <0 if failed
> > + */
> > +int __rte_experimental
> > +rte_vfio_dma_unmap(int container_fd, int dma_type, const struct rte_memseg *ms);
> > +
> >  #endif /* VFIO_PRESENT */
> >
>
> <...>
>
> > @@ -75,8 +53,8 @@ vfio_get_group_fd(int iommu_group_no)
> >  	if (vfio_group_fd < 0) {
> >  		/* if file not found, it's not an error */
> >  		if (errno != ENOENT) {
> > -			RTE_LOG(ERR, EAL, "Cannot open %s: %s\n", filename,
> > -					strerror(errno));
> > +			RTE_LOG(ERR, EAL, "Cannot open %s: %s\n",
> > +				filename, strerror(errno));
>
> This looks like unintended change.
> >  			return -1;
> >  		}
> >
> > @@ -86,8 +64,10 @@ vfio_get_group_fd(int iommu_group_no)
> >  		vfio_group_fd = open(filename, O_RDWR);
> >  		if (vfio_group_fd < 0) {
> >  			if (errno != ENOENT) {
> > -				RTE_LOG(ERR, EAL, "Cannot open %s: %s\n", filename,
> > -						strerror(errno));
> > +				RTE_LOG(ERR, EAL,
> > +					"Cannot open %s: %s\n",
> > +					filename,
> > +					strerror(errno));
>
> This looks like unintended change.
>
> >  				return -1;
> >  			}
> >  			return 0;
> > @@ -95,21 +75,19 @@ vfio_get_group_fd(int iommu_group_no)
> >  			/* noiommu group found */
> >  		}
> >
> > -		cur_grp->group_no = iommu_group_no;
> > -		cur_grp->fd = vfio_group_fd;
> > -		vfio_cfg.vfio_active_groups++;
> >  		return vfio_group_fd;
> >  	}
> > -	/* if we're in a secondary process, request group fd from the primary
> > +	/*
> > +	 * if we're in a secondary process, request group fd from the primary
> >  	 * process via our socket
> >  	 */
>
> This looks like unintended change.
>
> >  	else {
> > -		int socket_fd, ret;
> > -
> > -		socket_fd = vfio_mp_sync_connect_to_primary();
> > +		int ret;
> > +		int socket_fd = vfio_mp_sync_connect_to_primary();
> >
> >  		if (socket_fd < 0) {
> > -			RTE_LOG(ERR, EAL, "  cannot connect to primary process!\n");
> > +			RTE_LOG(ERR, EAL,
> > +				"  cannot connect to primary process!\n");
>
> This looks like unintended change.
>
> >  			return -1;
> >  		}
> >  		if (vfio_mp_sync_send_request(socket_fd, SOCKET_REQ_GROUP) < 0) {
> > @@ -122,6 +100,7 @@ vfio_get_group_fd(int iommu_group_no)
> >  			close(socket_fd);
> >  			return -1;
> >  		}
> > +
> >  		ret = vfio_mp_sync_receive_request(socket_fd);
>
> This looks like unintended change.
>
> (hint: "git revert -n HEAD && git add -p" is your friend :) )

Thanks, will remove these diffs.
> >  	switch (ret) {
> >  	case SOCKET_NO_FD:
> > @@ -132,9 +111,6 @@ vfio_get_group_fd(int iommu_group_no)
> >  		/* if we got the fd, store it and return it */
> >  		if (vfio_group_fd > 0) {
> >  			close(socket_fd);
> > -			cur_grp->group_no = iommu_group_no;
> > -			cur_grp->fd = vfio_group_fd;
> > -			vfio_cfg.vfio_active_groups++;
> >  			return vfio_group_fd;
> >  		}
> >  		/* fall-through on error */
> > @@ -147,70 +123,349 @@ vfio_get_group_fd(int iommu_group_no)
> >  	return -1;
>
> <...>
>
> > +int __rte_experimental
> > +rte_vfio_create_container(void)
> > +{
> > +	struct vfio_config *vfio_cfg;
> > +	int i;
> > +
> > +	/* Find an empty slot to store new vfio config */
> > +	for (i = 1; i < VFIO_MAX_CONTAINERS; i++) {
> > +		if (vfio_cfgs[i] == NULL)
> > +			break;
> > +	}
> > +
> > +	if (i == VFIO_MAX_CONTAINERS) {
> > +		RTE_LOG(ERR, EAL, "exceed max vfio container limit\n");
> > +		return -1;
> > +	}
> > +
> > +	vfio_cfgs[i] = rte_zmalloc("vfio_container", sizeof(struct vfio_config),
> > +		RTE_CACHE_LINE_SIZE);
> > +	if (vfio_cfgs[i] == NULL)
> > +		return -ENOMEM;
>
> Is there a specific reason why 1) dynamic allocation is used (as opposed
> to just storing a static array), and 2) DPDK memory allocation is used?
> This seems like unnecessary complication.
>
> Even if you were to decide to allocate memory instead of having a static
> array, you'll have to register for rte_eal_cleanup() to delete any
> allocated containers on DPDK exit. But, as i said, i think it would be
> better to keep it as static array.

Thanks for the suggestion, a static array looks simpler and cleaner.
> > +
> > +	RTE_LOG(INFO, EAL, "alloc container at slot %d\n", i);
> > +	vfio_cfg = vfio_cfgs[i];
> > +	vfio_cfg->vfio_active_groups = 0;
> > +	vfio_cfg->vfio_container_fd = vfio_get_container_fd();
> > +
> > +	if (vfio_cfg->vfio_container_fd < 0) {
> > +		rte_free(vfio_cfgs[i]);
> > +		vfio_cfgs[i] = NULL;
> > +		return -1;
> > +	}
> > +
> > +	for (i = 0; i < VFIO_MAX_GROUPS; i++) {
> > +		vfio_cfg->vfio_groups[i].group_no = -1;
> > +		vfio_cfg->vfio_groups[i].fd = -1;
> > +		vfio_cfg->vfio_groups[i].devices = 0;
> > +	}
>
> <...>
>
> > @@ -665,41 +931,80 @@ vfio_get_group_no(const char *sysfs_base,
> >  }
> >
> >  static int
> > -vfio_type1_dma_map(int vfio_container_fd)
> > +do_vfio_type1_dma_map(int vfio_container_fd, const struct rte_memseg *ms)
>
> <...>
>
> > +static int
> > +do_vfio_type1_dma_unmap(int vfio_container_fd, const struct rte_memseg *ms)
>
> APIs such as these two were recently added to DPDK.

Will check and rebase.

BRs,
Xiao

> --
> Thanks,
> Anatoly