-----Original Message-----
> Date: Sun, 30 Apr 2017 19:29:49 +0200
> From: Thomas Monjalon <tho...@monjalon.net>
> To: Alejandro Lucero <alejandro.luc...@netronome.com>
> Cc: dev@dpdk.org, "Burakov, Anatoly" <anatoly.bura...@intel.com>
> Subject: Re: [dpdk-dev] [PATCH] vfio: fix device unplug when several
>  devices per vfio group
>
> 28/04/2017 15:25, Burakov, Anatoly:
> > From: Alejandro Lucero [mailto:alejandro.luc...@netronome.com]
> > > VFIO allows a secure way of assigning devices to user space and those
> > > devices which can not be isolated from other ones are set in same VFIO
> > > group. Releasing or unplugging a device should be aware of remaining
> > > devices is the same group for avoiding to close such a group.
> > >
> > > Fixes: 94c0776b1bad ("vfio: support hotplug")
> > >
> > > Signed-off-by: Alejandro Lucero <alejandro.luc...@netronome.com>
> >
> > I have tested this on my setup on an old kernel with multiple
> > attach/detaches, and it works (whereas it fails without this patch).
> >
> > Acked-by: Anatoly Burakov <anatoly.bura...@intel.com>
>
> Applied, thanks
This patch causes a problem when a large number of PCIe devices are
connected to the system. I found it through git bisect.

The issue is that vfio_group_fd goes beyond 64 (VFIO_MAX_GROUPS), so the
following line writes to the wrong memory and subsequently causes
problems in VFIO mapping and elsewhere:

vfio_cfg.vfio_groups[vfio_group_fd].devices++;

I could increase VFIO_MAX_GROUPS, but I don't think that is the correct
fix, because vfio_group_fd is a file descriptor returned by the open()
system call, so its value is not bounded by the number of groups.

I added some prints to the code for debugging. Please find the output
below. Any thoughts from VFIO experts?

➜ [master]83xx [dpdk-master] $ git diff
diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/librte_eal/linuxapp/eal/eal_vfio.c
index d3eae20..2d8ee4c 100644
--- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
@@ -100,6 +100,7 @@ vfio_get_group_fd(int iommu_group_no)
 		snprintf(filename, sizeof(filename),
 				 VFIO_GROUP_FMT, iommu_group_no);
 		vfio_group_fd = open(filename, O_RDWR);
+		printf("###### name %s vfio_group_fd %d\n", filename, vfio_group_fd);
 		if (vfio_group_fd < 0) {
 			/* if file not found, it's not an error */
 			if (errno != ENOENT) {
@@ -259,6 +260,8 @@ vfio_setup_device(const char *sysfs_base, const char *dev_addr,
 	if (vfio_group_fd < 0)
 		return -1;
 
+	printf("#### iommu_group_fd %d vfio_group_fd=%d\n", iommu_group_no, vfio_group_fd);
+
 	/* if group_fd == 0, that means the device isn't managed by VFIO */
 	if (vfio_group_fd == 0) {
 		RTE_LOG(WARNING, EAL, " %s not managed by VFIO driver, skipping\n",
@@ -266,6 +269,7 @@ vfio_setup_device(const char *sysfs_base, const char *dev_addr,
 		return 1;
 	}
 
+
 	/*
 	 * at this point, we know that this group is viable (meaning, all devices
 	 * are either bound to VFIO or not bound to anything)
@@ -359,6 +363,7 @@ vfio_setup_device(const char *sysfs_base, const char *dev_addr,
 		return -1;
 	}
 	vfio_cfg.vfio_groups[vfio_group_fd].devices++;
+	printf("vfio_group_fd %d device %d\n", vfio_group_fd, vfio_cfg.vfio_groups[vfio_group_fd].devices++);
 
 	return 0;
 }

output log
----------
EAL: PCI device 0000:07:00.1 on NUMA socket 0
EAL: probe driver: 177d:a04b octeontx_ssovf
###### name /dev/vfio/114 vfio_group_fd 44
#### iommu_group_fd 114 vfio_group_fd=44
EAL: using IOMMU type 1 (Type 1)
vfio_group_fd 44 device 1
EAL: PCI device 0000:07:00.2 on NUMA socket 0
EAL: probe driver: 177d:a04b octeontx_ssovf
###### name /dev/vfio/115 vfio_group_fd 47
#### iommu_group_fd 115 vfio_group_fd=47
vfio_group_fd 47 device 1
EAL: PCI device 0000:07:00.3 on NUMA socket 0
EAL: probe driver: 177d:a04b octeontx_ssovf
###### name /dev/vfio/116 vfio_group_fd 50
#### iommu_group_fd 116 vfio_group_fd=50
vfio_group_fd 50 device 1
EAL: PCI device 0000:07:00.4 on NUMA socket 0
EAL: probe driver: 177d:a04b octeontx_ssovf
###### name /dev/vfio/117 vfio_group_fd 53
#### iommu_group_fd 117 vfio_group_fd=53
vfio_group_fd 53 device 1
EAL: PCI device 0000:07:00.5 on NUMA socket 0
EAL: probe driver: 177d:a04b octeontx_ssovf
###### name /dev/vfio/118 vfio_group_fd 56
#### iommu_group_fd 118 vfio_group_fd=56
vfio_group_fd 56 device 1
EAL: PCI device 0000:07:00.6 on NUMA socket 0
EAL: probe driver: 177d:a04b octeontx_ssovf
###### name /dev/vfio/119 vfio_group_fd 59
#### iommu_group_fd 119 vfio_group_fd=59
vfio_group_fd 59 device 1
EAL: PCI device 0000:07:00.7 on NUMA socket 0
EAL: probe driver: 177d:a04b octeontx_ssovf
###### name /dev/vfio/120 vfio_group_fd 62
#### iommu_group_fd 120 vfio_group_fd=62
vfio_group_fd 62 device 1
EAL: PCI device 0000:07:01.0 on NUMA socket 0
EAL: probe driver: 177d:a04b octeontx_ssovf
###### name /dev/vfio/121 vfio_group_fd 65
#### iommu_group_fd 121 vfio_group_fd=65
vfio_group_fd 65 device 1632632833
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^(memory corruption here)
EAL: PCI device 0000:08:00.1 on NUMA socket 0
EAL: probe driver: 177d:a04d octeontx_ssowvf
###### name /dev/vfio/122 vfio_group_fd 68
#### iommu_group_fd 122 vfio_group_fd=68
vfio_group_fd 68 device 1
EAL: PCI device 0000:08:00.2 on NUMA socket 0
EAL: probe driver: 177d:a04d octeontx_ssowvf
###### name /dev/vfio/123 vfio_group_fd 71
#### iommu_group_fd 123 vfio_group_fd=71
vfio_group_fd 71 device 99999941
EAL: PCI device 0000:08:00.3 on NUMA socket 0
EAL: probe driver: 177d:a04d octeontx_ssowvf
###### name /dev/vfio/124 vfio_group_fd 74
#### iommu_group_fd 124 vfio_group_fd=74
vfio_group_fd 74 device 1
EAL: PCI device 0000:08:00.4 on NUMA socket 0
EAL: probe driver: 177d:a04d octeontx_ssowvf
###### name /dev/vfio/125 vfio_group_fd 77
#### iommu_group_fd 125 vfio_group_fd=77
vfio_group_fd 77 device 1
EAL: PCI device 0000:08:00.5 on NUMA socket 0
EAL: probe driver: 177d:a04d octeontx_ssowvf
###### name /dev/vfio/126 vfio_group_fd 80
#### iommu_group_fd 126 vfio_group_fd=80
vfio_group_fd 80 device 1
EAL: PCI device 0000:08:00.6 on NUMA socket 0
EAL: probe driver: 177d:a04d octeontx_ssowvf
###### name /dev/vfio/127 vfio_group_fd 83
#### iommu_group_fd 127 vfio_group_fd=83
vfio_group_fd 83 device 1
EAL: PCI device 0000:08:00.7 on NUMA socket 0
EAL: probe driver: 177d:a04d octeontx_ssowvf
EAL: PCI device 0000:08:01.0 on NUMA socket 0
EAL: probe driver: 177d:a04d octeontx_ssowvf
EAL: PCI device 0001:01:00.1 on NUMA socket 0
EAL: probe driver: 177d:a034 net_thunderx
###### name /dev/vfio/64 vfio_group_fd 86
#### iommu_group_fd 64 vfio_group_fd=86
vfio_group_fd 86 device 1
Segmentation fault
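
To make the failure mode concrete, here is a minimal standalone sketch of
one possible direction: store the descriptor inside each table entry and
search for it, instead of using the descriptor as the array index. This is
only an illustration under my own assumptions, not a proposed patch.
MAX_GROUPS, struct group_slot, slots_init(), slot_for_fd() and
count_device() are hypothetical names that do not exist in eal_vfio.c.

```c
#include <stdio.h>

#define MAX_GROUPS 64 /* assumed to mirror VFIO_MAX_GROUPS from the mail */

/* Hypothetical, simplified stand-in for one vfio_cfg.vfio_groups[] entry. */
struct group_slot {
	int fd;      /* descriptor returned by open(), -1 while the slot is free */
	int devices; /* number of devices counted against this group */
};

static struct group_slot slots[MAX_GROUPS];

/* Mark every slot as free. */
static void slots_init(void)
{
	for (int i = 0; i < MAX_GROUPS; i++) {
		slots[i].fd = -1;
		slots[i].devices = 0;
	}
}

/*
 * Return the index of the slot that holds fd, claiming the first free slot
 * if fd is not stored yet. Returns -1 for an invalid fd or a full table,
 * so the caller never indexes outside the array.
 */
static int slot_for_fd(int fd)
{
	int free_idx = -1;

	if (fd < 0)
		return -1;

	for (int i = 0; i < MAX_GROUPS; i++) {
		if (slots[i].fd == fd)
			return i;
		if (free_idx < 0 && slots[i].fd == -1)
			free_idx = i;
	}

	if (free_idx >= 0)
		slots[free_idx].fd = fd;
	return free_idx;
}

/* Bounds-safe replacement for "groups[fd].devices++"; returns the new count. */
static int count_device(int fd)
{
	int idx = slot_for_fd(fd);

	if (idx < 0)
		return -1;
	return ++slots[idx].devices;
}
```

With descriptors like 65 or 86 from the log above, count_device() would
touch slot 0 or 1 rather than memory past the end of the array, and the
table only fills up when more than MAX_GROUPS distinct groups are open.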