On 8/17/2018 11:48 AM, Jeff Guo wrote: > When a device is hotplugged out, the PCI resource is released in the > kernel, the UIO file descriptor will disappear and the irq will be > released. After this, a kernel crash will be caused if the igb uio driver > tries to access or release these resources. > > And more, uio_remove will be called unexpectedly before uio_release > when device be hotpluggged out, the uio_remove procedure will > free resources that are required by uio_release. This will later affect the > usage of interrupt as there is no way to disable the interrupt which is > defined in uio_release. > > To prevent this, the hotplug removal needs to be identified and processed > accordingly in igb uio driver. > > This patch proposes the addition of enum rte_udev_state in the > rte_uio_pci_dev struct. This will store the state of the uio device as one > of the following: probed/opened/released/removed. > > This patch also checks the kobject's remove_uevent_sent state to detect if > the removal status is hotplug-out. Once a hotplug-out is detected, it will > call uio_release and set the uio status to "removed". After that, uio will > check the status in the uio_release function. If uio has already been > removed, it will only free the dirty uio resource. > > Signed-off-by: Jeff Guo <jia....@intel.com> > Acked-by: Shaopeng He <shaopeng...@intel.com>
<...> > @@ -331,20 +351,35 @@ igbuio_pci_open(struct uio_info *info, struct inode > *inode) > > /* enable interrupts */ > err = igbuio_pci_enable_interrupts(udev); > - mutex_unlock(&udev->lock); > if (err) { > dev_err(&dev->dev, "Enable interrupt fails\n"); > + pci_clear_master(dev); Why pci_clear_master required here. btw, some part of this patch conflicts with [1], which removes mutes and use atomic refcnt operations, but introducing state seems needs mutex. [1] igb_uio: fix refcount if open returns error https://patches.dpdk.org/patch/44732/ > + mutex_unlock(&udev->lock); > return err; > } > + udev->state = RTE_UDEV_OPENNED; > + mutex_unlock(&udev->lock); > return 0; > } > > +/** > + * This gets called while closing uio device file. > + */ > static int > igbuio_pci_release(struct uio_info *info, struct inode *inode) > { > struct rte_uio_pci_dev *udev = info->priv; > struct pci_dev *dev = udev->pdev; > > + if (udev->state == RTE_UDEV_REMOVED) { > + mutex_destroy(&udev->lock); > + igbuio_pci_release_iomem(&udev->info); > + pci_disable_device(dev); > + pci_set_drvdata(dev, NULL); > + kfree(udev); > + return 0; This branch taken when pci_remove called before pci_release. - At this stage is "dev" valid, since pci_remove() called? - In this path uio_unregister_device() is missing, who unregisters uio? - sysfs_remove_group() also missing, it is not clear if it is forgotten or left out, what do you think move common part of pci_remove into new function and call both in pci_remove and here? And as a logic, can we make pci_remove clear everything, instead of doing some cleanup here. Like: pci_remove: - calls pci_release - instead of return keeps doing pci_remove work - set state to REMOVED pci_release: - if state is REMOVED, return without doing nothing btw, even after uio_unregister_device() how pci_release called? It can help to share crash backtrace in commit log, to describe problem in more detail.