On 8/17/2018 11:48 AM, Jeff Guo wrote:
> When a device is hotplugged out, the PCI resource is released in the
> kernel, the UIO file descriptor will disappear and the irq will be
> released. After this, a kernel crash will be caused if the igb uio driver
> tries to access or release these resources.
> 
> And more, uio_remove will be called unexpectedly before uio_release
> when device be hotpluggged out, the uio_remove procedure will
> free resources that are required by uio_release. This will later affect the
> usage of interrupt as there is no way to disable the interrupt which is
> defined in uio_release.
> 
> To prevent this, the hotplug removal needs to be identified and processed
> accordingly in igb uio driver.
> 
> This patch proposes the addition of enum rte_udev_state in the
> rte_uio_pci_dev struct. This will store the state of the uio device as one
> of the following: probed/opened/released/removed.
> 
> This patch also checks the kobject's remove_uevent_sent state to detect if
> the removal status is hotplug-out. Once a hotplug-out is detected, it will
> call uio_release and set the uio status to "removed". After that, uio will
> check the status in the uio_release function. If uio has already been
> removed, it will only free the dirty uio resource.
> 
> Signed-off-by: Jeff Guo <jia....@intel.com>
> Acked-by: Shaopeng He <shaopeng...@intel.com>

<...>

> @@ -331,20 +351,35 @@ igbuio_pci_open(struct uio_info *info, struct inode 
> *inode)
>  
>       /* enable interrupts */
>       err = igbuio_pci_enable_interrupts(udev);
> -     mutex_unlock(&udev->lock);
>       if (err) {
>               dev_err(&dev->dev, "Enable interrupt fails\n");
> +             pci_clear_master(dev);

Why pci_clear_master required here.

btw, some part of this patch conflicts with [1], which removes mutes and use
atomic refcnt operations, but introducing state seems needs mutex.

[1]
igb_uio: fix refcount if open returns error
https://patches.dpdk.org/patch/44732/

> +             mutex_unlock(&udev->lock);
>               return err;
>       }
> +     udev->state = RTE_UDEV_OPENNED;
> +     mutex_unlock(&udev->lock);
>       return 0;
>  }
>  
> +/**
> + * This gets called while closing uio device file.
> + */
>  static int
>  igbuio_pci_release(struct uio_info *info, struct inode *inode)
>  {
>       struct rte_uio_pci_dev *udev = info->priv;
>       struct pci_dev *dev = udev->pdev;
>  
> +     if (udev->state == RTE_UDEV_REMOVED) {
> +             mutex_destroy(&udev->lock);
> +             igbuio_pci_release_iomem(&udev->info);
> +             pci_disable_device(dev);
> +             pci_set_drvdata(dev, NULL);
> +             kfree(udev);
> +             return 0;

This branch taken when pci_remove called before pci_release.
- At this stage is "dev" valid, since pci_remove() called?
- In this path uio_unregister_device() is missing, who unregisters uio?
- sysfs_remove_group() also missing, it is not clear if it is forgotten or left
out, what do you think move common part of pci_remove into new function and call
both in pci_remove and here?

And as a logic, can we make pci_remove clear everything, instead of doing some
cleanup here. Like:
pci_remove:
- calls pci_release
- instead of return keeps doing pci_remove work
- set state to REMOVED

pci_release:
- if state is REMOVED, return without doing nothing

btw, even after uio_unregister_device() how pci_release called?


It can help to share crash backtrace in commit log, to describe problem in more
detail.

Reply via email to