On Wed, Apr 8, 2020 at 4:22 PM Sam Bobroff <sbobr...@linux.ibm.com> wrote:
>
> On Fri, Apr 03, 2020 at 05:08:32PM +1100, Oliver O'Halloran wrote:
> > On Mon, 2020-03-30 at 15:56 +1100, Sam Bobroff wrote:
> > > When EEH device state was released asynchronously by the device
> > > release handler, it was possible for an outstanding reference to
> > > prevent it's release and it was necessary to work around that if a
> > > device was re-discovered at the same PCI location.
> >
> > I think this is a bit misleading. The main situation where you'll hit
> > this hack is when recovering a device with a driver that doesn't
> > implement the error handling callbacks. In that case the device is
> > removed, reset, then re-probed by the PCI core, but we assume it's the
> > same physical device so the eeh_device state remains active.
> >
> > If you actually changed the underlying device I suspect something bad
> > would happen.
>
> I'm not sure I understand. Isn't the case you're talking about caught by
> the earlier check (just above the patch)?
>
>         if (edev->pdev == dev) {
>                 eeh_edev_dbg(edev, "Device already referenced!\n");
>                 return;
>         }

No, in the case I'm talking about the pci_dev is torn down and freed. After
the PE is reset we re-probe the device and create a new pci_dev. If the
release of the old pci_dev is delayed, we need the hack this patch is
removing.

The check above should probably be a WARN_ON(), since we should never be
re-running the EEH probe on the same device. I think there is a case where
that can happen, but I don't remember the details.

Oliver