On Mon, Nov 18, 2019 at 12:06 PM Alistair Popple <alist...@popple.id.au> wrote: > > On Wednesday, 13 November 2019 4:38:21 AM AEDT Frederic Barrat wrote: > > > > However, one question is whether this patch breaks nvlink and if nvlink > > assumes the devices won’t go away because we explicitly take a reference > > forever. In npu_dma.c, there are 2 functions which allow to find the GPU > > associated to a npu device, and vice-versa. Both functions return a > > pointer to a struct pci_dev, but they don’t take a reference on the > > device being returned. So that seems dangerous. I’m probably missing > > something. > > > > Alexey, Alistair: what, if anything, guarantees, that the npu or gpu > > devices stay valid. Is it because we simply don’t provide any means to > > get rid of them ? Otherwise, don’t we need the callers of > > pnv_pci_get_gpu_dev() and pnv_pci_get_npu_dev() to worry about reference > > counting ? I’ve started looking into it and the changes are scary, which > > explains Greg’s related commit 02c5f5394918b. > > To be honest the reference counting looks like it has evolved into something > quite suspect and I don't think you're missing anything. In practice though we > likely haven't hit any issues because the original callers didn't store > references to the pdev which would make the window quite small (although the > pass through work may have changed that). And as you say there simply wasn't > any means to get rid of them anyway (EEH, hotplug, etc. has never been > implemented or supported for GPUs and all sorts of terrible things happen if > you try).
In other words: leaking a ref is the only safe thing to do. > The best solution would likely be to review the reference counting and to > teach callers of get_*_dev() to call pci_put_dev(), etc. The issue is that the two callers of get_pci_dev() are non-GPL exported symbols so we don't know what's calling them or what their expectations are. We be doing whatever makes sense for OpenCAPI and if that happens to cause a problem for someone else, then they can deal with it. Oliver