Alexey, > >>> +void pnv_try_isolate_nvidia_v100(struct pci_dev *bridge) > >>> +{ > >>> + u32 mask, val; > >>> + void __iomem *bar0_0, *bar0_120000, *bar0_a00000; > >>> + struct pci_dev *pdev; > >>> + u16 cmd = 0, cmdmask = PCI_COMMAND_MEMORY; > >>> + > >>> + if (!bridge->subordinate) > >>> + return; > >>> + > >>> + pdev = list_first_entry_or_null(&bridge->subordinate->devices, > >>> + struct pci_dev, bus_list); > >>> + if (!pdev) > >>> + return; > >>> + > >>> + if (pdev->vendor != PCI_VENDOR_ID_NVIDIA)
Don't you also need to check the PCIe devid to match only [PV]100 devices as well? I doubt there's any guarantee these registers will remain the same for all future (or older) NVIDIA devices. IMHO this should really be done in the device driver in the guest. A malcious guest could load a modified driver that doesn't do this, but that should not compromise other guests which presumably load a non-compromised driver that disables the links on that guests GPU. However I guess in practice what you have here should work equally well. - Alistair > >>> + return; > >>> + > >>> + mask = nvlinkgpu_get_disable_mask(&pdev->dev); > >>> + if (!mask) > >>> + return; > >>> + > >>> + bar0_0 = pci_iomap_range(pdev, 0, 0, 0x10000); > >>> + if (!bar0_0) { > >>> + pci_err(pdev, "Error mapping BAR0 @0\n"); > >>> + return; > >>> + } > >>> + bar0_120000 = pci_iomap_range(pdev, 0, 0x120000, 0x10000); > >>> + if (!bar0_120000) { > >>> + pci_err(pdev, "Error mapping BAR0 @120000\n"); > >>> + goto bar0_0_unmap; > >>> + } > >>> + bar0_a00000 = pci_iomap_range(pdev, 0, 0xA00000, 0x10000); > >>> + if (!bar0_a00000) { > >>> + pci_err(pdev, "Error mapping BAR0 @A00000\n"); > >>> + goto bar0_120000_unmap; > >>> + } > >> > >> Is it really necessary to do three separate ioremaps vs one that would > >> cover them all here? I suspect you're just sneaking in PAGE_SIZE with > >> the 0x10000 size mappings anyway. Seems like it would simplify setup, > >> error reporting, and cleanup to to ioremap to the PAGE_ALIGN'd range > >> of the highest register accessed. Thanks, > > > > Sure I can map it once, I just do not see the point in mapping/unmapping > > all 0xa10000>>16=161 system pages for a very short period of time while > > we know precisely that we need just 3 pages. > > > > Repost? > > Ping? > > Can this go in as it is (i.e. should I ping Michael) or this needs > another round? It would be nice to get some formal acks. Thanks, > > >> Alex > >> > >>> + > >>> + pci_restore_state(pdev); > >>> + pci_read_config_word(pdev, PCI_COMMAND, &cmd); > >>> + if ((cmd & cmdmask) != cmdmask) > >>> + pci_write_config_word(pdev, PCI_COMMAND, cmd | cmdmask); > >>> + > >>> + /* > >>> + * The sequence is from "Tesla P100 and V100 SXM2 NVLink Isolation on > >>> + * Multi-Tenant Systems". > >>> + * The register names are not provided there either, hence raw values. > >>> + */ > >>> + iowrite32(0x4, bar0_120000 + 0x4C); > >>> + iowrite32(0x2, bar0_120000 + 0x2204); > >>> + val = ioread32(bar0_0 + 0x200); > >>> + val |= 0x02000000; > >>> + iowrite32(val, bar0_0 + 0x200); > >>> + val = ioread32(bar0_a00000 + 0x148); > >>> + val |= mask; > >>> + iowrite32(val, bar0_a00000 + 0x148); > >>> + > >>> + if ((cmd | cmdmask) != cmd) > >>> + pci_write_config_word(pdev, PCI_COMMAND, cmd); > >>> + > >>> + pci_iounmap(pdev, bar0_a00000); > >>> +bar0_120000_unmap: > >>> + pci_iounmap(pdev, bar0_120000); > >>> +bar0_0_unmap: > >>> + pci_iounmap(pdev, bar0_0); > >>> +}