On Wed, Jun 05, 2019 at 01:55:42AM -0600, Jan Beulich wrote:
> >>> On 04.06.19 at 18:20, <roger....@citrix.com> wrote:
> > On Tue, Jun 04, 2019 at 07:02:06AM -0600, Jan Beulich wrote:
> >> >>> On 04.06.19 at 10:48, <roger....@citrix.com> wrote:
> >> > On Mon, Jun 03, 2019 at 07:00:25AM -0600, Jan Beulich wrote:
> >> >> This reverts commit b6bd02b7a877f9fac2de69e64d8245d56f92ab25. The change
> >> >> was redundant with amd_iommu_detect_one_acpi() already calling
> >> >> pci_ro_device().
> >> >> 
> >> >> Signed-off-by: Jan Beulich <jbeul...@suse.com>
> >> > 
> >> > I think this needs to be squashed together with your `AMD/IOMMU: don't
> >> > "add" IOMMUs` patch, or else PVH dom0 will break because
> >> > update_paging_mode will find devices not behind an IOMMU assigned to
> >> > dom0, thus returning an error and crashing dom0.
> >> 
> >> I've taken another look (while correcting the other patch, now sent
> >> as v2): update_paging_mode() iterates over all devices the domain
> >> owns. The IOMMU ones, having been subject of an early
> >> pci_ro_device(), should never end up on Dom0's list. And looking at
> >> the code I also can't - for now at least - see how they could get
> >> moved there. In fact I've verified that they take the "override"
> >> path in _setup_hwdom_pci_devices().
> > 
> > As you realized this commit was indeed papering over an existing issue
> > elsewhere. When I did this patch I didn't realize
> > amd_iommu_detect_one_acpi was calling pci_ro_device.
> > 
> > The issue is that when pci_ro_device gets called by
> > amd_iommu_detect_one_acpi dom_xen has not been created yet, so
> > pdev->domain ends up being NULL.
> 
> Well, that's being fixed by "adjust system domain creation (and call it
> earlier on x86)" (note that it's "special" rather than "system" in the
> posted version). pdev->domain remaining to be NULL really is the
> smaller of the problems; accessing dom_xen->arch.pdev_list is the
> worse part here.

Oh, didn't see that series, thanks for the fixes, will try to review
them later today.

> One thing is curious though: So far I thought I would have screwed
> up things by having amd_iommu_detect_one_acpi() called earlier,
> as mentioned in that patch's description. Your change that I'd like
> to revert predates that though by several months, so I'm getting
> the impression the issue has been older. As a result the range of
> versions to backport this to may have to grow.

Yes, I'm still trying to figure out exactly what caused the issue that
b6bd02b7a877 fixed.

> > On a tangential note, there's also a dereference of dom_xen in
> > _pci_hide_device which doesn't trigger a page fault. Do we have
> > something mapped at linear address 0 on purpose?
> 
> Yes, during early (legacy) boot.

FWIW, I'm booting from UEFI using multiboot2 and I didn't get a page
fault in _pci_hide_device.

Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Reply via email to