While it's described as errata for gen8/9, the previous reporting was from 2015 which predates those generations. I tested it on a gen 7 box (which was causing me the grief of consuming my Xen console buffer). It could be the case that the FPD code isn't implemented (which wouldn't matter for gen8/9 because it is broken), and the original problem of faulty firmware reporting bad ranges is the ultimate culprit. As far as the last two questions, I was testing on an older version of Xen (4.9.x) and ported it to master. Happy to hear a better approach to solving the original problem.
-Brendan On Sun, Apr 19, 2020 at 11:28 PM Tian, Kevin <kevin.t...@intel.com> wrote: > > From: Brendan Kerrigan <brendank...@gmail.com> > > Sent: Friday, April 17, 2020 9:36 PM > > > > From: Brendan Kerrigan <kerrig...@ainfosec.com> > > > > The Intel graphics device records DMAR faults regardless > > of the Fault Process Disable bit being set. When this fault > > Since this is an errata for specific generations, let's describe > this way instead of stating it as a generic problem. > > > occurs, enable the Interrupt Mask (IM) bit in the Fault > > Event Control (FECTL) register to prevent the continued > > recording of the fault. > > > > Signed-off-by: Brendan Kerrigan <kerrig...@ainfosec.com> > > --- > > xen/drivers/passthrough/vtd/iommu.c | 9 +++++++++ > > 1 file changed, 9 insertions(+) > > > > diff --git a/xen/drivers/passthrough/vtd/iommu.c > > b/xen/drivers/passthrough/vtd/iommu.c > > index 07d40b37fe..288399d816 100644 > > --- a/xen/drivers/passthrough/vtd/iommu.c > > +++ b/xen/drivers/passthrough/vtd/iommu.c > > @@ -41,6 +41,8 @@ > > #include "vtd.h" > > #include "../ats.h" > > > > +#define IS_IGD(seg, id) (0 == seg && 0 == PCI_BUS(id) && 2 == > PCI_SLOT(id) > > && 0 == PCI_FUNC(id)) > > + > > struct mapped_rmrr { > > struct list_head list; > > u64 base, end; > > @@ -872,6 +874,13 @@ static int iommu_page_fault_do_one(struct > > vtd_iommu *iommu, int type, > > printk(XENLOG_G_WARNING VTDPREFIX "%s: reason %02x - %s\n", > > kind, fault_reason, reason); > > > > + if ( DMA_REMAP == fault_type && type && IS_IGD(seg, source_id) ) { > > + u32 fectl = dmar_readl(iommu->reg, DMAR_FECTL_REG); > > + fectl |= DMA_FECTL_IM; > > + dmar_writel(iommu->reg, DMAR_FECTL_REG, fectl); > > + printk(XENLOG_G_WARNING VTDPREFIX "Disabling DMAR faults for > > IGD\n"); > > + } > > + > > Several questions. First, I just note that FPD is not touched by any code > today. then how is this errata being caught? Second, we already have > pci_check_disable_device in place which stops DMAs from the problematic > device if it triggers excessive fault reports. why doesn't it work for your > case? Last, dma_msi_end just forces clearing the mask bit at end of > handling > the fault interrupt, making above fix meaningless. Those questions just > make me wonder how you actually test this patch and whether it's > necessary... > > Thanks > Kevin > > > if ( iommu_verbose && fault_type == DMA_REMAP ) > > print_vtd_entries(iommu, PCI_BUS(source_id), > PCI_DEVFN2(source_id), > > addr >> PAGE_SHIFT); > > -- > > 2.17.1 > >