Although it's described as an erratum for gen8/9, the earlier report dates
from 2015, which predates those generations. I tested the patch on a gen7
box (which was flooding my Xen console buffer). It may be that the FPD code
simply isn't implemented (which wouldn't matter for gen8/9, since it is
broken there), and that the original problem of faulty firmware reporting
bad ranges is the ultimate culprit. As for the last two questions, I tested
on an older version of Xen (4.9.x) and then ported the patch to master.
Happy to hear a better approach to solving the original problem.
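
On the dma_msi_end point, a minimal standalone sketch of the ordering
concern may help. It is not Xen code: the register is modeled as a plain
variable, the function names are hypothetical stand-ins for the patch hunk
and the end-of-interrupt path, and the IM bit position (bit 31 of FECTL) is
an assumption taken from the VT-d specification.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: IM is bit 31 of FECTL, per the VT-d specification. */
#define FECTL_IM (UINT32_C(1) << 31)

/* Stand-in for the memory-mapped FECTL register; no real hardware access. */
static uint32_t fectl_reg;

/* Hypothetical stand-in for the patch hunk: mask fault events in the handler. */
static void handler_masks_faults(void)
{
    fectl_reg |= FECTL_IM;
}

/* Hypothetical stand-in for an end-of-interrupt hook that unmasks again. */
static void end_of_interrupt_unmasks(void)
{
    fectl_reg &= ~FECTL_IM;
}

/* One fault interrupt: the handler runs first, then the end-of-interrupt hook. */
static uint32_t simulate_fault_interrupt(void)
{
    fectl_reg = 0;
    handler_masks_faults();
    assert(fectl_reg & FECTL_IM);   /* mask is set while the handler runs */
    end_of_interrupt_unmasks();
    return fectl_reg & FECTL_IM;    /* non-zero only if the mask survived */
}
```

In this model simulate_fault_interrupt() returns 0, i.e. the mask set by the
handler is gone once the interrupt has been fully handled. Whether a given
Xen version behaves this way depends on what its end-of-interrupt path does.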

-Brendan

On Sun, Apr 19, 2020 at 11:28 PM Tian, Kevin <kevin.t...@intel.com> wrote:

> > From: Brendan Kerrigan <brendank...@gmail.com>
> > Sent: Friday, April 17, 2020 9:36 PM
> >
> > From: Brendan Kerrigan <kerrig...@ainfosec.com>
> >
> > The Intel graphics device records DMAR faults regardless
> > of the Fault Process Disable bit being set. When this fault
>
> Since this is an erratum for specific generations, let's describe
> it that way instead of stating it as a generic problem.
>
> > occurs, enable the Interrupt Mask (IM) bit in the Fault
> > Event Control (FECTL) register to prevent the continued
> > recording of the fault.
> >
> > Signed-off-by: Brendan Kerrigan <kerrig...@ainfosec.com>
> > ---
> >  xen/drivers/passthrough/vtd/iommu.c | 9 +++++++++
> >  1 file changed, 9 insertions(+)
> >
> > diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
> > index 07d40b37fe..288399d816 100644
> > --- a/xen/drivers/passthrough/vtd/iommu.c
> > +++ b/xen/drivers/passthrough/vtd/iommu.c
> > @@ -41,6 +41,8 @@
> >  #include "vtd.h"
> >  #include "../ats.h"
> >
> > +#define IS_IGD(seg, id) (0 == seg && 0 == PCI_BUS(id) && 2 == PCI_SLOT(id) && 0 == PCI_FUNC(id))
> > +
> >  struct mapped_rmrr {
> >      struct list_head list;
> >      u64 base, end;
> > @@ -872,6 +874,13 @@ static int iommu_page_fault_do_one(struct vtd_iommu *iommu, int type,
> >      printk(XENLOG_G_WARNING VTDPREFIX "%s: reason %02x - %s\n",
> >             kind, fault_reason, reason);
> >
> > +    if ( DMA_REMAP == fault_type && type && IS_IGD(seg, source_id) ) {
> > +        u32 fectl = dmar_readl(iommu->reg, DMAR_FECTL_REG);
> > +        fectl |= DMA_FECTL_IM;
> > +        dmar_writel(iommu->reg, DMAR_FECTL_REG, fectl);
> > +        printk(XENLOG_G_WARNING VTDPREFIX "Disabling DMAR faults for IGD\n");
> > +    }
> > +
>
> Several questions. First, I note that FPD is not touched by any code
> today, so how is this erratum being caught? Second, we already have
> pci_check_disable_device in place, which stops DMA from the problematic
> device if it triggers excessive fault reports. Why doesn't it work for your
> case? Last, dma_msi_end forces the mask bit clear at the end of handling
> the fault interrupt, making the above fix meaningless. Those questions
> make me wonder how you actually tested this patch and whether it's
> necessary...
>
> Thanks
> Kevin
>
> >      if ( iommu_verbose && fault_type == DMA_REMAP )
> >          print_vtd_entries(iommu, PCI_BUS(source_id), PCI_DEVFN2(source_id),
> >                            addr >> PAGE_SHIFT);
> > --
> > 2.17.1
>
>
