On Sun, Apr 30, 2023 at 06:47:13PM -0500, Jason A. Harmening wrote:
> On Sun, Apr 30, 2023 at 08:09:16AM +0300, Konstantin Belousov wrote:
> > On Sat, Apr 29, 2023 at 02:27:50PM -0500, Jason A. Harmening wrote:
> > > On Sat, Apr 29, 2023 at 08:49:28PM +0200, Dimitry Andric wrote:
> > > > On 29 Apr 2023, at 20:33, Jason A. Harmening <j...@freebsd.org> wrote:
> > > > > 
> > > > > On Sun, Apr 09, 2023 at 09:35:22PM +0000, Dimitry Andric wrote:
> > > > >> The branch stable/13 has been updated by dim:
> > > > >> 
> > > > >> URL: 
> > > > >> https://cgit.FreeBSD.org/src/commit/?id=060699e9136975d51d3f726b9785bdbac9a62ba6
> > > > >> 
> > > > >> commit 060699e9136975d51d3f726b9785bdbac9a62ba6
> > > > >> Author:     Dimitry Andric <d...@freebsd.org>
> > > > >> AuthorDate: 2023-01-14 16:33:24 +0000
> > > > >> Commit:     Dimitry Andric <d...@freebsd.org>
> > > > >> CommitDate: 2023-04-09 14:54:52 +0000
> > > > >> 
> > > > >>    Merge llvm-project release/15.x llvmorg-15.0.7-0-g8dfdcc7b7bf6
> > > > >> 
> > > > >>    This updates llvm, clang, compiler-rt, libc++, libunwind, lld, 
> > > > >> lldb and
> > > > >>    openmp to llvmorg-15.0.7-0-g8dfdcc7b7bf6.
> > > > >> 
> > > > >>    PR:             265425
> > > > >>    MFC after:      2 weeks
> > > > > 
> > > > > This MFC of llvm15 appears to have completely broken the Intel IOMMU
> > > > > driver on my stable/13 machine.  After this series of commits, any
> > > > > downstream DMA seems to produce an IOMMU translation fault, which
> > > > > renders the machine completely unusable: no nvme boot disk, no usb
> > > > > keyboard, etc.
> > > > > 
> > > > > The faults I see look something like this:
> > > > > 
> > > > > DMAR4: ahci0: pci0:17:5 sid 8d fault acc 0 adt 0x0 reason 0x3 addr 
> > > > > 26000
> > > > > 
> > > > > It's a bit surprising to see a toolchain upgrade produce breakage like
> > > > > this, but that's what git bisect clearly tells me.  I wonder if some 
> > > > > of
> > > > > the IOMMU control structures might be defined as C bitfields and the 
> > > > > new
> > > > > compiler is emitting them differently?  Also, was any breakage like 
> > > > > this
> > > > > observed when -current was upgraded to llvm15 several months ago?
> > > > 
> > > > I haven't heard anything about such breakage, no.
> > > > 
> > > > 
> > > > > More generally, this is the second time in as many months I've had to
> > > > > deal with IOMMU breakage on -stable.  I can't imagine I'm the only
> > > > > person who sees value in running with DMA remapping enabled; do we 
> > > > > need
> > > > > a dedicated DMAR-enabled machine in the cluster to smoke-test changes
> > > > > like this?  More generally, should we avoid MFCing high-risk changes
> > > > > like this?
> > > > 
> > > > Since there were very few bug reports, it was not deemed high risk.
> > > > 
> > > > In any case, it would be good to get the bottom of what is causing the
> > > > problem, so is there any way you can isolate which code seems to be
> > > > going "bad"?
> > > > 
> > > > For example, if this problem affects code in sys/dev/iommu, is there
> > > > some way you can compile that part with -O1, or with an older version
> > > > of clang (from ports), to see if the problem goes away?
> > > 
> > > I did try removing all custom make.conf settings (previously I just had
> > > CPUTYPE?=icelake-server), but that didn't change the behavior.
> > > 
> > > Before I try further build tweaks, I'd like to ask if the IOMMU fault
> > > report can provide guidance here?  AFAICT all the faults I'm getting
> > > show "reason 0x3".  If I'm reading the VT-d spec correctly, FR=0x3
> > > indicates an invalid context entry, in other words there's something the
> > > hardware doesn't like in the way the address width or pagetable base is
> > > configured for the PCIe requestor.
> > 
> > I would start looking at the other direction: might be, there are still some
> > left shifts for int32 values with the shift count > 30, or uint32 with the
> > count > 31.
> > 
> > Also might be useful to dump each context entry on creation, it is kept
> > constant after.
> 
> I did look over the constants in intel_reg.h, and didn't see anything
> that looked as though it would be susceptible to sign-extension or
> truncation bugs.  In the failing case it's much easier for me to catch
> the fault messages than any initialization message, so I instrumented
> the fault handler to get the context entry from the dmar_ctx object
> using the same logic as dmar_map_ctx_entry(), and then dump out the ctx1
> and ctx2 fields.  What I see are messages like:
> 
> ... ctx1 0x10013b001 ctx2 0x103
> 
> At first glance these "look right": the P bit is set in ctx1, and the
> rest of the field looks like a valid physical address.  ctx2 also
> doesn't have any of the reserved bits set, but in all cases it does have
> AW=3, which would indicate 57-bit AGAW.  But when I boot the last
> working kernel, from the revision prior to the llvm15 MFC, I see this in
> dmesg:
> 
> ahci0: dmar4 pci0:0:17:5 rid 8d domain 1 mgaw 48 agaw 48 re-mapped
> 
> ...all reported devices show 48-bit MGAW/AGAW, so I would expect ctx2 to
> have AW=2.  I suspect this may be the source of the fault, but I'm not
> sure how it's getting configured that way, whether it's an issue with
> reading the capability register or something else.
> 

I can confirm that hacking domain_set_agaw() to always use the settings
from sagaw_bits[2] eliminates the faults and at least allows the machine
to boot to single-user mode.

Reply via email to