On 4/30/23 8:31 PM, Jason A. Harmening wrote:
On Sun, Apr 30, 2023 at 07:34:45PM -0500, Jason A. Harmening wrote:
On Sun, Apr 30, 2023 at 06:47:13PM -0500, Jason A. Harmening wrote:
On Sun, Apr 30, 2023 at 08:09:16AM +0300, Konstantin Belousov wrote:
On Sat, Apr 29, 2023 at 02:27:50PM -0500, Jason A. Harmening wrote:
On Sat, Apr 29, 2023 at 08:49:28PM +0200, Dimitry Andric wrote:
On 29 Apr 2023, at 20:33, Jason A. Harmening <j...@freebsd.org> wrote:

On Sun, Apr 09, 2023 at 09:35:22PM +0000, Dimitry Andric wrote:
The branch stable/13 has been updated by dim:

URL: 
https://cgit.FreeBSD.org/src/commit/?id=060699e9136975d51d3f726b9785bdbac9a62ba6

commit 060699e9136975d51d3f726b9785bdbac9a62ba6
Author:     Dimitry Andric <d...@freebsd.org>
AuthorDate: 2023-01-14 16:33:24 +0000
Commit:     Dimitry Andric <d...@freebsd.org>
CommitDate: 2023-04-09 14:54:52 +0000

    Merge llvm-project release/15.x llvmorg-15.0.7-0-g8dfdcc7b7bf6

    This updates llvm, clang, compiler-rt, libc++, libunwind, lld, lldb and
    openmp to llvmorg-15.0.7-0-g8dfdcc7b7bf6.

    PR:             265425
    MFC after:      2 weeks

This MFC of llvm15 appears to have completely broken the Intel IOMMU
driver on my stable/13 machine.  After this series of commits, any
downstream DMA seems to produce an IOMMU translation fault, which
renders the machine completely unusable: no nvme boot disk, no usb
keyboard, etc.

The faults I see look something like this:

DMAR4: ahci0: pci0:17:5 sid 8d fault acc 0 adt 0x0 reason 0x3 addr 26000

It's a bit surprising to see a toolchain upgrade produce breakage like
this, but that's what git bisect clearly tells me.  I wonder if some of
the IOMMU control structures might be defined as C bitfields and the new
compiler is emitting them differently?  Also, was any breakage like this
observed when -current was upgraded to llvm15 several months ago?

I haven't heard anything about such breakage, no.


More generally, this is the second time in as many months I've had to
deal with IOMMU breakage on -stable.  I can't imagine I'm the only
person who sees value in running with DMA remapping enabled; do we need
a dedicated DMAR-enabled machine in the cluster to smoke-test changes
like this?  More generally, should we avoid MFCing high-risk changes
like this?

Since there were very few bug reports, it was not deemed high risk.

In any case, it would be good to get the bottom of what is causing the
problem, so is there any way you can isolate which code seems to be
going "bad"?

For example, if this problem affects code in sys/dev/iommu, is there
some way you can compile that part with -O1, or with an older version
of clang (from ports), to see if the problem goes away?

I did try removing all custom make.conf settings (previously I just had
CPUTYPE?=icelake-server), but that didn't change the behavior.

Before I try further build tweaks, I'd like to ask if the IOMMU fault
report can provide guidance here?  AFAICT all the faults I'm getting
show "reason 0x3".  If I'm reading the VT-d spec correctly, FR=0x3
indicates an invalid context entry, in other words there's something the
hardware doesn't like in the way the address width or pagetable base is
configured for the PCIe requestor.

I would start looking at the other direction: might be, there are still some
left shifts for int32 values with the shift count > 30, or uint32 with the
count > 31.

Also might be useful to dump each context entry on creation, it is kept
constant after.

I did look over the constants in intel_reg.h, and didn't see anything
that looked as though it would be susceptible to sign-extension or
truncation bugs.  In the failing case it's much easier for me to catch
the fault messages than any initialization message, so I instrumented
the fault handler to get the context entry from the dmar_ctx object
using the same logic as dmar_map_ctx_entry(), and then dump out the ctx1
and ctx2 fields.  What I see are messages like:

... ctx1 0x10013b001 ctx2 0x103

At first glance these "look right": the P bit is set in ctx1, and the
rest of the field looks like a valid physical address.  ctx2 also
doesn't have any of the reserved bits set, but in all cases it does have
AW=3, which would indicate 57-bit AGAW.  But when I boot the last
working kernel, from the revision prior to the llvm15 MFC, I see this in
dmesg:

ahci0: dmar4 pci0:0:17:5 rid 8d domain 1 mgaw 48 agaw 48 re-mapped

...all reported devices show 48-bit MGAW/AGAW, so I would expect ctx2 to
have AW=2.  I suspect this may be the source of the fault, but I'm not
sure how it's getting configured that way, whether it's an issue with
reading the capability register or something else.


I can confirm that hacking domain_set_agaw() to always use the settings
from sagaw_bits[2] eliminates the faults and at least allows the machine
to boot to single-user mode.

I see what's happening now.  When I added the hack to always set
sagaw_bits[2], I noted that the passed-in MGAW was still 57, while
unit->hw_cap had the correct value of 0x4 (=> 4-level paging, 48-bit AW)
in bits 12:8.  The problem is that sagaw_bits has agaw=64 in its last
entry.  This results in dmar_maxaddr2mgaw() attempting a comparison
against 1ULL << 64 in the final iteration of its first loop.  I suspect
the new compiler probably determines that last iteration is meaningless
and simply omits it from the (probably unrolled) loop.  Since the "loop"
terminates with i < nitems(sagaw_bits), the subsequent "allow_less ..."
case doesn't execute and we end up erroneously selecting a 57-bit
address width.  Just commenting out that last entry in sagaw_bits fixes
the problem.

So, two questions:
1) Does any VT-d hardware actually support 6-level paging?  The ca. 2021
VT-d spec I'm looking at indicates 5-level is the greatest depth
supported, with everything above that being reserved.

2) I'd expect clang to try very hard to error out in a situation like
this, but I see that sys/conf/kern.mk sets -Wno-shift-count-overflow
among other things, and more of them were added for clang 15.  This
seems like a really bad idea, regardless of how much of a PITA it may be
to fix these warnings.

FWIW, I've been working on trying to re-enable some of the warnings that
were disabled for clang 15 in main.  I'll move that one higher up on my
todo list.

--
John Baldwin


Reply via email to