On 16/02/2020 10:11 pm, Jerry Snitselaar wrote:
On Fri Feb 14 20, Robin Murphy wrote:
Hi Jerry,
On 2020-02-14 8:13 pm, Jerry Snitselaar wrote:
Hi Will,
On a Gigabyte system with a Cavium CN8xx, when running a fio test against
an NVMe drive, we are seeing the following:
[ 637.161194] arm-smmu arm-smmu.1.auto: Unhandled context fault: fsr=0x80000402, iova=0x8010003f6000, fsynr=0x70091, cbfrsynra=0x9000, cb=7
[ 637.174329] arm-smmu arm-smmu.1.auto: Unhandled context fault: fsr=0x80000402, iova=0x801000036000, fsynr=0x70091, cbfrsynra=0x9000, cb=7
[ 637.186887] arm-smmu arm-smmu.1.auto: Unhandled context fault: fsr=0x80000402, iova=0x8010002ee000, fsynr=0x70091, cbfrsynra=0x9000, cb=7
[ 637.199275] arm-smmu arm-smmu.1.auto: Unhandled context fault: fsr=0x80000402, iova=0x8010003c7000, fsynr=0x70091, cbfrsynra=0x9000, cb=7
[ 637.211885] arm-smmu arm-smmu.1.auto: Unhandled context fault: fsr=0x80000402, iova=0x801000392000, fsynr=0x70091, cbfrsynra=0x9000, cb=7
[ 637.224580] arm-smmu arm-smmu.1.auto: Unhandled context fault: fsr=0x80000402, iova=0x801000018000, fsynr=0x70091, cbfrsynra=0x9000, cb=7
[ 637.237241] arm-smmu arm-smmu.1.auto: Unhandled context fault: fsr=0x80000402, iova=0x801000360000, fsynr=0x70091, cbfrsynra=0x9000, cb=7
[ 637.249657] arm-smmu arm-smmu.1.auto: Unhandled context fault: fsr=0x80000402, iova=0x8010000ba000, fsynr=0x70091, cbfrsynra=0x9000, cb=7
[ 637.262120] arm-smmu arm-smmu.1.auto: Unhandled context fault: fsr=0x80000402, iova=0x80100003e000, fsynr=0x70091, cbfrsynra=0x9000, cb=7
[ 637.274468] arm-smmu arm-smmu.1.auto: Unhandled context fault: fsr=0x80000402, iova=0x801000304000, fsynr=0x70091, cbfrsynra=0x9000, cb=7
Those "IOVAs" don't look much like IOVAs from the DMA allocator - if
they were physical addresses, would they correspond to an expected
region of the physical memory map?
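One way to answer that question is to compare the faulting addresses with the
layout reported in /proc/iomem. The following is only a rough sketch: the
helper names are made up for illustration, and the sample map is HYPOTHETICAL
(it is not taken from the machine in this report). On a real system the file
generally has to be read as root, otherwise the addresses are shown as zeros.

```python
import re

# Matches one /proc/iomem-style line, e.g. "  80000000-ffffffff : System RAM"
_IOMEM_LINE = re.compile(r"^\s*([0-9a-fA-F]+)-([0-9a-fA-F]+)\s*:\s*(.+?)\s*$")


def parse_iomem(text):
    """Return (start, end, name) tuples parsed from /proc/iomem-style text."""
    regions = []
    for line in text.splitlines():
        m = _IOMEM_LINE.match(line)
        if m:
            regions.append((int(m.group(1), 16), int(m.group(2), 16), m.group(3)))
    return regions


def regions_containing(addr, regions):
    """Names of every listed region that covers addr, in file order."""
    return [name for start, end, name in regions if start <= addr <= end]


# HYPOTHETICAL map, invented for illustration only.
SAMPLE_IOMEM = """\
00000000-7fffffff : System RAM
  00080000-00ffffff : Kernel code
100000000-1ffffffff : System RAM
"""

if __name__ == "__main__":
    regions = parse_iomem(SAMPLE_IOMEM)
    # Two of the addresses reported in the fault messages above.
    for addr in (0x8010003F6000, 0x801000036000):
        hits = regions_containing(addr, regions)
        print(hex(addr), "->", hits if hits else "no matching region")
```

To run it against a real machine, pass the contents of /proc/iomem to
parse_iomem() in place of SAMPLE_IOMEM.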
I would suspect that this is most likely misbehaviour in the NVMe
driver (issuing a write to a non-DMA-mapped address), and the SMMU is
just doing its job in blocking and reporting it.
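For reference, the register values in those messages can be decoded by hand.
The sketch below assumes the bit positions match the ARM_SMMU_FSR_* and
FSYNR0 WNR definitions in the Linux arm-smmu driver as I remember them, so
check them against the header before relying on the result. The function
names are hypothetical. Bits that are not in the table are returned
undecoded rather than guessed at.

```python
import re

# Assumed bit positions, following the arm-smmu driver's FSR definitions.
FSR_BITS = {
    31: "MULTI",   # multiple faults recorded
    30: "SS",      # stalled
    8: "UUT",
    7: "ASF",
    6: "TLBLKF",
    5: "TLBMCF",
    4: "EF",       # external fault
    3: "PF",       # permission fault
    2: "AFF",      # access flag fault
    1: "TF",       # translation fault
}
FSYNR0_WNR = 1 << 4  # assumed: set when the faulting access was a write

# \s* after the commas lets this match a message wrapped across lines too.
_FAULT = re.compile(
    r"fsr=(0x[0-9a-fA-F]+),\s*iova=(0x[0-9a-fA-F]+),\s*fsynr=(0x[0-9a-fA-F]+)"
)


def decode_fsr(fsr):
    """Return (names of known set bits, mask of remaining undecoded bits)."""
    names = [n for bit, n in sorted(FSR_BITS.items(), reverse=True) if fsr & (1 << bit)]
    known_mask = sum(1 << bit for bit in FSR_BITS)
    return names, fsr & ~known_mask


def decode_fault(message):
    """Decode one 'Unhandled context fault' message, or return None."""
    m = _FAULT.search(message)
    if m is None:
        return None
    fsr, iova, fsynr = (int(g, 16) for g in m.groups())
    names, undecoded = decode_fsr(fsr)
    return {
        "iova": iova,
        "fsr_bits": names,
        "fsr_undecoded": undecoded,
        "is_write": bool(fsynr & FSYNR0_WNR),
    }


if __name__ == "__main__":
    msg = ("arm-smmu arm-smmu.1.auto: Unhandled context fault: "
           "fsr=0x80000402, iova=0x8010003f6000, fsynr=0x70091, "
           "cbfrsynra=0x9000, cb=7")
    print(decode_fault(msg))
```

Under those assumptions, fsr=0x80000402 has the translation fault and
multiple fault bits set, and fsynr=0x70091 has the write bit set.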
I also reproduced with 5.5-rc7, and will check 5.6-rc1 later today. I
couldn't narrow it down any further within 5.4-rc1.
I don't know the SMMU or its code well; any thoughts on where to start
digging into this?
The fio test being run is:
#fio -filename=/dev/nvme0n1 -iodepth=64 -thread -rw=randwrite
-ioengine=libaio -bs=4k -runtime=43200 -size=-group_reporting
-name=mytest -numjobs=32
Just to clarify, do other tests work OK on the same device?
Thanks,
Robin.
I was able to get back on the system today. I think I know what the
problem is:
[ 0.036189] iommu: Gigabyte R120-T34-00 detected, force iommu passthrough mode
[ 6.324282] iommu: Default domain type: Translated
So the new default domain code in 5.4 overrides the iommu quirk code
that sets default passthrough. I'm testing a quick patch that tracks
whether the default domain was set in the quirk code, and leaves it
alone if it was. So far it seems to be working.
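The idea behind the patch can be modelled roughly as follows. This is a toy
model in Python, not the patch itself and not kernel code; every name in it
(DefaultDomain, set_by_quirk, apply_board_quirk, apply_configured_default)
is hypothetical and exists only to illustrate the ordering problem.

```python
from dataclasses import dataclass

PASSTHROUGH = "passthrough"
TRANSLATED = "translated"


@dataclass
class DefaultDomain:
    """Toy stand-in for the global default domain type."""
    domain_type: str = TRANSLATED
    set_by_quirk: bool = False  # hypothetical flag: did a board quirk choose the type?


def apply_board_quirk(state, board, quirk_boards):
    """Early boot: force passthrough on listed boards and remember that we did."""
    if board in quirk_boards:
        state.domain_type = PASSTHROUGH
        state.set_by_quirk = True


def apply_configured_default(state, configured_type):
    """Later init: apply the configured default unless a quirk already decided."""
    if state.set_by_quirk:
        return
    state.domain_type = configured_type


if __name__ == "__main__":
    state = DefaultDomain()
    apply_board_quirk(state, "Gigabyte R120-T34-00", {"Gigabyte R120-T34-00"})
    apply_configured_default(state, TRANSLATED)
    print(state.domain_type)
```

Without the set_by_quirk check, the second call would overwrite the quirk's
choice, which matches the two log lines above.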
Ah, OK. Could you point me at that quirk code? I can't seem to track it
down in mainline, and seeing this much leaves me dubious that it's even
correct - matching a particular board implies that it's a firmware issue
(as far as I'm aware the SMMUs in CN88xx SoCs are usable in general),
but if the firmware description is wrong to the point that DMA ops
translation doesn't work, then no other translation (e.g. VFIO) is
likely to work either. In that case it's simply not safe to enable the
SMMU at all, and fudging the default domain type merely hides one
symptom of the problem.
Robin.