One more data point: the observations are the same if the current pci-device (sd/mmc controller) is detached and another pci-device (sound controller) is attached to the guest.
So it looks like we can rule out any (pci-)device-specific issue.

For completeness, here are the details of the other pci-device I tried with :

###############################################
sudo lspci -vvv

00:1b.0 Audio device: Intel Corporation 6 Series/C200 Series Chipset Family High Definition Audio Controller (rev 04)
        DeviceName: Onboard Audio
        Subsystem: Dell 6 Series/C200 Series Chipset Family High Definition Audio Controller
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 31
        IOMMU group: 5
        Region 0: Memory at e2e60000 (64-bit, non-prefetchable) [size=16K]
        Capabilities: [50] Power Management version 2
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=55mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [60] MSI: Enable+ Count=1/1 Maskable- 64bit+
                Address: 00000000fee00358 Data: 0000
        Capabilities: [70] Express (v1) Root Complex Integrated Endpoint, MSI 00
                DevCap: MaxPayload 128 bytes, PhantFunc 0
                        ExtTag- RBE- FLReset+
                DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
                        RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop- FLReset-
                        MaxPayload 128 bytes, MaxReadReq 128 bytes
                DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
        Capabilities: [100 v1] Virtual Channel
                Caps: LPEVC=0 RefClk=100ns PATEntryBits=1
                Arb: Fixed- WRR32- WRR64- WRR128-
                Ctrl: ArbSelect=Fixed
                Status: InProgress-
                VC0: Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
                        Arb: Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
                        Ctrl: Enable+ ID=0 ArbSelect=Fixed TC/VC=01
                        Status: NegoPending- InProgress-
                VC1: Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
                        Arb: Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
                        Ctrl: Enable+ ID=1 ArbSelect=Fixed TC/VC=22
                        Status: NegoPending- InProgress-
        Capabilities: [130 v1] Root Complex Link
                Desc: PortNumber=0f ComponentID=00 EltType=Config
                Link0: Desc: TargetPort=00 TargetComponent=00 AssocRCRB- LinkType=MemMapped LinkValid+
                        Addr: 00000000fed1c000
        Kernel driver in use: snd_hda_intel
        Kernel modules: snd_hda_intel
###############################################

On Fri, Oct 22, 2021 at 11:03 PM Ajay Garg <ajaygargn...@gmail.com> wrote:
>
> Ping ..
>
> Any updates please on this?
>
> It will be great to have the fix upstreamed (properly of course).
>
> Right now, the patch contains the change as suggested, of
> explicitly/properly clearing out dma-mappings when unmap is called.
> Please let me know in whatever way I can help, including
> testing/debugging for other approaches if required.
>
>
> Many thanks to Alex and Lu for their continued support on the issue.
>
>
>
> P.S. :
>
> I might have missed mentioning the information about the device that
> causes flooding.
> Please find it below :
>
> ######################################
> sudo lspci -vvv
>
> 0a:00.0 SD Host controller: O2 Micro, Inc. OZ600FJ0/OZ900FJ0/OZ600FJS SD/MMC Card Reader Controller (rev 05) (prog-if 01)
>         Subsystem: Dell OZ600FJ0/OZ900FJ0/OZ600FJS SD/MMC Card Reader Controller
>         Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
>         Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>         Latency: 0, Cache Line Size: 64 bytes
>         Interrupt: pin A routed to IRQ 17
>         IOMMU group: 14
>         Region 0: Memory at e2c20000 (32-bit, non-prefetchable) [size=512]
>         Capabilities: [a0] Power Management version 3
>                 Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
>                 Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
>         Capabilities: [48] MSI: Enable- Count=1/1 Maskable+ 64bit+
>                 Address: 0000000000000000 Data: 0000
>                 Masking: 00000000 Pending: 00000000
>         Capabilities: [80] Express (v1) Endpoint, MSI 00
>                 DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <4us, L1 <64us
>                         ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 10.000W
>                 DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
>                         RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
>                         MaxPayload 128 bytes, MaxReadReq 512 bytes
>                 DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
>                 LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <512ns, L1 <64us
>                         ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp-
>                 LnkCtl: ASPM L0s Enabled; RCB 64 bytes, Disabled- CommClk-
>                         ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
>                 LnkSta: Speed 2.5GT/s (ok), Width x1 (ok)
>                         TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-
>         Capabilities: [100 v1] Virtual Channel
>                 Caps: LPEVC=0 RefClk=100ns PATEntryBits=1
>                 Arb: Fixed- WRR32- WRR64- WRR128-
>                 Ctrl: ArbSelect=Fixed
>                 Status: InProgress-
>                 VC0: Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
>                         Arb: Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
>                         Ctrl: Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
>                         Status: NegoPending- InProgress-
>         Capabilities: [200 v1] Advanced Error Reporting
>                 UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>                 UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>                 UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
>                 CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
>                 CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
>                 AERCap: First Error Pointer: 00, ECRCGenCap- ECRCGenEn- ECRCChkCap- ECRCChkEn-
>                         MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
>                 HeaderLog: 00000000 00000000 00000000 00000000
>         Kernel driver in use: sdhci-pci
>         Kernel modules: sdhci_pci
> ######################################
>
>
>
> Thanks and Regards,
> Ajay
>
> On Tue, Oct 12, 2021 at 7:27 PM Ajay Garg <ajaygargn...@gmail.com> wrote:
> >
> > Origins at :
> > https://lists.linuxfoundation.org/pipermail/iommu/2021-October/thread.html
> >
> > === Changes from v1 => v2 ===
> >
> > a)
> > Improved patch-description.
> >
> > b)
> > A more root-level fix, as suggested by
> >
> > 1.
> > Alex Williamson <alex.william...@redhat.com>
> >
> > 2.
> > Lu Baolu <baolu...@linux.intel.com>
> >
> >
> >
> > === Issue ===
> >
> > Kernel-flooding is seen, when an x86_64 L1 guest (Ubuntu-21) is booted in
> > qemu/kvm on an x86_64 host (Ubuntu-21), with a host-pci-device attached.
> >
> > The following kind of logs, along with the stacktraces, cause the flood :
> >
> > ......
> > DMAR: ERROR: DMA PTE for vPFN 0x428ec already set (to 3f6ec003 not 3f6ec003)
> > DMAR: ERROR: DMA PTE for vPFN 0x428ed already set (to 3f6ed003 not 3f6ed003)
> > DMAR: ERROR: DMA PTE for vPFN 0x428ee already set (to 3f6ee003 not 3f6ee003)
> > DMAR: ERROR: DMA PTE for vPFN 0x428ef already set (to 3f6ef003 not 3f6ef003)
> > DMAR: ERROR: DMA PTE for vPFN 0x428f0 already set (to 3f6f0003 not 3f6f0003)
> > ......
> >
> >
> >
> > === Current Behaviour, leading to the issue ===
> >
> > Currently, when we do a dma-unmapping, we unmap/unlink the mappings, but
> > the pte-entries are not cleared.
> >
> > Thus, the following sequence would flood the kernel-logs :
> >
> > i)
> > A dma-unmapping makes the real/leaf-level pte-slot invalid, but the
> > pte-content itself is not cleared.
> >
> > ii)
> > Now, during some later dma-mapping procedure, as the pte-slot is about
> > to hold a new pte-value, the intel-iommu checks if a prior
> > pte-entry exists in the pte-slot. If it exists, it logs a kernel-error,
> > along with a corresponding stacktrace.
> >
> > iii)
> > Step ii) runs in abundance, and the kernel-logs run insane.
> >
> >
> >
> > === Fix ===
> >
> > We ensure that as part of a dma-unmapping, each (unmapped) pte-slot
> > is also cleared of its value/content (at the leaf-level, where the
> > real mapping from an iova => pfn is stored).
> >
> > This completes a "deep" dma-unmapping.
> >
> >
> >
> > Signed-off-by: Ajay Garg <ajaygargn...@gmail.com>
> > ---
> >  drivers/iommu/intel/iommu.c | 2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> > index d75f59ae28e6..485a8ea71394 100644
> > --- a/drivers/iommu/intel/iommu.c
> > +++ b/drivers/iommu/intel/iommu.c
> > @@ -5090,6 +5090,8 @@ static size_t intel_iommu_unmap(struct iommu_domain *domain,
> >         gather->freelist = domain_unmap(dmar_domain, start_pfn,
> >                                         last_pfn, gather->freelist);
> >
> > +       dma_pte_clear_range(dmar_domain, start_pfn, last_pfn);
> > +
> >         if (dmar_domain->max_addr == iova + size)
> >                 dmar_domain->max_addr = iova;
> >
> > --
> > 2.30.2
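For anyone following along, steps i) to iii) of the patch description, and the effect of the fix, can be mimicked with a small userspace toy model. This is only a sketch under my own assumptions, not the intel-iommu code: `toy_table`, `toy_map`, `toy_unmap`, the `deep` flag and the 16-slot table are all hypothetical names and sizes made up for illustration, and the real driver's page-table walk, locking and freelist handling are left out entirely.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define TOY_SLOTS      16      /* hypothetical: a tiny leaf-level table */
#define TOY_PAGE_SHIFT 12
#define TOY_PTE_RW     0x3ULL  /* read|write bits, like the "...003" values in the logs */

struct toy_table {
    uint64_t pte[TOY_SLOTS];          /* leaf slot content; 0 means empty */
    bool linked[TOY_SLOTS];           /* whether the slot is currently reachable */
    unsigned int already_set_errors;  /* number of "already set" complaints */
};

/* Map nr_pages starting at iov_pfn. A slot that still holds old content is
 * reported and left untouched, mirroring step ii) of the description. */
void toy_map(struct toy_table *t, unsigned int iov_pfn,
             uint64_t phys_pfn, unsigned int nr_pages)
{
    assert(iov_pfn + nr_pages <= TOY_SLOTS);

    for (unsigned int i = 0; i < nr_pages; i++) {
        unsigned int slot = iov_pfn + i;
        uint64_t val = ((phys_pfn + i) << TOY_PAGE_SHIFT) | TOY_PTE_RW;

        if (t->pte[slot] != 0) {
            t->already_set_errors++;
            fprintf(stderr,
                    "toy: PTE for vPFN 0x%x already set (to %" PRIx64
                    " not %" PRIx64 ")\n", slot, t->pte[slot], val);
        } else {
            t->pte[slot] = val;
        }
        t->linked[slot] = true;
    }
}

/* Unmap nr_pages starting at iov_pfn. With deep == false only the link is
 * dropped (step i); with deep == true the slot content is cleared as well,
 * which is the idea behind the proposed fix. */
void toy_unmap(struct toy_table *t, unsigned int iov_pfn,
               unsigned int nr_pages, bool deep)
{
    assert(iov_pfn + nr_pages <= TOY_SLOTS);

    for (unsigned int i = 0; i < nr_pages; i++) {
        t->linked[iov_pfn + i] = false;
        if (deep)
            t->pte[iov_pfn + i] = 0;
    }
}
```

Starting from a zero-initialised `struct toy_table`, a map / shallow unmap / map cycle over the same range bumps `already_set_errors` once per remapped page, whereas the same cycle with `deep` set leaves the counter at zero.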