Hi Nadav,

> -----Original Message-----
> From: Nadav Amit [mailto:nadav.a...@gmail.com]
> Sent: Wednesday, March 17, 2021 1:46 PM
> To: Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
> <longpe...@huawei.com>
> Cc: David Woodhouse <dw...@infradead.org>; Lu Baolu
> <baolu...@linux.intel.com>; Joerg Roedel <j...@8bytes.org>; w...@kernel.org;
> alex.william...@redhat.com; chenjiashang <chenjiash...@huawei.com>;
> io...@lists.linux-foundation.org; Gonglei (Arei) <arei.gong...@huawei.com>;
> LKML <linux-kernel@vger.kernel.org>
> Subject: Re: A problem of Intel IOMMU hardware ?
>
> > On Mar 16, 2021, at 8:16 PM, Longpeng (Mike, Cloud Infrastructure Service
> > Product Dept.) <longpe...@huawei.com> wrote:
> >
> > Hi guys,
> >
> > We find the Intel iommu cache (i.e. iotlb) maybe works wrong in a
> > special situation, it would cause DMA fails or get wrong data.
> >
> > The reproducer (based on Alex's vfio testsuite[1]) is in attachment,
> > it can reproduce the problem with high probability (~50%).
>
> I saw Lu replied, and he is much more knowledgable than I am (I was just
> intrigued by your email).
>
> However, if I were you I would try also to remove some “optimizations” to
> look for the root-cause (e.g., use domain specific invalidations instead of
> page-specific).
>
Good suggestion! We already tried that over the last few days, using domain-selective invalidations as follows:

  iommu->flush.flush_iotlb(iommu, did, 0, 0, DMA_TLB_DSI_FLUSH);

But that did not resolve the problem.

> The first thing that comes to my mind is the invalidation hint (ih) in
> iommu_flush_iotlb_psi(). I would remove it to see whether you get the failure
> without it.

We also noticed the IH, but the IH is always zero in our case. As the spec says:

'''
Paging-structure-cache entries caching second-level mappings associated
with the specified domain-id and the second-level-input-address range are
invalidated, if the Invalidation Hint (IH) field is Clear.
'''

So with IH clear, the paging-structure-cache entries should be invalidated as well. The software side looks fine to us, so we have no choice but to suspect the hardware.
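For reference, here is a rough, standalone sketch of how a page-selective invalidation encodes the IH bit alongside the address and the address mask. It is not the kernel code: all `sketch_*` names are hypothetical, and the field positions (ADDR in bits 63:12, IH in bit 6, AM in bits 5:0 of the upper descriptor quadword) are my reading of the VT-d descriptor layout, so please check them against the spec.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout of the upper 64 bits of an IOTLB invalidation
 * descriptor: ADDR in bits 63:12, IH in bit 6, AM in bits 5:0.
 * All names here are hypothetical, for illustration only. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_IH_BIT     (UINT64_C(1) << 6)
#define SKETCH_AM_MASK    UINT64_C(0x3f)

/* Smallest order such that (1 << order) >= pages. */
static unsigned int sketch_mask_order(uint64_t pages)
{
    unsigned int order = 0;

    assert(pages > 0 && pages <= (UINT64_C(1) << 32));
    while ((UINT64_C(1) << order) < pages)
        order++;
    return order;
}

/* Build the address/IH/AM quadword for a page-selective invalidation.
 * The base address must be aligned to the (power-of-two) range size. */
static uint64_t sketch_psi_qw1(uint64_t addr, uint64_t pages, int ih)
{
    unsigned int order = sketch_mask_order(pages);
    uint64_t range_bytes = UINT64_C(1) << (SKETCH_PAGE_SHIFT + order);

    assert((addr & (range_bytes - 1)) == 0);
    return addr | (ih ? SKETCH_IH_BIT : 0) | (order & SKETCH_AM_MASK);
}

/* Per the spec text quoted above: the paging-structure-cache entries
 * for the range are invalidated only when IH is clear. */
static int sketch_flushes_paging_structure_cache(uint64_t qw1)
{
    return (qw1 & SKETCH_IH_BIT) == 0;
}
```

With IH passed as zero, as in our case, bit 6 stays clear, so the request covers the paging-structure-cache entries as well as the IOTLB entries for the range.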