On Wed, Aug 4, 2021 at 5:11 AM Peter Xu <pet...@redhat.com> wrote:
>
> On Tue, Aug 03, 2021 at 04:14:57PM +0800, Jason Wang wrote:
> >
> > On 2021/8/3 1:51 PM, Chao Gao wrote:
> > > On Tue, Aug 03, 2021 at 12:43:58PM +0800, Jason Wang wrote:
> > > > On 2021/8/3 12:29 PM, Chao Gao wrote:
> > > > > Ping. Could someone help to review this patch?
> > > > >
> > > > > Thanks
> > > > > Chao
> > > > >
> > > > > On Wed, Jul 21, 2021 at 03:54:02PM +0800, Chao Gao wrote:
> > > > > > If guest enables IOMMU_PLATFORM for virtio-net, severe network
> > > > > > performance drop is observed even if there is no IOMMU.
> > > >
> > > > We see such reports internally and we're testing a patch series to
> > > > disable vhost IOTLB in this case.
> > > >
> > > > Will post a patch soon.

[1]

> > > OK. Put me in the CC list. I would like to test with TDX to ensure
> > > your patch fixes the performance issue I am facing.
> >
> > Sure.
> >
> > > > > > And disabling
> > > > > > vhost can mitigate the perf issue. Finally, we found the culprit is
> > > > > > frequent iotlb misses: kernel vhost-net has 2048 entries and each
> > > > > > entry is 4K (qemu uses 4K for i386 if no IOMMU); vhost-net can cache
> > > > > > translations for up to 8M (i.e. 4K*2048) IOVAs. If guest uses >8M
> > > > > > memory for DMA, there are some iotlb misses.
> > > > > >
> > > > > > If there is no IOMMU or IOMMU is disabled or IOMMU works in
> > > > > > pass-thru mode, we can optimistically use large, unaligned iotlb
> > > > > > entries instead of 4K-aligned entries to reduce iotlb pressure.
> > > >
> > > > Instead of introducing new general facilities like the unaligned IOTLB
> > > > entry, I wonder if we could optimize vtd_iommu_translate() to use e.g.
> > > > 1G instead?
> > >
> > > Using a 1G iotlb entry looks feasible.
> >
> > Want to send a patch?
> > > > } else {
> > > >     /* DMAR disabled, passthrough, use 4k-page */
> > > >     iotlb.iova = addr & VTD_PAGE_MASK_4K;
> > > >     iotlb.translated_addr = addr & VTD_PAGE_MASK_4K;
> > > >     iotlb.addr_mask = ~VTD_PAGE_MASK_4K;
> > > >     iotlb.perm = IOMMU_RW;
> > > >     success = true;
> > > > }
> > > >
> > > > > > Actually, vhost-net
> > > > > > in kernel supports unaligned iotlb entries. The alignment
> > > > > > requirement is imposed by address_space_get_iotlb_entry() and
> > > > > > flatview_do_translate().
> > > >
> > > > For the passthrough case, is there any way to detect it and then
> > > > disable the device IOTLB in those cases?
> > >
> > > Yes, I guess so; qemu knows the presence and status of the iommu.
> > > Currently, in flatview_do_translate(), memory_region_get_iommu() tells
> > > whether a memory region is behind an iommu.
> >
> > The issues are:
> >
> > 1) how to know that passthrough mode is enabled (note that passthrough
> > mode doesn't mean it doesn't sit behind an IOMMU)
>
> memory_region_get_iommu() should return NULL if it's passthrough-ed?
Do you mean something like memory_region_get_iommu(as->root)? Could it be
possible that the iommu is attached to a specific mr but not root?

In [1], I originally tried to use pci_device_iommu_address_space() in
virtio_pci_get_dma_as(). But I suffered from the issue that virtio-pci might
be initialized before e.g. intel-iommu, which you tried to solve at [2].

Then I switched to introducing an iommu_enabled that compares the as returned
by pci_device_iommu_address_space() against address_space_memory. And
iommu_enabled will be called during vhost start, where intel-iommu is
guaranteed to be initialized. This seems to work. Let me post the patch and
let's start there.

> > 2) can passthrough mode be disabled on the fly? If yes, we need to deal
> > with that
>
> I don't think it happens in reality; e.g. when iommu=pt is set it's set
> until the next guest reboot. However I don't know whether there's a
> limitation spec-wise.

Yes, that's what I worry about. Even if it's not limited by the Intel spec,
we might suffer from this with another iommu.

> Also I don't know whether there are special cases, for example when
> we kexec.
>
> I have two questions...
>
> Jason, when you mentioned the "fix" above [1], shouldn't that also fix the
> same issue, and in a better way? Because ideally I think if we know vhost
> does not need a translation for either iommu_platform=off, or passthrough,
> the dev-iotlb layer seems an overhead with no real use.

Yes, see above. Let me post the patch.

> The other question is I'm also wondering why we care about iommu_platform=on
> when there's no vIOMMU at all - it's about why we bother setting the flag at
> all? Or is it a valid use case?

Encrypted VMs like SEV or TDX. In those cases, swiotlb needs to be used in
the guest since the swiotlb pages are not encrypted. And iommu_platform=on is
the only way to let the guest driver use the DMA API (swiotlb). (The name
iommu_platform is confusing and tricky.)

Thanks

> Thanks,
>
> --
> Peter Xu
>