On Mon, Jun 06, 2016 at 09:58:09PM -0600, Alex Williamson wrote:
> On Tue, 7 Jun 2016 11:20:32 +0800
> Peter Xu <pet...@redhat.com> wrote:

[...]

> > Only if cap_caching_mode() is set (which is bit 7, the CM bit) will
> > we send these invalidations. What I meant is that we should allow
> > the user to specify the CM bit, so that when we are not using VFIO
> > devices, we can skip the above flush_content() and flush_iotlb()
> > etc. So, besides the fact that some guests do not support the CM
> > bit (like Jailhouse), performance might be another reason to let
> > the user specify the CM bit themselves.
>
> I'm dubious of this, IOMMU drivers are already aware that hardware
> flushes are expensive and do batching to optimize it. The queued
> invalidation mechanism itself is meant to allow asynchronous
> invalidations. QEMU invalidating a virtual IOMMU might very well be
> faster than hardware.
Agree. However, it seems that current Linux is still not taking this
advantage... check qi_flush_context() and qi_flush_iotlb():
qi_submit_sync() is used for both, which sends one invalidation with an
explicit wait to make sure it's synchronous.

> > > C: Page-walk Coherency
> > > This field indicates if hardware access to the root, context,
> > > extended-context and interrupt-remap tables, and second-level paging
> > > structures for requests-without-PASID, are coherent (snooped) or not.
> > > • 0: Indicates hardware accesses to remapping structures are
> > >   non-coherent.
> > > • 1: Indicates hardware accesses to remapping structures are coherent.
> > >
> > > Without both CM=0 and C=0, our only virtualization mechanism for
> > > maintaining a hardware cache coherent with the guest view of the iommu
> > > would be to shadow all of the VT-d structures. For purely emulated
> > > devices, maybe we can get away with that, but I doubt the current
> > > ghashes used for the iotlb are prepared for it.
> >
> > Actually I haven't noticed this bit yet. I see that this will decide
> > whether the guest kernel needs to issue clflush() when modifying
> > IOMMU PTEs, but shouldn't we always flush the memory cache so that
> > we can be sure the IOMMU sees the same memory data as the CPU does?
>
> I think it would be a question of how much the g_hash code really buys
> us in the VT-d code, it might be faster to do a lookup each time if it
> means fewer flushes. Those hashes are useless overhead for assigned
> devices, so maybe we can avoid them when we only have assigned
> devices ;) Thanks,

Errr, I just noticed that VFIO devices do not need the emulated cache.
There are indeed lots of pending works TBD on the vIOMMU side...

Thanks!

-- peterx