On Thu, May 03, 2018 at 05:53:59PM +0800, Peter Xu wrote:
> On Thu, May 03, 2018 at 05:22:03PM +0800, Jason Wang wrote:
> > On 2018-05-03 15:53, Peter Xu wrote:
> > > On Thu, May 03, 2018 at 03:43:35PM +0800, Jason Wang wrote:
> > > > On 2018-05-03 15:28, Peter Xu wrote:
> > > > > On Thu, May 03, 2018 at 03:20:11PM +0800, Jason Wang wrote:
> > > > > > On 2018-05-03 14:04, Peter Xu wrote:
> > > > > > > IMHO the guest can't really detect this, but it'll find that
> > > > > > > the device is not working functionally if it's doing
> > > > > > > something like what Jason has mentioned.
> > > > > > >
> > > > > > > Actually now I have had an idea if we really want to live
> > > > > > > well even with Jason's example: maybe we'll need to
> > > > > > > identify PSI/DSI. For DSI, we don't remap for mapped
> > > > > > > pages; for PSI, we unmap and remap the mapped pages.
> > > > > > > That'll complicate the stuff a bit, but it should satisfy
> > > > > > > all the people.
> > > > > > >
> > > > > > > Thanks,
> > > > > > So it looks like there will still be unnecessary unmaps.
> > > > > Could I ask what you mean by "unnecessary unmaps"?
> > > > It's for "for PSI, we unmap and remap the mapped pages". So for
> > > > the first "unmap", how do you know it was really necessary
> > > > without knowing the state of the current shadow page table?
> > > I don't. Could I just unmap it anyway? Say, now the guest
> > > _modified_ the PTE already. Yes I think it's following the spec,
> > > but it is really _unsafe_. We can know that from what it has done
> > > already. Then I really think an unmap+map would be good enough
> > > for us... After all, that behavior can cause DMA errors even on
> > > real hardware. It can never tell.
> >
> > I mean for the following case:
> >
> > 1) guest maps A1 (iova) to XXX
> > 2) guest maps A2 (A1 + 4K) (iova) to YYY
> > 3) guest maps A3 (A1 + 8K) (iova) to ZZZ
> > 4) guest unmaps A2 and A3; to reduce the number of PSIs, it can
> >    invalidate A1 with a range of 2M
> >
> > If this is allowed by the spec, it looks like A1 will be unmapped
> > and mapped.
>
> My follow-up patch won't survive with this one but the original patch
> will work.
>
> Jason and I discussed this a bit on IRC. Here's the conclusion we
> reached: for now we use my original patch (which solves everything
> except PTE modifications). We mark that modify-PTE problem as a TODO.
> Then at least we can have nested device assignment working well on
> known OSes first.
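To make the A1/A2/A3 case above concrete, here is a rough sketch of what replaying a 2M PSI against a shadow page table could look like under the two policies discussed. Everything in it is an assumption for illustration: `struct flat_pt`, `enum sync_policy` and `sync_psi()` are hypothetical names, the page table is flattened to one array per 2M range, and this is not the code from either patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGES_PER_2M 512u   /* 2M range / 4K pages */

/*
 * Hypothetical flat page table covering one 2M-aligned IOVA range,
 * indexed by 4K page.  A target of 0 means "not mapped".
 */
struct flat_pt {
    uint64_t target[PAGES_PER_2M];
};

enum sync_policy {
    SYNC_SKIP_MAPPED,   /* leave pages alone that are mapped on both sides */
    SYNC_UNMAP_REMAP    /* unmap and remap pages that are mapped on both sides */
};

/*
 * Replay a PSI covering the whole 2M range against the shadow table.
 * Returns how many pages that the guest still has mapped were unmapped
 * and remapped anyway, i.e. went through an invalid window.
 */
static unsigned sync_psi(const struct flat_pt *guest, struct flat_pt *shadow,
                         enum sync_policy policy)
{
    unsigned disturbed = 0;

    for (size_t i = 0; i < PAGES_PER_2M; i++) {
        bool in_guest = guest->target[i] != 0;
        bool in_shadow = shadow->target[i] != 0;

        if (!in_guest) {
            shadow->target[i] = 0;                  /* page is gone: unmap */
        } else if (!in_shadow) {
            shadow->target[i] = guest->target[i];   /* new page: map */
        } else if (policy == SYNC_UNMAP_REMAP) {
            shadow->target[i] = 0;                  /* invalid window opens */
            shadow->target[i] = guest->target[i];   /* ... and closes */
            disturbed++;
        }
        /*
         * SYNC_SKIP_MAPPED with both sides mapped: nothing to do.  Note
         * that a modified PTE is not picked up here either (the TODO).
         */
    }
    return disturbed;
}
```

With A1, A2, A3 in the first three slots and A2/A3 then removed from the guest table, `SYNC_SKIP_MAPPED` leaves A1 untouched, while `SYNC_UNMAP_REMAP` reports one disturbed page (A1), which is the unnecessary unmap in question.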

One more note: we actually have no way to emulate a PTE modification.
The problem is that we can never atomically modify a PTE on the host
with Linux, either via the VFIO interface or even by using the IOMMU
API directly in the kernel.

To be more specific to our use case: VFIO provides VFIO_IOMMU_MAP_DMA
and VFIO_IOMMU_UNMAP_DMA, but there is no VFIO_IOMMU_MODIFY_DMA to
modify a PTE atomically. It means that even if we know the PTE has
changed, we can only unmap it and remap it. That still has the same
"invalid window" problem we have discussed, since the page is invalid
between the unmap and the remap (while from the guest's POV it should
never be invalid, since the PTE modification is atomic).

-- 
Peter Xu
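
The "invalid window" can be illustrated with a small self-contained model. This is only a sketch under stated assumptions: `toy_map()` and `toy_unmap()` are hypothetical stand-ins for VFIO_IOMMU_MAP_DMA and VFIO_IOMMU_UNMAP_DMA and issue no real ioctl, and the `dma_in_window` flag simply simulates a device access landing between the two steps.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SIZE 4096u
#define TOY_NR_PAGES  16u

/* Hypothetical toy IOMMU domain: one entry per 4K IOVA page. */
struct toy_domain {
    bool     valid[TOY_NR_PAGES];
    uint64_t target[TOY_NR_PAGES];
    unsigned faults;    /* DMA accesses that hit an invalid entry */
};

static size_t toy_index(uint64_t iova)
{
    size_t idx = (size_t)(iova / TOY_PAGE_SIZE);

    assert(idx < TOY_NR_PAGES);
    return idx;
}

/* Stand-in for VFIO_IOMMU_MAP_DMA (no real ioctl is issued). */
static void toy_map(struct toy_domain *d, uint64_t iova, uint64_t target)
{
    size_t idx = toy_index(iova);

    d->valid[idx] = true;
    d->target[idx] = target;
}

/* Stand-in for VFIO_IOMMU_UNMAP_DMA (no real ioctl is issued). */
static void toy_unmap(struct toy_domain *d, uint64_t iova)
{
    d->valid[toy_index(iova)] = false;
}

/* A device access: returns false and counts a fault on an invalid entry. */
static bool toy_dma(struct toy_domain *d, uint64_t iova, uint64_t *target)
{
    size_t idx = toy_index(iova);

    if (!d->valid[idx]) {
        d->faults++;
        return false;
    }
    *target = d->target[idx];
    return true;
}

/*
 * Emulated "modify": with only map and unmap available, changing a
 * translation takes two steps, and the entry is invalid in between.
 */
static void toy_modify(struct toy_domain *d, uint64_t iova,
                       uint64_t new_target, bool dma_in_window)
{
    uint64_t ignored;

    toy_unmap(d, iova);
    if (dma_in_window)
        (void)toy_dma(d, iova, &ignored);   /* faults: entry is invalid */
    toy_map(d, iova, new_target);
}
```

In this model, a DMA issued before or after `toy_modify()` always succeeds, but one that lands inside it bumps `faults`, which is exactly what an atomic PTE update in the guest would never allow.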