> From: Jason Wang [mailto:jasow...@redhat.com]
> Sent: Tuesday, September 17, 2019 6:36 PM
>
> On 2019/9/17 4:48 PM, Tian, Kevin wrote:
> >> From: Jason Wang [mailto:jasow...@redhat.com]
> >> Sent: Monday, September 16, 2019 4:33 PM
> >>
> >>
> >> On 2019/9/16 9:51 AM, Tian, Kevin wrote:
> >>> Hi, Jason
> >>>
> >>> We had a discussion about dirty page tracking in VFIO, when vIOMMU
> >>> is enabled:
> >>>
> >>> https://lists.nongnu.org/archive/html/qemu-devel/2019-09/msg02690.html
> >>> It's actually a similar model to vhost - Qemu cannot interpose the
> >>> fast-path DMAs, thus it relies on the kernel part to track and
> >>> report dirty page information. Currently Qemu tracks dirty pages at
> >>> GFN level, thus demanding a translation from IOVA to GPA. The open
> >>> question in our discussion is where this translation should happen.
> >>> Doing the translation in the kernel implies a device-iotlb flavor,
> >>> which is what vhost implements today. It requires potentially large
> >>> tracking structures in the host kernel, but leverages the existing
> >>> log_sync flow in Qemu. On the other hand, Qemu may perform log_sync
> >>> for every removal of an IOVA mapping and then do the translation
> >>> itself, thus avoiding GPA awareness on the kernel side. That needs
> >>> some changes to the current Qemu log_sync flow, and may bring more
> >>> overhead if IOVA is frequently unmapped.
> >>>
> >>> So we'd like to hear your opinion, especially about how you came
> >>> down to the current iotlb approach for vhost.
> >>
> >> We didn't consider that point too much when introducing vhost. And
> >> before IOTLB, vhost already knew the GPA through its mem table
> >> (GPA->HVA), so it was natural and easier to track dirty pages at GPA
> >> level - it required no changes to the existing ABI.
> >
> > This is the same situation as VFIO.
> >
> >> For the VFIO case, the only advantage of using GPA is that the log
> >> can then be shared among all the devices that belong to the VM.
> >> Otherwise syncing through IOVA is cleaner.
>
> > I still worry about the potential performance impact with this
> > approach. In the current mdev live migration series, there are
> > multiple system calls involved when retrieving the dirty bitmap
> > information for a given memory range.
>
>
> I haven't taken a deep look at that series. Technically the dirty
> bitmap could be shared between device and driver, then there's no
> system call in the synchronization.
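As an aside, to put the "device iotlb flavor" discussed above in
concrete terms: the kernel would keep an IOVA->GPA translation per
container and mark dirty bits against the GPA, so the existing
log_sync flow in Qemu stays unchanged. Below is a minimal sketch of
that idea; all structures and names are made up for illustration and
are not code from vhost or VFIO:

#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT      12
#define BITS_PER_LONG   (8 * sizeof(unsigned long))

/* hypothetical IOVA->GPA translation entry cached by the kernel */
struct iova_gpa_entry {
        uint64_t iova;
        uint64_t gpa;
        uint64_t size;
        struct iova_gpa_entry *next;
};

static struct iova_gpa_entry *iotlb;    /* per-container list (assumed) */
static unsigned long *dirty_bitmap;     /* one bit per guest page (assumed) */

static void mark_gfn_dirty(uint64_t gfn)
{
        dirty_bitmap[gfn / BITS_PER_LONG] |= 1UL << (gfn % BITS_PER_LONG);
}

/* called whenever the device is observed writing through a given IOVA */
static bool log_write(uint64_t iova)
{
        struct iova_gpa_entry *e;

        for (e = iotlb; e; e = e->next) {
                if (iova >= e->iova && iova < e->iova + e->size) {
                        mark_gfn_dirty((e->gpa + (iova - e->iova)) >> PAGE_SHIFT);
                        return true;
                }
        }
        return false;   /* translation already removed */
}

The per-container translation list above is where the "potentially
large tracking structures in the host kernel" come from.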
Regarding that series: it requires Qemu to tell the kernel about the
queried region (start, number and page_size), to read the information
about the dirty bitmap (offset, size), and then to read the dirty
bitmap itself. Although the bitmap can be mmap'ed and thus shared, the
earlier reads/writes are conducted through pread/pwrite system calls.
This design is fine for the current log_dirty implementation, where
the dirty bitmap is synced once in every pre-copy round. But doing it
for every IOVA unmap is definitely overkill.

> >
> > IOVA mappings might be changed frequently. Though one may argue that
> > frequent IOVA changes already have bad performance, it's still not
> > good to introduce further non-negligible overhead in such a
> > situation.
>
> Yes, it depends on the behavior of the vIOMMU driver, e.g. the
> frequency and granularity of the flushing.
>
> >
> > On the other hand, I realized that adding IOVA awareness in VFIO is
> > actually easy. Today VFIO already maintains a full list of IOVAs and
> > their associated HVAs in vfio_dma structures, according to VFIO_MAP
> > and VFIO_UNMAP. As long as we allow the latter two operations to
> > accept another parameter (GPA), the IOVA->GPA mapping can be
> > naturally cached in the existing vfio_dma objects.
>
> Note that the HVA to GPA mapping is not a 1:1 mapping. One HVA range
> could be mapped to several GPA ranges.

This is fine. Currently vfio_dma maintains the IOVA->HVA mapping. Btw,
under what condition is HVA->GPA not a 1:1 mapping? I didn't realize
it.

> >
> > Those objects are updated upon the MAP and UNMAP ioctls, so they are
> > always up to date. Qemu then uniformly retrieves the VFIO dirty
> > bitmap for the entire GPA range in every pre-copy round, regardless
> > of whether vIOMMU is enabled. There is no need for another IOTLB
> > implementation; the main ask is a v2 MAP/UNMAP interface.
>
> Or provide the GPA to HVA mapping as vhost did. But one question: I
> believe the device can only do dirty page logging through IOVA, so
> how do you handle the case when the IOVA is removed?
>

That's why, in Alex's thought, a log_sync is required each time an
IOVA is unmapped. (A strawman of the v2 MAP interface mentioned above
is appended after my sign-off, for illustration only.)

Thanks
Kevin
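---
Strawman only, to illustrate the "v2 MAP/UNMAP interface" idea above:
the single change versus the existing struct vfio_iommu_type1_dma_map
UAPI would be an extra GPA field, so the kernel can cache IOVA->GPA in
vfio_dma and report the dirty bitmap in GFN terms. The new flag and
field names are made up here, not a proposed ABI.

#include <linux/types.h>

struct vfio_iommu_type1_dma_map_v2 {
	__u32	argsz;
	__u32	flags;
#define VFIO_DMA_MAP_FLAG_READ	(1 << 0)	/* readable from device */
#define VFIO_DMA_MAP_FLAG_WRITE	(1 << 1)	/* writable from device */
#define VFIO_DMA_MAP_FLAG_GPA	(1 << 2)	/* hypothetical: gpa field is valid */
	__u64	vaddr;	/* process virtual address (HVA), as today */
	__u64	iova;	/* IO virtual address (== GPA when vIOMMU is off) */
	__u64	size;	/* size of mapping, in bytes */
	__u64	gpa;	/* new: guest physical address of the range */
};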