On Tue, Oct 8, 2024 at 2:14 AM Si-Wei Liu <si-wei....@oracle.com> wrote:
>
>
>
> On 10/7/2024 6:51 AM, Eugenio Perez Martin wrote:
> > On Fri, Oct 4, 2024 at 8:48 PM Jonah Palmer <jonah.pal...@oracle.com> wrote:
> >>
> >>
> >> On 10/4/24 11:17 AM, Eugenio Perez Martin wrote:
> >>> On Fri, Oct 4, 2024 at 2:45 PM Jonah Palmer <jonah.pal...@oracle.com>
> >>> wrote:
> >>>> Implements the IOVA->GPA tree for handling mapping, unmapping, and
> >>>> translations for guest memory regions.
> >>>>
> >>>> When the guest has overlapping memory regions, an HVA to IOVA translation
> >>>> may return an incorrect IOVA when searching the IOVA->HVA tree. This is
> >>>> due to one HVA range being contained (overlapping) in another HVA range
> >>>> in the IOVA->HVA tree. By creating an IOVA->GPA tree, we can use GPAs to
> >>>> translate and find the correct IOVA for guest memory regions.
> >>>>
> >>> Yes, this first patch is super close to what I meant, just one issue
> >>> and a pair of nits here and there.
> >>>
> >>> I'd leave the second patch as an optimization on top, if the numbers
> >>> prove that adding the code is worth it.
> >>>
> >> Ah okay, gotcha. I also wasn't sure if what you mentioned below on the
> >> previous series you also wanted implemented or if these would also be
> >> optimizations on top.
> >>
> >> [Adding code to the vhost_iova_tree layer for handling multiple buffers
> >> returned from translation for the memory area where each iovec covers]:
> >> -----------------------------------------------------------------------
> >> "Let's say that SVQ wants to translate the HVA range
> >> 0xfeda0000-0xfedd0000. So it makes available for the device two
> >> chained buffers: One with addr=0x1000 len=0x20000 and the other one
> >> with addr=(0x20000c1000 len=0x10000).
> >>
> >> The VirtIO device should be able to translate these two buffers in
> >> isolation and chain them. Not optimal but it helps to keep QEMU source
> >> clean, as the device already must support it."
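The split in the quoted example can be sketched as follows. This is a minimal stand-alone model, not QEMU's actual API: the Mapping/Seg structs, the hva_to_iova_segs() helper, and the mapping table are all made up for illustration; only the HVA range and the two resulting (addr, len) pairs come from the example above.

```c
/* Toy model of splitting one contiguous HVA range into chained buffers
 * when it crosses an IOVA mapping boundary.  All names are hypothetical. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint64_t hva;   /* host virtual start */
    uint64_t iova;  /* device-visible start */
    uint64_t len;
} Mapping;

typedef struct {
    uint64_t addr;  /* iova */
    uint64_t len;
} Seg;

/* Split [hva, hva+len) against the mapping table; returns the number of
 * segments written to out[], or 0 on an untranslated hole / overflow. */
static size_t hva_to_iova_segs(const Mapping *map, size_t n,
                               uint64_t hva, uint64_t len,
                               Seg *out, size_t max)
{
    size_t nseg = 0;

    while (len > 0) {
        size_t i;

        for (i = 0; i < n; i++) {
            if (hva >= map[i].hva && hva < map[i].hva + map[i].len) {
                break;
            }
        }
        if (i == n || nseg == max) {
            return 0; /* caller must fail the request */
        }

        uint64_t avail = map[i].hva + map[i].len - hva;
        uint64_t chunk = len < avail ? len : avail;

        out[nseg].addr = map[i].iova + (hva - map[i].hva);
        out[nseg].len = chunk;
        nseg++;
        hva += chunk;
        len -= chunk;
    }
    return nseg;
}
```

Feeding it the quoted range 0xfeda0000-0xfedd0000 with two hypothetical mappings yields exactly the two chained buffers from the example, which the device then translates in isolation.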
> >>
> > This is 100% in the device and QEMU is already able to split the
> > buffers that way, so we don't need any change in QEMU.
> Noted that if working with the full HVA tree directly, the internal iova
> tree linear iterator iova_tree_find_address_iterator() today doesn't
> guarantee that the IOVA range it returns covers the entire length of the
> iov: an aliased range smaller than the requested size may happen to be
> hit first in the linear search and be returned, and the resulting
> fragmentation can't be guarded against by the VirtIO device or the DMA
> API mentioned above.
>
> The second patch in this series mitigates this side effect by sorting
> out the backing ram_block with the help of qemu_ram_block_from_host()
> for guest-memory-backed iovs, so it doesn't really count on
> vhost_iova_gpa_tree_find_iova() to find the matching IOVA, but instead
> (somewhat implicitly) ensures the fragmentation side effect mentioned
> above can never happen. Not saying I like the way it is implemented,
> but I just wanted to point out the implication if the second patch has
> to be removed - either add special handling code to the iova-tree
> iterator sizing the range (the same way range selection based upon
> permission will be done), or add special code in the SVQ layer to deal
> with fragmented IOVA ranges due to aliasing.
>
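The first-hit hazard described above can be modeled in a few lines. This is a toy stand-in for the linear iteration that iova_tree_find_address_iterator() performs, not QEMU code; the struct, helpers, and all numbers are illustrative assumptions.

```c
/* Toy model: a first-hit linear search over HVA ranges can return a
 * small aliased range even when a larger range also contains the
 * address, so the result may not cover the whole requested iov. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint64_t hva;
    uint64_t len;
} Range;

/* First match wins, like a linear tree traversal. */
static const Range *find_first_containing(const Range *r, size_t n,
                                          uint64_t addr)
{
    for (size_t i = 0; i < n; i++) {
        if (addr >= r[i].hva && addr < r[i].hva + r[i].len) {
            return &r[i];
        }
    }
    return NULL;
}

/* How much of a request starting at addr the returned range can cover. */
static uint64_t covered_len(const Range *r, uint64_t addr)
{
    return r->hva + r->len - addr;
}
```

With a small aliased range visited before the large one, a request for 0x8000 bytes at an address both ranges contain gets back only 0x1000 bytes of coverage: either the iterator must be taught to size the range, or the caller must cope with the fragments, exactly the two options listed above.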
This special code in SVQ is already there. And it will be needed even if we look for the buffers by GPA instead of by vaddr, the same way virtqueue_map_desc needs to handle it even if it works with GPA. Continuous GPA does not imply continuous vaddr.
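That last point can be sketched with a made-up GPA->HVA table. This is not virtqueue_map_desc(), just an illustration under assumed names and addresses of why one GPA-contiguous descriptor can still produce several iovec-like entries on the host side.

```c
/* Toy model: two GPA regions that are contiguous in guest physical
 * space may be backed by non-contiguous host virtual memory, so a
 * single descriptor maps to more than one (base, len) entry. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint64_t gpa;
    uint64_t hva;
    uint64_t len;
} GpaMap;

typedef struct {
    uint64_t base; /* stand-in for iov_base */
    uint64_t len;  /* stand-in for iov_len */
} Vec;

/* Walk [gpa, gpa+len) region by region; returns the number of vecs,
 * or 0 on an unmapped hole / overflow. */
static size_t map_gpa_range(const GpaMap *m, size_t n,
                            uint64_t gpa, uint64_t len,
                            Vec *out, size_t max)
{
    size_t nvec = 0;

    while (len > 0) {
        size_t i;

        for (i = 0; i < n; i++) {
            if (gpa >= m[i].gpa && gpa < m[i].gpa + m[i].len) {
                break;
            }
        }
        if (i == n || nvec == max) {
            return 0;
        }

        uint64_t chunk = m[i].gpa + m[i].len - gpa;
        if (chunk > len) {
            chunk = len;
        }
        out[nvec].base = m[i].hva + (gpa - m[i].gpa);
        out[nvec].len = chunk;
        nvec++;
        gpa += chunk;
        len -= chunk;
    }
    return nvec;
}
```

Two adjacent GPA regions backed by far-apart HVAs turn one 0x4000-byte GPA range into two entries, so the buffer-chaining code is needed regardless of whether lookups are done by GPA or by vaddr.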