On Fri, Aug 30, 2024 at 3:52 PM Jonah Palmer <jonah.pal...@oracle.com> wrote: > > > > On 8/30/24 4:05 AM, Eugenio Perez Martin wrote: > > On Fri, Aug 30, 2024 at 6:20 AM Si-Wei Liu <si-wei....@oracle.com> wrote: > >> > >> > >> > >> On 8/29/2024 9:53 AM, Eugenio Perez Martin wrote: > >>> On Wed, Aug 21, 2024 at 2:56 PM Jonah Palmer <jonah.pal...@oracle.com> > >>> wrote: > >>>> Decouples the IOVA allocator from the IOVA->HVA tree and instead adds > >>>> the allocated IOVA range to an IOVA-only tree (iova_map). This IOVA tree > >>>> will hold all IOVA ranges that have been allocated (e.g. in the > >>>> IOVA->HVA tree) and are removed when any IOVA ranges are deallocated. > >>>> > >>>> A new API function vhost_iova_tree_insert() is also created to add a > >>>> IOVA->HVA mapping into the IOVA->HVA tree. > >>>> > >>> I think this is a good first iteration but we can take steps to > >>> simplify it. Also, it is great to be able to make points on real code > >>> instead of designs on the air :). > >>> > > I can add more comments in the code if this is what you mean, no problem! >
No action needed about this feedback :). I just meant that it will be easier to iterate on code than designing just by talking at this stage. > >>> I expected a split of vhost_iova_tree_map_alloc between the current > >>> vhost_iova_tree_map_alloc and vhost_iova_tree_map_alloc_gpa, or > >>> similar. Similarly, a vhost_iova_tree_remove and > >>> vhost_iova_tree_remove_gpa would be needed. > >>> >>> The first one is used for regions that don't exist in the guest, like > >>> SVQ vrings or CVQ buffers. The second one is the one used by the > >>> memory listener to map the guest regions into the vdpa device. > >>> > >>> Implementation wise, only two trees are actually needed: > >>> * Current iova_taddr_map that contains all IOVA->vaddr translations as > >>> seen by the device, so both allocation functions can work on a single > >>> tree. The function iova_tree_find_iova keeps using this one, so the > >> I thought we had thorough discussion about this and agreed upon the > >> decoupled IOVA allocator solution. > > > > My interpretation of it is to leave the allocator as the current one, > > and create a new tree with GPA which is guaranteed to be unique. But > > we can talk over it of course. > > > > So you mean keep the full IOVA->HVA tree but also have a GPA->IOVA tree > as well for guest memory regions, correct? > Right. > >> But maybe I missed something earlier, > >> I am not clear how come this iova_tree_find_iova function could still > >> work with the full IOVA-> HVA tree when it comes to aliased memory or > >> overlapped HVAs? Granted, for the memory map removal in the > >> .region_del() path, we could rely on the GPA tree to locate the > >> corresponding IOVA, but how come the translation path could figure out > >> which IOVA range to return when the vaddr happens to fall in an > >> overlapped HVA range? > > > > That is not a problem, as they both translate to the same address at the > > device. > > > > The most complicated situation is where we have a region contained in > > another region, and the requested buffer crosses them. If the IOVA > > tree returns the inner region, it will return the buffer chained with > > the rest of the content in the outer region. Not optimal, but solved > > either way. > > > > The only problem that comes to my mind is the case where the inner > > region is RO and it is a write command, but I don't think we have this > > case in a sane guest. A malicious guest cannot do any harm this way > > anyway. > > > >> Do we still assume some overlapping order so we > >> always return the first match from the tree? Or we expect every current > >> user of iova_tree_find_iova should pass in GPA rather than HVA and use > >> the vhost_iova_xxx_gpa API variant to look up IOVA? > >> > > > > No, iova_tree_find_iova should keep asking for vaddr, as the result is > > guaranteed to be there. Users of VhostIOVATree only need to modify how > > they add or remove regions, knowing if they come from the guest or > > not. As shown by this series, it is easier to do in that place than in > > translation. > > > >> Thanks, > >> -Siwei > >> > >>> user does not need to know if the address is from the guest or only > >>> exists in QEMU by using RAMBlock etc. All insert and remove functions > >>> use this tree. > >>> * A new tree that relates IOVA to GPA, that only > >>> vhost_iova_tree_map_alloc_gpa and vhost_iova_tree_remove_gpa uses. > >>> > >>> The ideal case is that the key in this new tree is the GPA and the > >>> value is the IOVA. 
But IOVATree's DMA is named the reverse: iova is > >>> the key and translated_addr is the vaddr. We can create a new tree > >>> struct for that, use GTree directly, or translate the reverse > >>> linearly. As memory add / remove should not be frequent, I think the > >>> simpler is the last one, but I'd be ok with creating a new tree. > >>> > > Is the concern here that making the gpa_map (GPA->IOVA tree) of type > IOVATree can be confusing for users from an API perspective? > > In other words, IOVATree users should always use its iova member as the > key and the translated_addr member as the value (and thus IOVATree > gpa_map feels out of place since it uses the translated_addr as the key > and iova as the value)? > Totally right. > Also, could you elaborate a bit more on "translate the reverse > linearly"? Do you mean to create an IOVA->GPA tree but always search the > tree using the GPA (e.g. via iova_tree_find_iova)? > Yes, that's the other option. I'm comparing the two options here: * IOVA->GPA has the advantage of reusing IOVATree, but lookups for GPA is O(N). * GPA->IOVA is more natural but we cannot reuse IOVATree in a simple way. > >>> vhost_iova_tree_map_alloc_gpa needs to add the map to this new tree > >>> also. Similarly, vhost_iova_tree_remove_gpa must look for the GPA in > >>> this tree, and only remove the associated DMAMap in iova_taddr_map > >>> that matches the IOVA. > >>> > >>> Does it make sense to you? > > Would using a name like vhost_iova_tree_map_alloc_gpa seem a bit > misleading given that we're already allocating the IOVA range in > vhost_iova_tree_map_alloc? With the vhost_iova_tree_map_alloc_gpa there is no need to call vhost_iova_tree_map_alloc. It's like merging the two operations, so the caller does not need to know the internals. > It seems this would be more of an insertion > rather than an allocation when adding a map to the GPA->IOVA tree. > Let's put it another way, why complicate IOVATree or VhostIOVATree by integrating the GPA->IOVA or the IOVA->GPA if we could simply store that information in the caller and make each struct simpler? If we abstract away the two trees under a coherent API, the struct makes sense. If not, it would be better to let the caller handle the information. > Also, are you saying that vhost_iova_tree_remove_gpa only removes the > DMAMap in the IOVA->HVA tree or should it also remove the corresponding > mapping in the GPA->IOVA tree? > It should remove it in both. > >>> > >>>> Signed-off-by: Jonah Palmer <jonah.pal...@oracle.com> > >>>> --- > >>>> hw/virtio/vhost-iova-tree.c | 38 ++++++++++++++++++++++++++++++++----- > >>>> hw/virtio/vhost-iova-tree.h | 1 + > >>>> hw/virtio/vhost-vdpa.c | 31 ++++++++++++++++++++++++------ > >>>> net/vhost-vdpa.c | 13 +++++++++++-- > >>>> 4 files changed, 70 insertions(+), 13 deletions(-) > >>>> > >>>> diff --git a/hw/virtio/vhost-iova-tree.c b/hw/virtio/vhost-iova-tree.c > >>>> index 3d03395a77..32c03db2f5 100644 > >>>> --- a/hw/virtio/vhost-iova-tree.c > >>>> +++ b/hw/virtio/vhost-iova-tree.c > >>>> @@ -28,12 +28,17 @@ struct VhostIOVATree { > >>>> > >>>> /* IOVA address to qemu memory maps. */ > >>>> IOVATree *iova_taddr_map; > >>>> + > >>>> + /* IOVA tree (IOVA allocator) */ > >>>> + IOVATree *iova_map; > >>>> }; > >>>> > >>>> /** > >>>> - * Create a new IOVA tree > >>>> + * Create a new VhostIOVATree with a new set of IOVATree's: > >>> s/IOVA tree/VhostIOVATree/ is good, but I think the rest is more an > >>> implementation detail. > >>> > > Gotcha. 
Would you like me to remove the other comments then? > Yes, it is better to remove the ones that expose internals. We should be able to change this kind of detail without the users of the VhostIOVATree noticing it. > >>>> + * - IOVA allocator (iova_map) > >>>> + * - IOVA->HVA tree (iova_taddr_map) > >>>> * > >>>> - * Returns the new IOVA tree > >>>> + * Returns the new VhostIOVATree > >>>> */ > >>>> VhostIOVATree *vhost_iova_tree_new(hwaddr iova_first, hwaddr > >>>> iova_last) > >>>> { > >>>> @@ -44,6 +49,7 @@ VhostIOVATree *vhost_iova_tree_new(hwaddr iova_first, > >>>> hwaddr iova_last) > >>>> tree->iova_last = iova_last; > >>>> > >>>> tree->iova_taddr_map = iova_tree_new(); > >>>> + tree->iova_map = iova_tree_new(); > >>>> return tree; > >>>> } > >>>> > >>>> @@ -53,6 +59,7 @@ VhostIOVATree *vhost_iova_tree_new(hwaddr iova_first, > >>>> hwaddr iova_last) > >>>> void vhost_iova_tree_delete(VhostIOVATree *iova_tree) > >>>> { > >>>> iova_tree_destroy(iova_tree->iova_taddr_map); > >>>> + iova_tree_destroy(iova_tree->iova_map); > >>>> g_free(iova_tree); > >>>> } > >>>> > >>>> @@ -88,13 +95,12 @@ int vhost_iova_tree_map_alloc(VhostIOVATree *tree, > >>>> DMAMap *map) > >>>> /* Some vhost devices do not like addr 0. Skip first page */ > >>>> hwaddr iova_first = tree->iova_first ?: > >>>> qemu_real_host_page_size(); > >>>> > >>>> - if (map->translated_addr + map->size < map->translated_addr || > >>> Why remove this condition? If the request is invalid we still need to > >>> return an error here. > >>> > >>> Maybe we should move it to iova_tree_alloc_map though. > >>> > > This series decoupled the IOVA allocator from also adding a mapping to > the IOVA->HVA tree and instead added IOVA ranges only to an IOVA-only > tree. So no value existed under translated_addr for these mappings > specifically. > > This check was moved to vhost_iova_tree_insert since that function > covered adding the host-only memory mappings to the IOVA->SVQ HVA tree. > Ok, I missed that then. Can you extract that into a separate patch? I think it makes sense by itself.
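To be explicit about what that condition catches: since DMAMap sizes are inclusive, translated_addr + map->size can wrap around the hwaddr space (e.g. translated_addr = HWADDR_MAX - 10 with size = 20), so whichever function ends up receiving translated_addr has to reject that. A rough sketch of the extracted check as a helper, just for illustration; the helper name is made up, the series open-codes this in vhost_iova_tree_insert:

/*
 * Illustrative helper only: reject maps whose inclusive size wraps
 * around the hwaddr space or that carry no access permissions.
 */
static int vhost_iova_tree_check_map(const DMAMap *map)
{
    if (map->translated_addr + map->size < map->translated_addr ||
        map->perm == IOMMU_NONE) {
        return IOVA_ERR_INVALID;
    }
    return IOVA_OK;
}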
> >>>> - map->perm == IOMMU_NONE) { > >>>> + if (map->perm == IOMMU_NONE) { > >>>> return IOVA_ERR_INVALID; > >>>> } > >>>> > >>>> /* Allocate a node in IOVA address */ > >>>> - return iova_tree_alloc_map(tree->iova_taddr_map, map, iova_first, > >>>> + return iova_tree_alloc_map(tree->iova_map, map, iova_first, > >>>> tree->iova_last); > >>>> } > >>>> > >>>> @@ -107,4 +113,26 @@ int vhost_iova_tree_map_alloc(VhostIOVATree *tree, > >>>> DMAMap *map) > >>>> void vhost_iova_tree_remove(VhostIOVATree *iova_tree, DMAMap map) > >>>> { > >>>> iova_tree_remove(iova_tree->iova_taddr_map, map); > >>>> + iova_tree_remove(iova_tree->iova_map, map); > >>>> +} > >>>> + > >>>> +/** > >>>> + * Insert a new mapping to the IOVA->HVA tree > >>>> + * > >>>> + * @tree: The VhostIOVATree > >>>> + * @map: The iova map > >>>> + * > >>>> + * Returns: > >>>> + * - IOVA_OK if the map fits in the container > >>>> + * - IOVA_ERR_INVALID if the map does not make sense (like size > >>>> overflow) > >>>> + * - IOVA_ERR_OVERLAP if the IOVA range overlaps with an existing range > >>>> + */ > >>>> +int vhost_iova_tree_insert(VhostIOVATree *iova_tree, DMAMap *map) > >>>> +{ > >>>> + if (map->translated_addr + map->size < map->translated_addr || > >>>> + map->perm == IOMMU_NONE) { > >>>> + return IOVA_ERR_INVALID; > >>>> + } > >>>> + > >>>> + return iova_tree_insert(iova_tree->iova_taddr_map, map); > >>>> } > >>>> diff --git a/hw/virtio/vhost-iova-tree.h b/hw/virtio/vhost-iova-tree.h > >>>> index 4adfd79ff0..8bf7b64786 100644 > >>>> --- a/hw/virtio/vhost-iova-tree.h > >>>> +++ b/hw/virtio/vhost-iova-tree.h > >>>> @@ -23,5 +23,6 @@ const DMAMap *vhost_iova_tree_find_iova(const > >>>> VhostIOVATree *iova_tree, > >>>> const DMAMap *map); > >>>> int vhost_iova_tree_map_alloc(VhostIOVATree *iova_tree, DMAMap *map); > >>>> void vhost_iova_tree_remove(VhostIOVATree *iova_tree, DMAMap map); > >>>> +int vhost_iova_tree_insert(VhostIOVATree *iova_tree, DMAMap *map); > >>>> > >>>> #endif > >>>> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c > >>>> index 3cdaa12ed5..6702459065 100644 > >>>> --- a/hw/virtio/vhost-vdpa.c > >>>> +++ b/hw/virtio/vhost-vdpa.c > >>>> @@ -361,10 +361,10 @@ static void > >>>> vhost_vdpa_listener_region_add(MemoryListener *listener, > >>>> if (s->shadow_data) { > >>>> int r; > >>>> > >>>> - mem_region.translated_addr = (hwaddr)(uintptr_t)vaddr, > >>>> mem_region.size = int128_get64(llsize) - 1, > >>>> mem_region.perm = IOMMU_ACCESS_FLAG(true, section->readonly), > >>>> > >>>> + /* Allocate IOVA range and add the mapping to the IOVA tree */ > >>>> r = vhost_iova_tree_map_alloc(s->iova_tree, &mem_region); > >>>> if (unlikely(r != IOVA_OK)) { > >>>> error_report("Can't allocate a mapping (%d)", r); > >>>> @@ -372,6 +372,14 @@ static void > >>>> vhost_vdpa_listener_region_add(MemoryListener *listener, > >>>> } > >>>> > >>>> iova = mem_region.iova; > >>>> + > >>>> + /* Add mapping to the IOVA->HVA tree */ > >>>> + mem_region.translated_addr = (hwaddr)(uintptr_t)vaddr; > >>>> + r = vhost_iova_tree_insert(s->iova_tree, &mem_region); > >>>> + if (unlikely(r != IOVA_OK)) { > >>>> + error_report("Can't add listener region mapping (%d)", r); > >>>> + goto fail_map; > >>>> + } > >>> I'd say it is not intuitive for the caller code. > >>> > > Sorry, I'm not sure what you mean by this here. Would you mind > elaborating a bit more? > If VhostIOVATree gets reused it is easy to miss the vhost_iova_tree_insert after the allocation. 
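Just to make that concrete, the merged helper could look something like the sketch below. This is not code from the series: the iova_gpa_map member (an extra IOVATree keyed by IOVA with the GPA stored in translated_addr, i.e. the "reuse IOVATree" option) and the exact signature are assumptions for illustration.

/*
 * Sketch only: allocate an IOVA range for a guest-backed region and
 * record both the IOVA->HVA and IOVA->GPA mappings in one call, so
 * callers cannot forget the second insertion.
 */
int vhost_iova_tree_map_alloc_gpa(VhostIOVATree *tree, DMAMap *map,
                                  hwaddr gpa)
{
    DMAMap gpa_map;
    int r;

    /* Allocate the IOVA range and add the IOVA->HVA mapping */
    r = vhost_iova_tree_map_alloc(tree, map);
    if (unlikely(r != IOVA_OK)) {
        return r;
    }
    r = vhost_iova_tree_insert(tree, map);
    if (unlikely(r != IOVA_OK)) {
        goto fail;
    }

    /* Record IOVA->GPA so removal can be keyed by the guest address */
    gpa_map = (DMAMap) {
        .iova = map->iova,
        .translated_addr = gpa,
        .size = map->size,
        .perm = map->perm,
    };
    r = iova_tree_insert(tree->iova_gpa_map, &gpa_map);
    if (unlikely(r != IOVA_OK)) {
        goto fail;
    }
    return IOVA_OK;

fail:
    vhost_iova_tree_remove(tree, *map);
    return r;
}

The memory listener would then call only this for guest regions, while SVQ vrings and CVQ buffers keep the plain vhost_iova_tree_map_alloc() + vhost_iova_tree_insert() pair (or get a similarly merged non-GPA variant), so the easy-to-miss second step disappears from the callers.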
If we're going to expose this second GPA tree, I think it would be better to place it directly in VhostShadowVirtqueue. All the conditionals fit better that way and we don't make VhostIOVATree / IOVATree harder to reuse. However, I keep thinking it is easy enough to hide in VhostIOVATree. > >>>> } > >>>> > >>>> vhost_vdpa_iotlb_batch_begin_once(s); > >>>> @@ -1142,19 +1150,30 @@ static void vhost_vdpa_svq_unmap_rings(struct > >>>> vhost_dev *dev, > >>>> * > >>>> * @v: Vhost-vdpa device > >>>> * @needle: The area to search iova > >>>> + * @taddr: The translated address (SVQ HVA) > >>>> * @errorp: Error pointer > >>>> */ > >>>> static bool vhost_vdpa_svq_map_ring(struct vhost_vdpa *v, DMAMap > >>>> *needle, > >>>> - Error **errp) > >>>> + hwaddr taddr, Error **errp) > >>>> { > >>>> int r; > >>>> > >>>> + /* Allocate IOVA range and add the mapping to the IOVA tree */ > >>>> r = vhost_iova_tree_map_alloc(v->shared->iova_tree, needle); > >>>> if (unlikely(r != IOVA_OK)) { > >>>> error_setg(errp, "Cannot allocate iova (%d)", r); > >>>> return false; > >>>> } > >>>> > >>>> + /* Add mapping to the IOVA->HVA tree */ > >>>> + needle->translated_addr = taddr; > >>>> + r = vhost_iova_tree_insert(v->shared->iova_tree, needle); > >>>> + if (unlikely(r != IOVA_OK)) { > >>>> + error_setg(errp, "Cannot add SVQ vring mapping (%d)", r); > >>>> + vhost_iova_tree_remove(v->shared->iova_tree, *needle); > >>>> + return false; > >>>> + } > >>>> + > >>>> r = vhost_vdpa_dma_map(v->shared, v->address_space_id, > >>>> needle->iova, > >>>> needle->size + 1, > >>>> (void *)(uintptr_t)needle->translated_addr, > >>>> @@ -1192,11 +1211,11 @@ static bool vhost_vdpa_svq_map_rings(struct > >>>> vhost_dev *dev, > >>>> vhost_svq_get_vring_addr(svq, &svq_addr); > >>>> > >>>> driver_region = (DMAMap) { > >>>> - .translated_addr = svq_addr.desc_user_addr, > >>>> .size = driver_size - 1, > >>>> .perm = IOMMU_RO, > >>>> }; > >>>> - ok = vhost_vdpa_svq_map_ring(v, &driver_region, errp); > >>>> + ok = vhost_vdpa_svq_map_ring(v, &driver_region, > >>>> svq_addr.desc_user_addr, > >>>> + errp); > >>>> if (unlikely(!ok)) { > >>>> error_prepend(errp, "Cannot create vq driver region: "); > >>>> return false; > >>>> @@ -1206,11 +1225,11 @@ static bool vhost_vdpa_svq_map_rings(struct > >>>> vhost_dev *dev, > >>>> addr->avail_user_addr = driver_region.iova + avail_offset; > >>>> > >>>> device_region = (DMAMap) { > >>>> - .translated_addr = svq_addr.used_user_addr, > >>>> .size = device_size - 1, > >>>> .perm = IOMMU_RW, > >>>> }; > >>>> - ok = vhost_vdpa_svq_map_ring(v, &device_region, errp); > >>>> + ok = vhost_vdpa_svq_map_ring(v, &device_region, > >>>> svq_addr.used_user_addr, > >>>> + errp); > >>>> if (unlikely(!ok)) { > >>>> error_prepend(errp, "Cannot create vq device region: "); > >>>> vhost_vdpa_svq_unmap_ring(v, driver_region.translated_addr); > >>>> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c > >>>> index 03457ead66..81da956b92 100644 > >>>> --- a/net/vhost-vdpa.c > >>>> +++ b/net/vhost-vdpa.c > >>>> @@ -512,15 +512,24 @@ static int vhost_vdpa_cvq_map_buf(struct > >>>> vhost_vdpa *v, void *buf, size_t size, > >>>> DMAMap map = {}; > >>>> int r; > >>>> > >>>> - map.translated_addr = (hwaddr)(uintptr_t)buf; > >>>> map.size = size - 1; > >>>> map.perm = write ? 
IOMMU_RW : IOMMU_RO, > >>>> + > >>>> + /* Allocate IOVA range and add the mapping to the IOVA tree */ > >>>> r = vhost_iova_tree_map_alloc(v->shared->iova_tree, &map); > >>>> if (unlikely(r != IOVA_OK)) { > >>>> - error_report("Cannot map injected element"); > >>>> + error_report("Cannot allocate IOVA range for injected element"); > >>>> return r; > >>>> } > >>>> > >>>> + /* Add mapping to the IOVA->HVA tree */ > >>>> + map.translated_addr = (hwaddr)(uintptr_t)buf; > >>>> + r = vhost_iova_tree_insert(v->shared->iova_tree, &map); > >>>> + if (unlikely(r != IOVA_OK)) { > >>>> + error_report("Cannot map injected element into IOVA->HVA tree"); > >>>> + goto dma_map_err; > >>>> + } > >>>> + > >>>> r = vhost_vdpa_dma_map(v->shared, v->address_space_id, map.iova, > >>>> vhost_vdpa_net_cvq_cmd_page_len(), buf, > >>>> !write); > >>>> if (unlikely(r < 0)) { > >>>> -- > >>>> 2.43.5 > >>>> > >> > > >
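And for completeness, the removal counterpart under the same assumptions (hypothetical iova_gpa_map, GPA kept in translated_addr) could recover the IOVA with the linear lookup mentioned above before dropping the entry from both trees; memory add/remove is rare enough that the O(N) search should not matter:

/*
 * Sketch only: remove the mappings for a guest memory region keyed by
 * its GPA.  map.translated_addr carries the GPA on entry.
 */
void vhost_iova_tree_remove_gpa(VhostIOVATree *tree, DMAMap map)
{
    /* Linear search of the IOVA->GPA tree by guest address */
    const DMAMap *gpa_entry = iova_tree_find_iova(tree->iova_gpa_map, &map);

    if (!gpa_entry) {
        return;
    }

    map.iova = gpa_entry->iova;
    iova_tree_remove(tree->iova_gpa_map, *gpa_entry);
    /* Drops the matching IOVA->HVA mapping and frees the IOVA range */
    vhost_iova_tree_remove(tree, map);
}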