On Fri, Feb 25, 2022 at 2:30 AM Michael S. Tsirkin <m...@redhat.com> wrote:
>
> On Thu, Feb 24, 2022 at 05:54:58PM +0000, Joao Martins wrote:
> > On 2/24/22 17:23, Michael S. Tsirkin wrote:
> > > On Thu, Feb 24, 2022 at 04:07:22PM +0000, Joao Martins wrote:
> > >> On 2/23/22 23:35, Joao Martins wrote:
> > >>> On 2/23/22 21:22, Michael S. Tsirkin wrote:
> > >>>>> +static void x86_update_above_4g_mem_start(PCMachineState *pcms,
> > >>>>> +                                          uint64_t pci_hole64_size)
> > >>>>> +{
> > >>>>> +    X86MachineState *x86ms = X86_MACHINE(pcms);
> > >>>>> +    uint32_t eax, vendor[3];
> > >>>>> +
> > >>>>> +    host_cpuid(0x0, 0, &eax, &vendor[0], &vendor[2], &vendor[1]);
> > >>>>> +    if (!IS_AMD_VENDOR(vendor)) {
> > >>>>> +        return;
> > >>>>> +    }
> > >>>>
> > >>>> Wait a sec, should this actually be tying things to the host CPU ID?
> > >>>> It's really about what we present to the guest though,
> > >>>> isn't it?
> > >>>
> > >>> It was the easier catch-all to use cpuid without going into
> > >>> Linux UAPI specifics. But it doesn't have to tie in there; it is only
> > >>> for systems with an IOMMU present.
> > >>>
> > >>>> Also, can't we tie this to whether the AMD IOMMU is present?
> > >>>>
> > >>> I think so, I can add that. Something like an amd_iommu_exists() helper
> > >>> in util/vfio-helpers.c which checks if there are any sysfs child entries
> > >>> that start with ivhd in /sys/class/iommu/. Given that this HT region has
> > >>> been hardcoded in the IOMMU reserved regions since >= 4.11 (to latest),
> > >>> I don't think it's even worth checking that the range exists in:
> > >>>
> > >>> /sys/kernel/iommu_groups/0/reserved_regions
> > >>>
> > >>> (Also, that sysfs ABI is >= 4.11 only.)
> > >>
> > >> Here's what I have staged in my local tree to address your comment.
> > >>
> > >> Naturally the first chunk is what's affected by this patch; the rest is a
> > >> predecessor patch to introduce qemu_amd_iommu_is_present(). Seems to pass
> > >> all the tests and whatnot.
> > >>
> > >> I am not entirely sure this is the right place to put such a helper; open
> > >> to suggestions. wrt the naming of the helper, I tried to follow the rest
> > >> of the file's style.
> > >>
> > >> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> > >> index a9be5d33a291..2ea4430d5dcc 100644
> > >> --- a/hw/i386/pc.c
> > >> +++ b/hw/i386/pc.c
> > >> @@ -868,10 +868,8 @@ static void x86_update_above_4g_mem_start(PCMachineState *pcms,
> > >>                                            uint64_t pci_hole64_size)
> > >>  {
> > >>      X86MachineState *x86ms = X86_MACHINE(pcms);
> > >> -    uint32_t eax, vendor[3];
> > >>
> > >> -    host_cpuid(0x0, 0, &eax, &vendor[0], &vendor[2], &vendor[1]);
> > >> -    if (!IS_AMD_VENDOR(vendor)) {
> > >> +    if (!qemu_amd_iommu_is_present()) {
> > >>          return;
> > >>      }
> > >>
> > >> diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
> > >> index 7bcce3bceb0f..eb4ea071ecec 100644
> > >> --- a/include/qemu/osdep.h
> > >> +++ b/include/qemu/osdep.h
> > >> @@ -637,6 +637,15 @@ char *qemu_get_host_name(Error **errp);
> > >>   */
> > >>  size_t qemu_get_host_physmem(void);
> > >>
> > >> +/**
> > >> + * qemu_amd_iommu_is_present:
> > >> + *
> > >> + * Operating system agnostic way of querying if an AMD IOMMU
> > >> + * is present.
> > >> + *
> > >> + */
> > >> +bool qemu_amd_iommu_is_present(void);
> > >> +
> > >>  /*
> > >>   * Toggle write/execute on the pages marked MAP_JIT
> > >>   * for the current thread.
> > >> diff --git a/util/oslib-posix.c b/util/oslib-posix.c
> > >> index f2be7321c59f..54cef21217c4 100644
> > >> --- a/util/oslib-posix.c
> > >> +++ b/util/oslib-posix.c
> > >> @@ -982,3 +982,32 @@ size_t qemu_get_host_physmem(void)
> > >>  #endif
> > >>      return 0;
> > >>  }
> > >> +
> > >> +bool qemu_amd_iommu_is_present(void)
> > >> +{
> > >> +    bool found = false;
> > >> +#ifdef CONFIG_LINUX
> > >> +    struct dirent *entry;
> > >> +    char *path;
> > >> +    DIR *dir;
> > >> +
> > >> +    path = g_strdup_printf("/sys/class/iommu");
> > >> +    dir = opendir(path);
> > >> +    if (!dir) {
> > >> +        g_free(path);
> > >> +        return found;
> > >> +    }
> > >> +
> > >> +    do {
> > >> +        entry = readdir(dir);
> > >> +        if (entry && !strncmp(entry->d_name, "ivhd", 4)) {
> > >> +            found = true;
> > >> +            break;
> > >> +        }
> > >> +    } while (entry);
> > >> +
> > >> +    g_free(path);
> > >> +    closedir(dir);
> > >> +#endif
> > >> +    return found;
> > >> +}
> > >
> > > why are we checking whether an AMD IOMMU is present
> > > on the host?
> > > Isn't what we care about whether it's
> > > present in the VM? What we are reserving after all
> > > is a range of GPAs, not actual host PAs ...
> > >
> > Right.
> >
> > But any DMA map done by VFIO/IOMMU using those GPA ranges will fail,
> > so we need to avoid mapping that portion of the address space entirely
> > and use some other portion of the GPA space. This has to do with the
> > host IOMMU, which is where the DMA mapping of guest PAs takes place.
> > So, if you do not have a host IOMMU, you can't service guest DMA/PCIe
> > via VFIO through the host IOMMU, and therefore you don't have this
> > problem.
> >
> > Whether one uses a guest IOMMU or not does not affect the result;
> > it would be more a case of mimicking real hardware than fixing the
> > issue this series addresses.
> >
> Hmm okay ... my first reaction was to say let's put it under VFIO then.
> And ideally move the logic reporting reserved ranges there too.
> However, I think vdpa has the same issue too.
> CC Jason for an opinion.
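For concreteness, here is a minimal, self-contained sketch of the relocation
being discussed above, assuming the HyperTransport reserved window documented
by AMD (0xFD_0000_0000 .. 0xFF_FFFF_FFFF, i.e. the last 12 GiB below 1 TiB);
the helper name, constants and intersection test are illustrative and not the
series' actual pc.c code:

/*
 * Illustrative only: if the end of above-4G guest RAM (plus the 64-bit
 * PCI hole placed after it) would reach into the HT reserved window
 * just below 1 TiB, start that RAM at 1 TiB instead, so that no GPA
 * handed to the host IOMMU falls inside the hole.
 */
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

#define GiB              (1ULL << 30)
#define AMD_HT_START     0xfd00000000ULL    /* FD_0000_0000h */
#define AMD_ABOVE_1TB    0x10000000000ULL   /* 1 TiB */

static uint64_t above_4g_mem_start(uint64_t above_4g_mem_size,
                                   uint64_t pci_hole64_size)
{
    uint64_t start = 4 * GiB;

    /* Would RAM or the PCI hole after it intersect the HT window? */
    if (start + above_4g_mem_size + pci_hole64_size > AMD_HT_START) {
        start = AMD_ABOVE_1TB;
    }
    return start;
}

int main(void)
{
    /* 1008 GiB of above-4G RAM would hit the window: start moves to 1 TiB */
    printf("0x%" PRIx64 "\n", above_4g_mem_start(1008 * GiB, 32 * GiB));
    /* 64 GiB stays well below it: start remains at 4 GiB */
    printf("0x%" PRIx64 "\n", above_4g_mem_start(64 * GiB, 32 * GiB));
    return 0;
}

Anything that would otherwise land in the window is simply pushed past 1 TiB,
which is also why the guest's phys-bits matter: the relocated RAM must still
be addressable by the vCPU.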
vDPA is even more complicated than VFIO as it allows the device to
reserve specific IOVA ranges:

static void vhost_vdpa_set_iova_range(struct vhost_vdpa *v)
{
        struct vdpa_iova_range *range = &v->range;
        struct vdpa_device *vdpa = v->vdpa;
        const struct vdpa_config_ops *ops = vdpa->config;

        if (ops->get_iova_range) {
                *range = ops->get_iova_range(vdpa);
        ...
}

I wonder if we need to expose those via the netlink protocol.

Thanks

> Also, some concerns
> - I think this changes memory layout for working VMs not using VFIO.
>   Better preserve the old layout for old machine types?
>
> - You mention the phys bits issue very briefly, and it's pretty
>   concerning. Do we maybe want to also disable the workaround if phys
>   bits is too small? Also, we'd need to test a bunch of old
>   guests to see what happens. Which guests were tested?
>
> --
> MST
>
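On the vhost-vdpa side, userspace can already read back the window that
vhost_vdpa_set_iova_range() fills in, via the VHOST_VDPA_GET_IOVA_RANGE ioctl
on an open vhost-vdpa character device; exposing it via netlink would
presumably just make it discoverable without opening the device. A rough
sketch, assuming a kernel that provides that ioctl; the /dev/vhost-vdpa-0
path is only an example:

#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/vhost.h>

int main(void)
{
    struct vhost_vdpa_iova_range range;          /* { __u64 first, last; } */
    int fd = open("/dev/vhost-vdpa-0", O_RDWR);  /* example device node */

    if (fd < 0) {
        perror("open");
        return 1;
    }
    if (ioctl(fd, VHOST_VDPA_GET_IOVA_RANGE, &range) < 0) {
        perror("VHOST_VDPA_GET_IOVA_RANGE");
        close(fd);
        return 1;
    }
    printf("usable IOVA range: 0x%llx .. 0x%llx\n",
           (unsigned long long)range.first, (unsigned long long)range.last);
    close(fd);
    return 0;
}

Note the range is per device (it comes from ops->get_iova_range), which is
what makes the vDPA case more involved than a single fixed host window.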