On Mon, Jun 17, 2024 at 09:46:29AM +0200, Roger Pau Monné wrote:
> On Sun, Jun 16, 2024 at 08:38:19PM -0400, Demi Marie Obenour wrote:
> > On Fri, Jun 14, 2024 at 10:39:37AM +0200, Roger Pau Monné wrote:
> > > On Fri, Jun 14, 2024 at 10:12:40AM +0200, Jan Beulich wrote:
> > > > On 14.06.2024 09:21, Roger Pau Monné wrote:
> > > > > On Fri, Jun 14, 2024 at 08:38:51AM +0200, Jan Beulich wrote:
> > > > >> On 13.06.2024 20:43, Demi Marie Obenour wrote:
> > > > >>> GPU acceleration requires that pageable host memory be able to
> > > > >>> be mapped into a guest.
> > > > >>
> > > > >> I'm sure it was explained in the session, which sadly I couldn't
> > > > >> attend.  I've been asking Ray and Xenia the same before, but I'm
> > > > >> afraid it still hasn't become clear to me why this is a
> > > > >> _requirement_.  After all, that's against what we're doing
> > > > >> elsewhere (i.e. so far it has always been guest memory that's
> > > > >> mapped in the host).  I can appreciate that it might be more
> > > > >> difficult to implement, but avoiding a violation of this
> > > > >> fundamental (kind of) rule might be worth the price (and would
> > > > >> avoid other complexities, of which more may be lurking than the
> > > > >> ones you enumerate below).
> > > > >
> > > > > My limited understanding (please someone correct me if wrong) is
> > > > > that the GPU buffer (or context, I think it's also called) is
> > > > > always allocated from dom0 (the owner of the GPU).  The underlying
> > > > > memory addresses of such a buffer need to be mapped into the
> > > > > guest.  The buffer backing memory might be GPU MMIO from the
> > > > > device BAR(s) or system RAM, and such a buffer can be paged by the
> > > > > dom0 kernel at any time (iow: changing the backing memory from
> > > > > MMIO to RAM or vice versa).  Also, the buffer must be contiguous
> > > > > in physical address space.
> > > >
> > > > This last one in particular would of course be a severe restriction.
> > > > Yet: There's an IOMMU involved, isn't there?
> > >
> > > Yup, IIRC that's why Ray said it was much easier for them to support
> > > VirtIO GPUs from a PVH dom0 rather than a classic PV one.
> > >
> > > It might be easier to implement from a classic PV dom0 if there's
> > > pv-iommu support, so that dom0 can create its own contiguous memory
> > > buffers from the device PoV.
> >
> > What makes PVH an improvement here?  I thought PV dom0 uses an
> > identity mapping for the IOMMU, while a PVH dom0 uses an IOMMU that
> > mirrors the dom0 second-stage page tables.
>
> Indeed, hence finding a physically contiguous buffer on classic PV is
> way more complicated, because the IOMMU identity-maps mfns, and the PV
> address space can be completely scattered.
>
> OTOH, on PVH the IOMMU page tables are the same as the second-stage
> translation, and hence the physical address space is way more compact
> (as it would be on native).
Ah, _that_ is what I missed.  I didn't realize that the physical address
space of PV guests was so scattered.

> > In both cases, the device physical addresses are identical to dom0’s
> > physical addresses.
>
> Yes, but a PV dom0 physical address space can be very scattered.
>
> IIRC there's a hypercall to request physically contiguous memory for
> PV, but you don't want to be using that every time you allocate a
> buffer (not sure it would support the sizes needed by the GPU anyway).

That makes sense, thanks!

> > PV is terrible for many reasons, so I’m okay with focusing on PVH
> > dom0, but I’d like to know why there is a difference.
> >
> > > > > I'm not sure it's possible to ensure that when using system RAM
> > > > > such memory comes from the guest rather than the host, as it
> > > > > would likely require some very intrusive hooks into the kernel
> > > > > logic, and negotiation with the guest to allocate the requested
> > > > > amount of memory and hand it over to dom0.  If the maximum size
> > > > > of the buffer is known in advance, maybe dom0 can negotiate with
> > > > > the guest to allocate such a region and grant dom0 access to it
> > > > > at driver attachment time.
> > > >
> > > > Besides the thought of transiently converting RAM to kind-of-MMIO,
> > > > this
> > >
> > > As a note here, changing the type to MMIO would likely involve
> > > modifying the EPT/NPT tables to propagate the new type.  On a PVH
> > > dom0 this would likely involve shattering superpages in order to set
> > > the correct memory types.
> > >
> > > Depending on how often and how randomly those system RAM changes are
> > > needed, this could also create contention on the p2m lock.
> > >
> > > > makes me think of another possible option: Could Dom0 transfer
> > > > ownership of the RAM that wants mapping in the guest (remotely
> > > > resembling grant-transfer)?  It would require the guest to have
> > > > ballooned down enough first, of course.  (In both cases it would
> > > > certainly need working out how the conversion / transfer back
> > > > could be made to work safely and reasonably cleanly.)
> > >
> > > Maybe.  The fact that the guest needs to balloon down that amount of
> > > memory seems weird to me, as from the guest PoV that mapped memory
> > > is MMIO-like and not system RAM.
> >
> > I don’t like it either.  Furthermore, this would require changes to
> > the virtio-GPU driver in the guest, which I’d prefer to avoid.
>
> IMO it would be helpful if you (or someone) could write the full
> specification of how VirtIO GPU is supposed to work right now (with
> the KVM model, I assume?), as it would be a good starting point for
> suggestions about how to make it work (or adapt it) on Xen.
>
> I don't think the high-level layers on top of VirtIO GPU are relevant,
> but it's important to understand the protocol between the VirtIO GPU
> front and back ends.

virtio-GPU is part of the OASIS VirtIO standard [1].

[1]: https://docs.oasis-open.org/virtio/virtio/v1.3/virtio-v1.3.html

> So far I have only had scattered conversations about what's needed,
> but not a formal write-up of how this is supposed to work.

My understanding is that mapping GPU buffers into guests ("blob
resources" in virtio-GPU terms) is the only part of virtio-GPU that
didn't just work.  Furthermore, any solution that uses Linux's
kernel-mode GPU driver on the host will have the same requirements.  I
don't consider writing a bespoke GPU driver that uses caller-allocated
buffers to be a reasonable solution that can support many GPU models.
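For concreteness, here is a rough sketch of the control-queue structures
involved in the blob-resource flow, paraphrased from the GPU device
section of the spec [1] and from Linux's include/uapi/linux/virtio_gpu.h.
I have substituted plain stdint types for the spec's little-endian field
types, so treat the spec as authoritative for the exact wire layout:

#include <stdint.h>

/* Common header carried by every virtio-GPU control-queue request. */
struct virtio_gpu_ctrl_hdr {
        uint32_t type;       /* e.g. VIRTIO_GPU_CMD_RESOURCE_CREATE_BLOB */
        uint32_t flags;
        uint64_t fence_id;
        uint32_t ctx_id;
        uint8_t  ring_idx;   /* only meaningful with VIRTIO_GPU_F_CONTEXT_INIT */
        uint8_t  padding[3];
};

/* blob_mem: where the backing memory of the blob lives. */
#define VIRTIO_GPU_BLOB_MEM_GUEST        0x0001  /* guest RAM (scatter-gather) */
#define VIRTIO_GPU_BLOB_MEM_HOST3D       0x0002  /* allocated by the host GPU driver */
#define VIRTIO_GPU_BLOB_MEM_HOST3D_GUEST 0x0003

/* blob_flags: how the blob may be used. */
#define VIRTIO_GPU_BLOB_FLAG_USE_MAPPABLE     0x0001
#define VIRTIO_GPU_BLOB_FLAG_USE_SHAREABLE    0x0002
#define VIRTIO_GPU_BLOB_FLAG_USE_CROSS_DEVICE 0x0004

/* VIRTIO_GPU_CMD_RESOURCE_CREATE_BLOB */
struct virtio_gpu_resource_create_blob {
        struct virtio_gpu_ctrl_hdr hdr;
        uint32_t resource_id;
        uint32_t blob_mem;
        uint32_t blob_flags;
        uint32_t nr_entries;  /* scatter-gather entries follow for _MEM_GUEST */
        uint64_t blob_id;     /* names the host-side allocation for _MEM_HOST3D */
        uint64_t size;
};

/*
 * VIRTIO_GPU_CMD_RESOURCE_MAP_BLOB: expose a host-backed blob at 'offset'
 * within the device's host-visible shared memory region
 * (VIRTIO_GPU_SHM_ID_HOST_VISIBLE).  This is the step where the backend
 * has to map host memory (system RAM or BAR pages, chosen and possibly
 * migrated by the host kernel's GPU driver) into the guest's physical
 * address space.
 */
struct virtio_gpu_resource_map_blob {
        struct virtio_gpu_ctrl_hdr hdr;
        uint32_t resource_id;
        uint32_t padding;
        uint64_t offset;
};

/* Reply to a map request: VIRTIO_GPU_RESP_OK_MAP_INFO. */
#define VIRTIO_GPU_MAP_CACHE_NONE     0x00
#define VIRTIO_GPU_MAP_CACHE_CACHED   0x01
#define VIRTIO_GPU_MAP_CACHE_UNCACHED 0x02
#define VIRTIO_GPU_MAP_CACHE_WC       0x03
struct virtio_gpu_resp_map_info {
        struct virtio_gpu_ctrl_hdr hdr;
        uint32_t map_info;    /* one of the cache types above */
        uint32_t padding;
};

The map step is the problematic one on Xen: the guest only supplies an
offset into the shared memory region, while everything about the backing
pages (RAM vs. BAR, and when they move) is decided by the host kernel's
GPU driver.  That is why host memory needs to be mapped into the guest,
rather than the usual guest-to-host direction.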
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab