On Mon, Jun 17, 2024 at 09:46:29AM +0200, Roger Pau Monné wrote:
> On Sun, Jun 16, 2024 at 08:38:19PM -0400, Demi Marie Obenour wrote:
> > On Fri, Jun 14, 2024 at 10:39:37AM +0200, Roger Pau Monné wrote:
> > > On Fri, Jun 14, 2024 at 10:12:40AM +0200, Jan Beulich wrote:
> > > > On 14.06.2024 09:21, Roger Pau Monné wrote:
> > > > > On Fri, Jun 14, 2024 at 08:38:51AM +0200, Jan Beulich wrote:
> > > > >> On 13.06.2024 20:43, Demi Marie Obenour wrote:
> > > > >>> GPU acceleration requires that pageable host memory be able to
> > > > >>> be mapped into a guest.
> > > > >>
> > > > >> I'm sure it was explained in the session, which sadly I couldn't
> > > > >> attend.  I've been asking Ray and Xenia the same before, but I'm
> > > > >> afraid it still hasn't become clear to me why this is a
> > > > >> _requirement_.  After all, that's against what we're doing
> > > > >> elsewhere (i.e. so far it has always been guest memory that's
> > > > >> mapped in the host).  I can appreciate that it might be more
> > > > >> difficult to implement, but avoiding a violation of this
> > > > >> fundamental (kind of) rule might be worth the price (and would
> > > > >> avoid other complexities, of which more may be lurking than the
> > > > >> ones you enumerate below).
> > > > >
> > > > > My limited understanding (please someone correct me if wrong) is
> > > > > that the GPU buffer (or context, I think it's also called) is
> > > > > always allocated from dom0 (the owner of the GPU).  The underlying
> > > > > memory addresses of such a buffer need to be mapped into the
> > > > > guest.  The buffer backing memory might be GPU MMIO from the
> > > > > device BAR(s) or system RAM, and such a buffer can be paged by the
> > > > > dom0 kernel at any time (iow: changing the backing memory from
> > > > > MMIO to RAM or vice versa).  Also, the buffer must be contiguous
> > > > > in physical address space.
> > > >
> > > > This last one in particular would of course be a severe restriction.
> > > > Yet: There's an IOMMU involved, isn't there?
> > >
> > > Yup, IIRC that's why Ray said it was much easier for them to support
> > > VirtIO GPUs from a PVH dom0 rather than a classic PV one.
> > >
> > > It might be easier to implement from a classic PV dom0 if there's
> > > pv-iommu support, so that dom0 can create its own contiguous memory
> > > buffers from the device PoV.
> >
> > What makes PVH an improvement here?  I thought PV dom0 uses an
> > identity mapping for the IOMMU, while a PVH dom0 uses an IOMMU that
> > mirrors the dom0 second-stage page tables.
>
> Indeed, hence finding a physically contiguous buffer on classic PV is
> way more complicated, because the IOMMU identity-maps mfns, and the PV
> address space can be completely scattered.
>
> OTOH, on PVH the IOMMU page tables are the same as the second-stage
> translation, and hence the physical address space is way more compact
> (as it would be on native).
Ah, _that_ is what I missed.  I didn't realize that the physical address
space of PV guests was so scattered.

> > In both cases, the device physical addresses are identical to dom0’s
> > physical addresses.
>
> Yes, but a PV dom0 physical address space can be very scattered.
>
> IIRC there's a hypercall to request physically contiguous memory for
> PV, but you don't want to be using that every time you allocate a
> buffer (not sure it would support the sizes needed by the GPU anyway).

That makes sense, thanks!

> > PV is terrible for many reasons, so I’m okay with focusing on PVH
> > dom0, but I’d like to know why there is a difference.
> >
> > > > > I'm not sure it's possible to ensure that when using system RAM
> > > > > such memory comes from the guest rather than the host, as it
> > > > > would likely require some very intrusive hooks into the kernel
> > > > > logic, and negotiation with the guest to allocate the requested
> > > > > amount of memory and hand it over to dom0.  If the maximum size
> > > > > of the buffer is known in advance, maybe dom0 can negotiate with
> > > > > the guest to allocate such a region and grant dom0 access to it
> > > > > at driver attachment time.
> > > >
> > > > Besides the thought of transiently converting RAM to kind-of-MMIO,
> > > > this
> > >
> > > As a note here, changing the type to MMIO would likely involve
> > > modifying the EPT/NPT tables to propagate the new type.  On a PVH
> > > dom0 this would likely involve shattering superpages in order to set
> > > the correct memory types.
> > >
> > > Depending on how often and how randomly those system RAM changes are
> > > needed, this could also create contention on the p2m lock.
> > >
> > > > makes me think of another possible option: Could Dom0 transfer
> > > > ownership of the RAM that wants mapping in the guest (remotely
> > > > resembling grant-transfer)?  It would require the guest to have
> > > > ballooned down enough first, of course.  (In both cases it would
> > > > certainly need working out how the conversion / transfer back
> > > > could be made to work safely and reasonably cleanly.)
> > >
> > > Maybe.  The fact that the guest needs to balloon down that amount of
> > > memory seems weird to me, as from the guest PoV that mapped memory
> > > is MMIO-like and not system RAM.
> >
> > I don’t like it either.  Furthermore, this would require changes to
> > the virtio-GPU driver in the guest, which I’d prefer to avoid.
>
> IMO it would be helpful if you (or someone) could write the full
> specification of how VirtIO GPU is supposed to work right now (with
> the KVM model, I assume?), as it would be a good starting point for
> suggestions about how to make it work (or adapt it) on Xen.
>
> I don't think the high-level layers on top of VirtIO GPU are relevant,
> but it's important to understand the protocol between the VirtIO GPU
> front and back ends.

virtio-GPU is part of the OASIS VirtIO standard [1].

[1]: https://docs.oasis-open.org/virtio/virtio/v1.3/virtio-v1.3.html

> So far I have only had scattered conversations about what's needed,
> but not a formal write-up of how this is supposed to work.

My understanding is that mapping GPU buffers into guests ("blob
resources" in virtio-GPU terms) is the only part of virtio-GPU that
didn't just work.  Furthermore, any solution that uses Linux's
kernel-mode GPU driver on the host will have the same requirements.  I
don't consider writing a bespoke GPU driver that uses caller-allocated
buffers to be a reasonable solution that can support many GPU models.
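For concreteness, here is a rough sketch of the control-queue structures
involved in the blob-resource flow, paraphrased from the GPU device
section of the spec [1] and from Linux's include/uapi/linux/virtio_gpu.h.
I have substituted plain stdint types for the spec's little-endian field
types, so treat the spec as authoritative for the exact wire layout:

#include <stdint.h>

/* Common header carried by every virtio-GPU control-queue request. */
struct virtio_gpu_ctrl_hdr {
        uint32_t type;       /* e.g. VIRTIO_GPU_CMD_RESOURCE_CREATE_BLOB */
        uint32_t flags;
        uint64_t fence_id;
        uint32_t ctx_id;
        uint8_t  ring_idx;   /* only meaningful with VIRTIO_GPU_F_CONTEXT_INIT */
        uint8_t  padding[3];
};

/* blob_mem: where the backing memory of the blob lives. */
#define VIRTIO_GPU_BLOB_MEM_GUEST        0x0001  /* guest RAM (scatter-gather) */
#define VIRTIO_GPU_BLOB_MEM_HOST3D       0x0002  /* allocated by the host GPU driver */
#define VIRTIO_GPU_BLOB_MEM_HOST3D_GUEST 0x0003

/* blob_flags: how the blob may be used. */
#define VIRTIO_GPU_BLOB_FLAG_USE_MAPPABLE     0x0001
#define VIRTIO_GPU_BLOB_FLAG_USE_SHAREABLE    0x0002
#define VIRTIO_GPU_BLOB_FLAG_USE_CROSS_DEVICE 0x0004

/* VIRTIO_GPU_CMD_RESOURCE_CREATE_BLOB */
struct virtio_gpu_resource_create_blob {
        struct virtio_gpu_ctrl_hdr hdr;
        uint32_t resource_id;
        uint32_t blob_mem;
        uint32_t blob_flags;
        uint32_t nr_entries;  /* scatter-gather entries follow for _MEM_GUEST */
        uint64_t blob_id;     /* names the host-side allocation for _MEM_HOST3D */
        uint64_t size;
};

/*
 * VIRTIO_GPU_CMD_RESOURCE_MAP_BLOB: expose a host-backed blob at 'offset'
 * within the device's host-visible shared memory region
 * (VIRTIO_GPU_SHM_ID_HOST_VISIBLE).  This is the step where the backend
 * has to map host memory (system RAM or BAR pages, chosen and possibly
 * migrated by the host kernel's GPU driver) into the guest's physical
 * address space.
 */
struct virtio_gpu_resource_map_blob {
        struct virtio_gpu_ctrl_hdr hdr;
        uint32_t resource_id;
        uint32_t padding;
        uint64_t offset;
};

/* Reply to a map request: VIRTIO_GPU_RESP_OK_MAP_INFO. */
#define VIRTIO_GPU_MAP_CACHE_NONE     0x00
#define VIRTIO_GPU_MAP_CACHE_CACHED   0x01
#define VIRTIO_GPU_MAP_CACHE_UNCACHED 0x02
#define VIRTIO_GPU_MAP_CACHE_WC       0x03
struct virtio_gpu_resp_map_info {
        struct virtio_gpu_ctrl_hdr hdr;
        uint32_t map_info;    /* one of the cache types above */
        uint32_t padding;
};

The map step is the problematic one on Xen: the guest only supplies an
offset into the shared memory region, while everything about the backing
pages (RAM vs. BAR, and when they move) is decided by the host kernel's
GPU driver.  That is why host memory needs to be mapped into the guest,
rather than the usual guest-to-host direction.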
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab