On Fri, 6 Sept 2024 at 03:06, Albert Esteve <aest...@redhat.com> wrote:
> On Thu, Sep 5, 2024 at 6:39 PM Stefan Hajnoczi <stefa...@redhat.com> wrote:
>>
>> On Tue, Sep 03, 2024 at 10:42:34AM +0200, Albert Esteve wrote:
>> > Hello all,
>> >
>> > Sorry, I have been a bit disconnected from this thread as I was on
>> > vacation and then had to switch tasks for a while.
>> >
>> > I will try to go through all the comments and address them for the
>> > first non-RFC drop of this patch series.
>> >
>> > But I was discussing this with some colleagues. It turns out
>> > rust-vmm's vhost-user-gpu will potentially use this soon, and a
>> > rust-vmm/vhost patch has already been posted:
>> > https://github.com/rust-vmm/vhost/pull/251.
>> > So I think it may make sense to:
>> > 1. Split out the vhost-user documentation patch once settled. Since
>> > it is taken as the official spec, having it upstreamed independently
>> > of the implementation will help other projects work on and integrate
>> > their own code.
>> > 2. Split the READ_/WRITE_MEM messages from the SHMEM_MAP/_UNMAP
>> > patches. If I remember correctly, they address a virtiofs-specific
>> > issue that will not impact virtio-gpu, virtio-media, or any other
>> > device.
>>
>> This is an architectural issue that arises from exposing VIRTIO Shared
>> Memory Regions in vhost-user. It was first seen with Linux virtiofs
>> but it could happen with other devices and/or guest operating systems.
>>
>> Any VIRTIO Shared Memory Region that can be mmapped into Linux
>> userspace may trigger this issue. Userspace may write(2) to an
>> O_DIRECT file with the mmap as the source. The vhost-user-blk device
>> will not be able to access the source device's VIRTIO Shared Memory
>> Region and will fail.
>>
>> > So it may make sense to separate them so that one does not stall the
>> > other. I will try to have both integrated in the mid term.
>>
>> If READ_/WRITE_MEM is a pain to implement (I think it is in the
>> vhost-user back-end, even though I've been a proponent of it), then
>> another way to deal with this issue is to specify that upon receiving
>> MAP/UNMAP messages, the vhost-user front-end must update the
>> vhost-user memory tables of all other vhost-user devices. That way
>> vhost-user devices will be able to access VIRTIO Shared Memory Regions
>> mapped by other devices.
>>
>> Implementing this in QEMU should be much easier than implementing
>> READ_/WRITE_MEM support in device back-ends.
>>
>> This will be slow and scale poorly, but performance is only a problem
>> for devices that MAP/UNMAP frequently, like virtiofs. Will virtio-gpu
>> and virtio-media use MAP/UNMAP often at runtime? They might be able to
>> get away with this simple solution.
>>
>> I'd be happy with that. If someone wants to make virtiofs DAX faster,
>> they can implement READ_/WRITE_MEM or another solution later, but
>> let's at least make things correct from the start.
>
> I agree. I want it to be correct first. If you agree on splitting the
> spec bits from this patch, I'm already happy. I suggested splitting the
> READ_/WRITE_MEM messages because I thought it was a virtiofs-specific
> issue.
>
> The alternative that you proposed is interesting. I'll take it into
> account. But I feel I would rather go for the better solution and, if I
> get too entangled, then switch to the easier implementation.
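To make the O_DIRECT scenario described above concrete, here is a
minimal guest-side sketch of the failure mode. It assumes a DAX-enabled
virtiofs mount and a filesystem backed by vhost-user-blk; all paths and
sizes are hypothetical, and the point is only that the write(2) source
buffer lives in another device's VIRTIO Shared Memory Region:

/* Minimal sketch of the failure mode; paths and sizes hypothetical. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    /* The source buffer is an mmap of a virtiofs file with DAX: these
     * pages live in the virtiofs device's VIRTIO Shared Memory Region,
     * not in ordinary guest RAM. */
    int src = open("/mnt/virtiofs/data.bin", O_RDONLY);
    void *buf = mmap(NULL, 4096, PROT_READ, MAP_SHARED, src, 0);
    if (buf == MAP_FAILED)
        exit(1);

    /* O_DIRECT makes the guest kernel place the buffer's guest-physical
     * address directly into the vhost-user-blk vring. That back-end has
     * no mapping for addresses inside another device's region, so the
     * request cannot be served. */
    int dst = open("/mnt/blk/out.bin", O_WRONLY | O_CREAT | O_DIRECT, 0600);
    write(dst, buf, 4096);

    munmap(buf, 4096);
    close(src);
    close(dst);
    return 0;
}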
Great. The difficult part of implementing the READ_/WRITE_MEM messages
is modifying libvhost-user and rust-vmm's vhost crate to send the new
messages when address translation fails. This needs to cover all memory
accesses (including vring struct accesses). That code may be a few
levels down in the call stack and assume it can always load/store
directly from mmapped memory.

> I think we could do this in 2 patches:
> 1. Split out the documentation bits for SHMEM_MAP/_UNMAP. The
> implementation of these messages will go into the second patch.
> 2. The implementation patch: keep going for the time being with
> READ_/WRITE_MEM support, with the documentation for those messages
> kept within this patch. This way, if we switch to the front-end
> updating the vhost-user memory tables, we are not locked into any
> specific solution even if patch 1 has already been merged.

I'm happy as long as the vhost-user spec patch that introduces
MAP/UNMAP also covers a solution for the memory access problem (either
READ_/WRITE_MEM or propagating mappings to all vhost-user back-ends).

Stefan

> BR,
> Albert.
>
>>
>> Stefan
>>
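As a rough illustration of the fallback discussed above, every guest
memory access in a libvhost-user-style back-end would need to funnel
through a helper along these lines. This is a sketch only:
vu_gpa_to_va() is libvhost-user's existing translation helper, while
mem_read_msg() is a hypothetical stand-in for sending the proposed
MEM_READ message to the front-end:

#include <stdint.h>
#include <string.h>
#include "libvhost-user.h"

/* Hypothetical: issues the proposed MEM_READ message and waits for the
 * front-end to copy `len` bytes at guest-physical address `gpa`. */
static int mem_read_msg(VuDev *dev, uint64_t gpa, void *out, size_t len);

static int vu_read_guest(VuDev *dev, uint64_t gpa, void *out, size_t len)
{
    uint64_t mapped = len;
    void *va = vu_gpa_to_va(dev, &mapped, gpa);

    if (va && mapped >= len) {
        /* Fast path: the address is covered by the memory table. */
        memcpy(out, va, len);
        return 0;
    }

    /* Slow path: the address may lie in another device's VIRTIO Shared
     * Memory Region, so ask the front-end to copy it for us. For
     * correctness this has to be used for descriptor payloads *and*
     * vring struct accesses, which is what makes the change invasive. */
    return mem_read_msg(dev, gpa, out, len);
}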
>> >
>> > WDYT?
>> >
>> > BR,
>> > Albert.
>> >
>> > On Tue, Jul 16, 2024 at 3:21 AM David Stevens <steve...@chromium.org>
>> > wrote:
>> >
>> > > On Fri, Jul 12, 2024 at 2:47 PM Michael S. Tsirkin <m...@redhat.com>
>> > > wrote:
>> > > >
>> > > > On Fri, Jul 12, 2024 at 11:06:49AM +0900, David Stevens wrote:
>> > > > > On Thu, Jul 11, 2024 at 7:56 PM Alyssa Ross <h...@alyssa.is> wrote:
>> > > > > >
>> > > > > > Adding David Stevens, who implemented SHMEM_MAP and
>> > > > > > SHMEM_UNMAP in crosvm a couple of years ago.
>> > > > > >
>> > > > > > David, I'd be particularly interested in your thoughts on the
>> > > > > > MEM_READ and MEM_WRITE commands, since as far as I know crosvm
>> > > > > > doesn't implement anything like that. The discussion leading
>> > > > > > to those being added starts here:
>> > > > > >
>> > > > > > https://lore.kernel.org/qemu-devel/20240604185416.gb90...@fedora.redhat.com/
>> > > > > >
>> > > > > > It would be great if this could be standardised between QEMU
>> > > > > > and crosvm (and therefore have a clearer path toward being
>> > > > > > implemented in other VMMs)!
>> > > > >
>> > > > > Setting aside vhost-user for a moment, the DAX example given by
>> > > > > Stefan won't work in crosvm today.
>> > > > >
>> > > > > Is universal access to virtio shared memory regions actually
>> > > > > mandated by the virtio spec? Copying from virtiofs DAX to
>> > > > > virtiofs sharing seems reasonable enough, but what about
>> > > > > virtio-pmem to virtio-blk? What about screenshotting a
>> > > > > framebuffer in virtio-gpu shared memory to virtio-scsi? I guess
>> > > > > with some plumbing in the VMM, it's solvable in a virtualized
>> > > > > environment. But what about when real hardware that speaks
>> > > > > virtio is involved? That's outside my wheelhouse, but it doesn't
>> > > > > seem like it would be easy to solve.
>> > > >
>> > > > Yes, it can work for physical devices if allowed by host
>> > > > configuration. E.g. VFIO supports that, I think. I don't think
>> > > > VDPA does.
>> > >
>> > > I'm sure it can work, but that sounds more like a SHOULD (MAY?)
>> > > rather than a MUST.
>> > >
>> > > > > For what it's worth, my interpretation of the target scenario:
>> > > > >
>> > > > > > Other backends don't see these mappings. If the guest submits
>> > > > > > a vring descriptor referencing a mapping to another backend,
>> > > > > > then that backend won't be able to access this memory
>> > > > >
>> > > > > is that it's omitting how the implementation is reconciled with
>> > > > > section 2.10.1 of v1.3 of the virtio spec, which states that:
>> > > > >
>> > > > > > References into shared memory regions are represented as
>> > > > > > offsets from the beginning of the region instead of absolute
>> > > > > > memory addresses. Offsets are used both for references between
>> > > > > > structures stored within shared memory and for requests placed
>> > > > > > in virtqueues that refer to shared memory.
>> > > > >
>> > > > > My interpretation of that statement is that putting raw guest
>> > > > > physical addresses corresponding to virtio shared memory regions
>> > > > > into a vring is a driver spec violation.
>> > > > >
>> > > > > -David
>> > > >
>> > > > This really applies within a device, I think. Should be
>> > > > clarified ...
>> > >
>> > > You mean that a virtio device can use absolute memory addresses for
>> > > other devices' shared memory regions, but it can't use absolute
>> > > memory addresses for its own shared memory regions? That's a rather
>> > > strange requirement. Or is the statement simply giving an addressing
>> > > strategy that device type specifications are free to ignore?
>> > >
>> > > -David
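For reference, the offset-based addressing that section 2.10.1
describes amounts to a request layout along these lines. The struct is
purely illustrative and is not taken from any device specification:

/* Illustrative only: a driver references shared memory per section
 * 2.10.1 by region index plus offset, never by placing a raw
 * guest-physical address into a vring descriptor. */
#include <stdint.h>

struct virtio_shm_ref {
    uint8_t  shmid;      /* which VIRTIO Shared Memory Region */
    uint8_t  padding[7];
    uint64_t offset;     /* offset from the start of that region */
    uint64_t len;        /* length of the referenced range */
};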