Hi!

I'm building a bridge to expose vhost-user devices through VDUSE. The
code is still immature, but I'm able to forward packets using
dpdk-l2fwd through VDUSE to a VM. I'm now working on exposing virtiofsd,
but I've hit an error I'd like to discuss.

VDUSE devices can get all the memory regions the driver is using through
the VDUSE_IOTLB_GET_FD ioctl. It returns a file descriptor for a memory
region that can be mapped with mmap, and fills in an entry describing
the mapping:
* Start and last addresses from the driver's POV
* Offset of that range within the mmap'ed region
* Device permissions over that region.

[start=0xc3000][last=0xe7fff][offset=0xc3000][perm=1]

Now when I try to map it, the userspace device cannot call mmap with
any offset other than 0. So the "straightforward" mmap with
size = entry.last - entry.start + 1 (last is inclusive) and
offset = entry.offset does not work. I don't know if this is a
limitation of Linux or of VDUSE.
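
To narrow down whether the failure comes from mmap itself or from the
VDUSE fd, a small probe along these lines might help. It is only a
sketch: probe_mmap_at_offset and probe_on_tmpfile are names made up for
illustration, and PROT_READ is an assumption based on perm=1 in the
entry above. The temporary file only serves as a baseline to compare
against the fd returned by VDUSE_IOTLB_GET_FD.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Try the direct mapping (non-zero file offset).
 * Returns 0 on success, or the errno reported by mmap. */
int probe_mmap_at_offset(int fd, uint64_t size, uint64_t offset)
{
    assert(size > 0);
    void *addr = mmap(NULL, size, PROT_READ, MAP_SHARED, fd, (off_t)offset);
    if (addr == MAP_FAILED)
        return errno;
    munmap(addr, size);
    return 0;
}

/* Baseline: run the same probe against an ordinary temporary file. */
int probe_on_tmpfile(uint64_t size, uint64_t offset)
{
    FILE *f = tmpfile();
    if (f == NULL)
        return errno;
    int fd = fileno(f);
    int err = ftruncate(fd, (off_t)(offset + size)) == 0
                  ? probe_mmap_at_offset(fd, size, offset)
                  : errno;
    fclose(f);
    return err;
}
```

If the probe succeeds on the temporary file with a page-aligned offset
but returns an error on the VDUSE fd with the same arguments, that
would point at the VDUSE fd rather than at mmap in general. The errno
value it returns may also hint at the cause.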

Checking QEMU's
subprojects/libvduse/libvduse.c:vduse_iova_add_region(), I see it
handles the offset by adding it to the size, instead of passing it
directly as the offset parameter of mmap:

void *mmap_addr = mmap(0, size + offset, prot, MAP_SHARED, fd, 0);

I can certainly replicate that in the bridge.
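
A minimal sketch of what that could look like on the bridge side,
assuming the usable data starts at mmap_addr + offset. struct
bridge_mapping and the bridge_* / demo_* names are hypothetical, and
the demo runs on an ordinary temporary file, not on a VDUSE fd:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical bookkeeping for one mapped region. */
struct bridge_mapping {
    void  *mmap_addr;  /* what mmap returned; needed for munmap */
    size_t mmap_size;  /* size + offset; needed for munmap */
    void  *data;       /* first usable byte: mmap_addr + offset */
};

/* Map [offset, offset + size) of fd while always passing file offset 0. */
int bridge_map_from_zero(int fd, uint64_t size, uint64_t offset, int prot,
                         struct bridge_mapping *out)
{
    assert(out != NULL);
    void *addr = mmap(NULL, size + offset, prot, MAP_SHARED, fd, 0);
    if (addr == MAP_FAILED)
        return -1;
    out->mmap_addr = addr;
    out->mmap_size = size + offset;
    out->data = (uint8_t *)addr + offset;
    return 0;
}

void bridge_unmap(struct bridge_mapping *m)
{
    munmap(m->mmap_addr, m->mmap_size);
}

/* Usage sketch: returns 0 if a byte written at file offset `offset`
 * is visible as the first byte of the mapped data. */
int demo_map_from_zero(void)
{
    const uint64_t page = (uint64_t)sysconf(_SC_PAGESIZE);
    const uint64_t offset = 3 * page, size = 2 * page;
    const uint8_t marker = 0x5a;
    struct bridge_mapping m;
    int ret = -1;

    FILE *f = tmpfile();
    if (f == NULL)
        return -1;
    int fd = fileno(f);
    if (ftruncate(fd, (off_t)(offset + size)) == 0 &&
        pwrite(fd, &marker, 1, (off_t)offset) == 1 &&
        bridge_map_from_zero(fd, size, offset, PROT_READ, &m) == 0) {
        ret = (*(uint8_t *)m.data == marker) ? 0 : -1;
        bridge_unmap(&m);
    }
    fclose(f);
    return ret;
}
```

The cost of this approach is that the first `offset` bytes of address
space are mapped but never used, and both the original address and the
enlarged size have to be kept around for munmap.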

Now I send the VhostUserMemoryRegion to the vhost-user application.
The struct has these members:
struct VhostUserMemoryRegion {
    uint64_t guest_phys_addr;
    uint64_t memory_size;
    uint64_t userspace_addr;
    uint64_t mmap_offset;
};
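
As a sketch of how the bridge might fill that struct from a VDUSE
entry: struct iotlb_entry below just mirrors the fields of struct
vduse_iotlb_entry so the example builds without kernel headers, and
region_from_entry is a hypothetical helper. Using entry.start as
guest_phys_addr, and leaving userspace_addr to the caller, are my
assumptions about the bridge design, not something the vhost-user spec
dictates for this case.

```c
#include <assert.h>
#include <stdint.h>

/* Mirror of the fields in struct vduse_iotlb_entry (<linux/vduse.h>). */
struct iotlb_entry {
    uint64_t offset;  /* offset of the range within the fd */
    uint64_t start;   /* first address, driver POV */
    uint64_t last;    /* last address, inclusive */
    uint8_t  perm;    /* device permissions over the range */
};

struct VhostUserMemoryRegion {
    uint64_t guest_phys_addr;
    uint64_t memory_size;
    uint64_t userspace_addr;
    uint64_t mmap_offset;
};

/* Translate one VDUSE entry into a vhost-user memory region. */
struct VhostUserMemoryRegion
region_from_entry(const struct iotlb_entry *e, uint64_t userspace_addr)
{
    assert(e->last >= e->start);
    struct VhostUserMemoryRegion r = {
        .guest_phys_addr = e->start,
        .memory_size     = e->last - e->start + 1, /* last is inclusive */
        .userspace_addr  = userspace_addr,         /* assumption: caller's choice */
        .mmap_offset     = e->offset,
    };
    return r;
}
```

For the entry shown earlier (start=0xc3000, last=0xe7fff,
offset=0xc3000) this gives memory_size = 0x25000 and
mmap_offset = 0xc3000.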

So I can send the offset to the vhost-user device. dpdk-l2fwd uses the
same trick of adding the offset to the size of the mapped region [1],
at lib/vhost/vhost_user.c:vhost_user_mmap_region():

mmap_size = region->size + mmap_offset;
mmap_addr = mmap(NULL, mmap_size, PROT_READ | PROT_WRITE,
            MAP_SHARED | populate, region->fd, 0);

So mmap is called with offset == 0 and everybody is happy.

Now I'm moving to virtiofsd, and the vm-memory crate in particular. It
performs the mmap without the size += offset trick, at
MmapRegionBuilder<B>::build() [2].

I can try to apply the offset + size trick in my bridge, but I don't
think it is the right solution. At first glance, the right solution is
to mmap with the offset, as the vm-memory crate does. But seeing both
libvduse and DPDK apply the same trick suggests to me that it is a
known limitation / workaround I don't know about. What is the history
of this? Can the VDUSE problem (if any) be solved? Am I missing
something?

Thanks!

[1] https://github.com/DPDK/dpdk/blob/e2e546ab5bf5e024986ccb5310ab43982f3bb40c/lib/vhost/vhost_user.c#L1305
[2] https://github.com/rust-vmm/vm-memory/blob/main/src/mmap_unix.rs#L128

