On Tue, Jan 11, 2022 at 01:39:39PM +0100, David Hildenbrand wrote:
> For fd-based shared memory, MAP_NORESERVE is only effective for hugetlb,
> otherwise it's ignored. Older Linux versions that didn't support
> reservation of huge pages ignored MAP_NORESERVE completely.
>
> The first client to mmap a hugetlb fd without MAP_NORESERVE will
> trigger reservation of huge pages for the whole mmapped range. There are
> two cases to consider:
>
> 1) QEMU mapped RAM without MAP_NORESERVE
>
> We're not dealing with a sparse mapping, huge pages for the whole range
> have already been reserved by QEMU. An additional mmap() without
> MAP_NORESERVE won't have any effect on the reservation.
>
> 2) QEMU mapped RAM with MAP_NORESERVE
>
> We're dealing with a sparse mapping, no huge pages should be reserved.
> Further mappings without MAP_NORESERVE should be avoided.
>
> For 1), it doesn't matter if we set MAP_NORESERVE or not, so we can
> simply set it. For 2), we'd be overriding QEMU's decision and trigger
> reservation of huge pages, which might just fail if there are not
> sufficient huge pages around. We must map with MAP_NORESERVE.
>
> This change is required to support virtio-mem with hugetlb: a
> virtio-mem device mapped into the guest physical memory corresponds to
> a sparse memory mapping and QEMU maps this memory with MAP_NORESERVE.
> Whenever memory in that sparse region will be accessed by the VM, QEMU
> populates huge pages for the affected range by preallocating memory
> and handling any preallocation errors gracefully.
>
> So let's map shared RAM with MAP_NORESERVE. As libvhost-user only
> supports Linux, there shouldn't be anything to take care of in regard of
> other OS support.
>
> Without this change, libvhost-user will fail mapping the region if there
> are currently not enough huge pages to perform the reservation:
>   fv_panic: libvhost-user: region mmap error: Cannot allocate memory
>
> Cc: "Marc-André Lureau" <marcandre.lur...@redhat.com>
> Cc: "Michael S. Tsirkin" <m...@redhat.com>
> Cc: Paolo Bonzini <pbonz...@redhat.com>
> Cc: Raphael Norwitz <raphael.norw...@nutanix.com>
> Cc: Stefan Hajnoczi <stefa...@redhat.com>
> Cc: Dr. David Alan Gilbert <dgilb...@redhat.com>
> Signed-off-by: David Hildenbrand <da...@redhat.com>
> ---
>  subprojects/libvhost-user/libvhost-user.c | 10 +++++-----
>  1 file changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/subprojects/libvhost-user/libvhost-user.c b/subprojects/libvhost-user/libvhost-user.c
> index 787f4d2d4f..3b538930be 100644
> --- a/subprojects/libvhost-user/libvhost-user.c
> +++ b/subprojects/libvhost-user/libvhost-user.c
> @@ -728,12 +728,12 @@ vu_add_mem_reg(VuDev *dev, VhostUserMsg *vmsg) {
>           * accessing it before we userfault.
>           */
>          mmap_addr = mmap(0, dev_region->size + dev_region->mmap_offset,
> -                         PROT_NONE, MAP_SHARED,
> +                         PROT_NONE, MAP_SHARED | MAP_NORESERVE,
>                           vmsg->fds[0], 0);
>      } else {
>          mmap_addr = mmap(0, dev_region->size + dev_region->mmap_offset,
> -                         PROT_READ | PROT_WRITE, MAP_SHARED, vmsg->fds[0],
> -                         0);
> +                         PROT_READ | PROT_WRITE, MAP_SHARED | MAP_NORESERVE,
> +                         vmsg->fds[0], 0);
>      }
>
>      if (mmap_addr == MAP_FAILED) {
> @@ -878,7 +878,7 @@ vu_set_mem_table_exec_postcopy(VuDev *dev, VhostUserMsg *vmsg)
>               * accessing it before we userfault
>               */
>              mmap_addr = mmap(0, dev_region->size + dev_region->mmap_offset,
> -                             PROT_NONE, MAP_SHARED,
> +                             PROT_NONE, MAP_SHARED | MAP_NORESERVE,
>                               vmsg->fds[i], 0);
>
>              if (mmap_addr == MAP_FAILED) {
> @@ -965,7 +965,7 @@ vu_set_mem_table_exec(VuDev *dev, VhostUserMsg *vmsg)
>           * mapped address has to be page aligned, and we use huge
>           * pages. */
>          mmap_addr = mmap(0, dev_region->size + dev_region->mmap_offset,
> -                         PROT_READ | PROT_WRITE, MAP_SHARED,
> +                         PROT_READ | PROT_WRITE, MAP_SHARED | MAP_NORESERVE,
>                           vmsg->fds[i], 0);
>
>          if (mmap_addr == MAP_FAILED) {

Acked-by: Raphael Norwitz <raphael.norw...@nutanix.com>