On Tue, Jan 11, 2022 at 01:39:39PM +0100, David Hildenbrand wrote:
> For fd-based shared memory, MAP_NORESERVE is only effective for hugetlb,
> otherwise it's ignored. Older Linux versions that didn't support
> reservation of huge pages ignored MAP_NORESERVE completely.
> 
> The first client to mmap a hugetlb fd without MAP_NORESERVE will
> trigger reservation of huge pages for the whole mmapped range. There are
> two cases to consider:
> 
> 1) QEMU mapped RAM without MAP_NORESERVE
> 
> We're not dealing with a sparse mapping, huge pages for the whole range
> have already been reserved by QEMU. An additional mmap() without
> MAP_NORESERVE won't have any effect on the reservation.
> 
> 2) QEMU mapped RAM with MAP_NORESERVE
> 
> We're dealing with a sparse mapping, no huge pages should be reserved.
> Further mappings without MAP_NORESERVE should be avoided.
> 
> For 1), it doesn't matter if we set MAP_NORESERVE or not, so we can
> simply set it. For 2), we'd be overriding QEMU's decision and triggering
> reservation of huge pages, which might just fail if there are not
> sufficient huge pages around. We must map with MAP_NORESERVE.
> 
> This change is required to support virtio-mem with hugetlb: a
> virtio-mem device mapped into the guest physical memory corresponds to
> a sparse memory mapping and QEMU maps this memory with MAP_NORESERVE.
> Whenever memory in that sparse region is accessed by the VM, QEMU
> populates huge pages for the affected range by preallocating memory
> and handling any preallocation errors gracefully.
> 
> So let's map shared RAM with MAP_NORESERVE. As libvhost-user only
> supports Linux, there shouldn't be anything to take care of with regard to
> other OS support.
> 
> Without this change, libvhost-user will fail mapping the region if there
> are currently not enough huge pages to perform the reservation:
>  fv_panic: libvhost-user: region mmap error: Cannot allocate memory
> 
> Cc: "Marc-André Lureau" <marcandre.lur...@redhat.com>
> Cc: "Michael S. Tsirkin" <m...@redhat.com>
> Cc: Paolo Bonzini <pbonz...@redhat.com>
> Cc: Raphael Norwitz <raphael.norw...@nutanix.com>
> Cc: Stefan Hajnoczi <stefa...@redhat.com>
> Cc: Dr. David Alan Gilbert <dgilb...@redhat.com>
> Signed-off-by: David Hildenbrand <da...@redhat.com>
> ---
>  subprojects/libvhost-user/libvhost-user.c | 10 +++++-----
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/subprojects/libvhost-user/libvhost-user.c b/subprojects/libvhost-user/libvhost-user.c
> index 787f4d2d4f..3b538930be 100644
> --- a/subprojects/libvhost-user/libvhost-user.c
> +++ b/subprojects/libvhost-user/libvhost-user.c
> @@ -728,12 +728,12 @@ vu_add_mem_reg(VuDev *dev, VhostUserMsg *vmsg) {
>           * accessing it before we userfault.
>           */
>          mmap_addr = mmap(0, dev_region->size + dev_region->mmap_offset,
> -                         PROT_NONE, MAP_SHARED,
> +                         PROT_NONE, MAP_SHARED | MAP_NORESERVE,
>                           vmsg->fds[0], 0);
>      } else {
>          mmap_addr = mmap(0, dev_region->size + dev_region->mmap_offset,
> -                         PROT_READ | PROT_WRITE, MAP_SHARED, vmsg->fds[0],
> -                         0);
> +                         PROT_READ | PROT_WRITE, MAP_SHARED | MAP_NORESERVE,
> +                         vmsg->fds[0], 0);
>      }
>  
>      if (mmap_addr == MAP_FAILED) {
> @@ -878,7 +878,7 @@ vu_set_mem_table_exec_postcopy(VuDev *dev, VhostUserMsg *vmsg)
>           * accessing it before we userfault
>           */
>          mmap_addr = mmap(0, dev_region->size + dev_region->mmap_offset,
> -                         PROT_NONE, MAP_SHARED,
> +                         PROT_NONE, MAP_SHARED | MAP_NORESERVE,
>                           vmsg->fds[i], 0);
>  
>          if (mmap_addr == MAP_FAILED) {
> @@ -965,7 +965,7 @@ vu_set_mem_table_exec(VuDev *dev, VhostUserMsg *vmsg)
>           * mapped address has to be page aligned, and we use huge
>           * pages.  */
>          mmap_addr = mmap(0, dev_region->size + dev_region->mmap_offset,
> -                         PROT_READ | PROT_WRITE, MAP_SHARED,
> +                         PROT_READ | PROT_WRITE, MAP_SHARED | MAP_NORESERVE,
>                           vmsg->fds[i], 0);
>  
>          if (mmap_addr == MAP_FAILED) {

Acked-by: Raphael Norwitz <raphael.norw...@nutanix.com>
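
For anyone reading along, the pattern the patch applies to each mmap() call
can be sketched roughly as below. This is only an illustration, not code from
libvhost-user: map_shared_noreserve() and demo_map_tmpfile() are hypothetical
names, and the demo uses an ordinary temporary file under /tmp rather than a
hugetlb fd, so MAP_NORESERVE is simply ignored there (as the commit message
notes for non-hugetlb fds). With a hugetlb fd, the same flag is what avoids
the up-front huge page reservation.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of libvhost-user): map `size` bytes of a
 * shared fd from offset 0 without requesting a reservation up front.
 * Returns MAP_FAILED on error, like mmap() itself.
 */
static void *map_shared_noreserve(int fd, size_t size, int prot)
{
    assert(size > 0);
    return mmap(NULL, size, prot, MAP_SHARED | MAP_NORESERVE, fd, 0);
}

/*
 * Demo on a plain temporary file (assumes /tmp is writable). For such an fd
 * the flag has no effect; the point is only that adding it is harmless.
 * Returns 0 on success, -1 on any failure.
 */
static int demo_map_tmpfile(size_t size)
{
    char path[] = "/tmp/noreserve-demo-XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0) {
        return -1;
    }
    unlink(path); /* the file lives on until the fd and mapping are gone */

    if (ftruncate(fd, (off_t)size) < 0) {
        close(fd);
        return -1;
    }

    unsigned char *p = map_shared_noreserve(fd, size, PROT_READ | PROT_WRITE);
    close(fd); /* the mapping stays valid after closing the fd */
    if (p == MAP_FAILED) {
        return -1;
    }

    /* Touch the first byte to check that the mapping is usable. */
    p[0] = 0x5a;
    int ok = (p[0] == 0x5a);

    munmap(p, size);
    return ok ? 0 : -1;
}
```

Calling demo_map_tmpfile(4096) should return 0 on a typical Linux system; the
hugetlb failure mode quoted above (ENOMEM from mmap) would only show up with a
hugetlb-backed fd and too few free huge pages.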
