On 10.04.23 17:46, Steve Sistare wrote:
Guest RAM created with memory-backend-memfd is aligned to a
QEMU_VMALLOC_ALIGN=2M boundary, and memory-backend-memfd does not support
the "align" parameter to change the default. This is sub-optimal on
aarch64 kernels with a 64 KB page size and 512 MB huge page size, as the
range will not be backed by huge pages. Moreover, any shared allocation
using qemu_ram_mmap() will be sub-optimal on such a system if the align
parameter is less than 512 MB.
The kernel is smart enough to return a hugely aligned pointer for MAP_SHARED
mappings when /sys/kernel/mm/transparent_hugepage/shmem_enabled allows it.
However, the qemu function qemu_ram_mmap() mmap's twice to perform its own
alignment:
guardptr = mmap(0, total, PROT_NONE, flags, ...
flags |= shared ? MAP_SHARED : MAP_PRIVATE;
ptr = mmap(guardptr + offset, size, prot, flags | map_sync_flags, ...
On the first call, flags has MAP_PRIVATE, hence the kernel does not apply
its shared memory policy, and returns a non-huge-aligned guardptr.
To fix, for shared mappings, pass MAP_SHARED to both mmap calls.
Signed-off-by: Steve Sistare <steven.sist...@oracle.com>
Reviewed-by: Joao Martins <joao.m.mart...@oracle.com>
---
util/mmap-alloc.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
index 5ed7d29..37a0d1e 100644
--- a/util/mmap-alloc.c
+++ b/util/mmap-alloc.c
@@ -121,7 +121,7 @@ static bool map_noreserve_effective(int fd, uint32_t
qemu_map_flags)
* Reserve a new memory region of the requested size to be used for mapping
* from the given fd (if any).
*/
-static void *mmap_reserve(size_t size, int fd)
+static void *mmap_reserve(size_t size, int fd, int final_flags)
{
int flags = MAP_PRIVATE;
@@ -144,6 +144,7 @@ static void *mmap_reserve(size_t size, int fd)
#else
fd = -1;
flags |= MAP_ANONYMOUS;
+ flags |= final_flags & MAP_SHARED;
Setting both, MAP_PRIVATE and MAP_SHARED sure sounds very wrong.
The traditional way to handle that is via QEMU_VMALLOC_ALIGN.
I think you'd actually want to turn QEMU_VMALLOC_ALIGN into a function
that is able to special-case like
hw/virtio/virtio-mem.c:virtio_mem_default_thp_size() does for
"qemu_real_host_page_size() == 64 * KiB".
If you set QEMU_VMALLOC_ALIGN properly, you'll also have any DIMMs get
that alignment in guest physical address space.
--
Thanks,
David / dhildenb