memset_thread_failed = false;
threads_created_flag = false;
memset_num_threads = get_memset_num_threads(smp_cpus);
@@ -534,7 +558,7 @@ static bool touch_all_pages(char *area, size_t hpagesize,
size_t numpages,
memset_thread[i].numpages = numpages_per_thread + (i < leftover);
memset_thread[i].hpagesize = hpagesize;
qemu_thread_create(&memset_thread[i].pgthread, "touch_pages",
- do_touch_pages, &memset_thread[i],
+ touch_fn, &memset_thread[i],
QEMU_THREAD_JOINABLE);
addr += memset_thread[i].numpages * hpagesize;
}
Do you have an indication of what the speed differential is for the
old read/write dance vs. the kernel madvise? We needed to use threads
previously because the read/write dance is pretty terribly slow.
The kernel patch has some performance numbers:
https://lkml.kernel.org/r/20210712083917.16361-1-da...@redhat.com
For example (compressed),
**************************************************
4096 MiB MAP_PRIVATE:
**************************************************
Anon 4 KiB : Read/Write : 1054.041 ms
Anon 4 KiB : POPULATE_WRITE : 572.582 ms
Memfd 4 KiB : Read/Write : 1106.561 ms
Memfd 4 KiB : POPULATE_WRITE : 805.881 ms
Memfd 2 MiB : Read/Write : 357.606 ms
Memfd 2 MiB : POPULATE_WRITE : 356.937 ms
tmpfs : Read/Write : 1105.954 ms
tmpfs : POPULATE_WRITE : 822.826 ms
file : Read/Write : 1107.439 ms
file : POPULATE_WRITE : 857.622 ms
hugetlbfs : Read/Write : 356.127 ms
hugetlbfs : POPULATE_WRITE : 355.138 ms
4096 MiB MAP_SHARED:
**************************************************
Anon 4 KiB : Read/Write : 1060.350 ms
Anon 4 KiB : POPULATE_WRITE : 782.885 ms
Anon 2 MiB : Read/Write : 357.992 ms
Anon 2 MiB : POPULATE_WRITE : 357.808 ms
Memfd 4 KiB : Read/Write : 1100.391 ms
Memfd 4 KiB : POPULATE_WRITE : 804.394 ms
Memfd 2 MiB : Read/Write : 358.250 ms
Memfd 2 MiB : POPULATE_WRITE : 357.334 ms
tmpfs : Read/Write : 1107.567 ms
tmpfs : POPULATE_WRITE : 810.094 ms
file : Read/Write : 1289.509 ms
file : POPULATE_WRITE : 1106.816 ms
hugetlbfs : Read/Write : 357.120 ms
hugetlbfs : POPULATE_WRITE : 356.693 ms
For huge pages, it barely makes a difference with smallish VMs. In the
other cases, it speeds things up, but not by enough to justify
dropping multi-threading.
The original MADV_POPULATE from 2016
https://lore.kernel.org/patchwork/patch/389581/ mentions that it
especially helps speed up multi-threaded pre-faulting, due to reduced
mmap_lock contention. I did not do any multi-threading benchmarks, though.
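For reference, a rough sketch of the two approaches being compared. This is
not the code from the patch: touch_pages_rw() and prefault_range() are
made-up names for illustration, and the fallback value for
MADV_POPULATE_WRITE assumes the Linux 5.14+ uapi definition in case the
installed headers do not provide it yet.

```c
#define _GNU_SOURCE
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>

#ifndef MADV_POPULATE_WRITE
#define MADV_POPULATE_WRITE 23 /* assumed from the Linux 5.14+ uapi headers */
#endif

/*
 * Hypothetical helper: the "read/write dance". Each page is read and the
 * same byte is written back, so the page is faulted in writable without
 * changing its contents.
 */
static void touch_pages_rw(char *area, size_t pagesize, size_t numpages)
{
    for (size_t i = 0; i < numpages; i++) {
        volatile char *p = area + i * pagesize;
        *p = *p;
    }
}

/*
 * Hypothetical helper: try a single madvise() call for the whole range and
 * fall back to touching every page if the kernel or the mapping rejects it.
 * Returns true if madvise() populated the range, false if the fallback ran.
 */
static bool prefault_range(char *area, size_t pagesize, size_t numpages)
{
    if (madvise(area, pagesize * numpages, MADV_POPULATE_WRITE) == 0) {
        return true;
    }
    touch_pages_rw(area, pagesize, numpages);
    return false;
}
```

Either path leaves existing page contents untouched. With threads, each
worker would presumably call something like prefault_range() on its own
slice of the area.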
[...]
Initialized with random garbage from the stack
+
+ /*
+ * Sense on every invocation, as MADV_POPULATE_WRITE cannot be used for
+ * some special mappings, such as mapping /dev/mem.
+ */
+ if (madv_populate_write_possible(area, hpagesize)) {
+ use_madv_populate_write = true;
+ }
but this implicitly assumes it was initialized to false.
Indeed, thanks for catching that!
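One possible shape for the fix, as a minimal sketch only. Here
probe_populate_write() is a hypothetical stand-in for
madv_populate_write_possible(), and choose_populate_write() is an invented
wrapper; the point is just that the flag gets a defined value before the
conditional assignment.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for madv_populate_write_possible(); the real
 * function probes the mapping, this stub only checks its arguments.
 */
static bool probe_populate_write(const char *area, size_t hpagesize)
{
    return area != NULL && hpagesize != 0;
}

/* Invented wrapper showing the explicit initialization. */
static bool choose_populate_write(const char *area, size_t hpagesize)
{
    /* Start from a known value instead of stack garbage. */
    bool use_madv_populate_write = false;

    /* Sense on every invocation, as in the hunk above. */
    if (probe_populate_write(area, hpagesize)) {
        use_madv_populate_write = true;
    }
    return use_madv_populate_write;
}
```

Assigning the probe result directly to the flag would work just as well and
avoids the branch.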
--
Thanks,
David / dhildenb