On Mon, Feb 27, 2017 at 08:46:10AM -0500, Rik van Riel wrote: > On Mon, 2017-02-27 at 11:10 +0000, Stefan Hajnoczi wrote: > > On Thu, Feb 23, 2017 at 10:59:22AM +0000, Daniel P. Berrange wrote: > > > When using a memory-backend object with prealloc turned on, QEMU > > > will memset() the first byte in every memory page to zero. While > > > this might have been acceptable for memory backends associated > > > with RAM, this corrupts application data for NVDIMMs. > > > > > > Instead of setting every page to zero, read the current byte > > > value and then just write that same value back, so we are not > > > corrupting the original data. > > > > > > Signed-off-by: Daniel P. Berrange <berra...@redhat.com> > > > --- > > > > > > I'm unclear if this is actually still safe in practice ? Is the > > > compiler permitted to optimize away the read+write since it doesn't > > > change the memory value. I'd hope not, but I've been surprised > > > before... > > > > > > IMHO this is another factor in favour of requesting an API from > > > the kernel to provide the prealloc behaviour we want. > > > > > > util/oslib-posix.c | 3 ++- > > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > > > diff --git a/util/oslib-posix.c b/util/oslib-posix.c > > > index 35012b9..8f5b656 100644 > > > --- a/util/oslib-posix.c > > > +++ b/util/oslib-posix.c > > > @@ -355,7 +355,8 @@ void os_mem_prealloc(int fd, char *area, size_t > > > memory, Error **errp) > > > > > > /* MAP_POPULATE silently ignores failures */ > > > for (i = 0; i < numpages; i++) { > > > - memset(area + (hpagesize * i), 0, 1); > > > + char val = *(area + (hpagesize * i)); > > > + memset(area + (hpagesize * i), 0, val); > > > > Please include a comment in the final patch explaining why we want to > > preserve memory contents. > > > > In the case of NVDIMM I'm not sure if the memset is needed at > > all. The > > memory already exists - no new pages need to be allocated by the > > kernel. > > We just want the page table entries to be populated for the NVDIMM > > when > > -mem-prealloc is used. > > > > Perhaps Andrea or Rik have ideas on improving the kernel interface > > and > > whether mmap(MAP_POPULATE) should be used with NVDIMM instead of this > > userspace "touch every page" workaround? > > Why do we need the page table entries to be populated > in advance at all?
It is a choice apps using QEMU have - they can tell QEMU whether they want prealloc or not - if they decide they do want it, then we should not be corrupting the data. > The high cost of the page fault for regular memory > is zeroing out the memory pages before we give them > to userspace. NVDIMM in the guest might be backed by regular memory in the host - QEMU doesn't require use of NVDIMM in the host. > Simply faulting in the NVDIMM memory as it is touched > may make more sense than treating it like DRAM, > especially given that with DAX, NVDIMM areas may be > orders of magnitude larger than RAM, and we really > do not want to set up all the page tables for every > part of the guest DAX "disk". Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://entangle-photo.org -o- http://search.cpan.org/~danberr/ :|