On Wed, Feb 7, 2018 at 5:24 AM, Dr. David Alan Gilbert <dgilb...@redhat.com> wrote:
> * Haozhong Zhang (haozhong.zh...@intel.com) wrote:
>> On 02/07/18 13:03 +0000, Dr. David Alan Gilbert wrote:
>> > * Haozhong Zhang (haozhong.zh...@intel.com) wrote:
>> > > On 02/07/18 11:54 +0000, Dr. David Alan Gilbert wrote:
>> > > > * Haozhong Zhang (haozhong.zh...@intel.com) wrote:
>> > > > > When loading a compressed page to persistent memory, flush CPU cache
>> > > > > after the data is decompressed. Combined with a call to pmem_drain()
>> > > > > at the end of memory loading, we can guarantee those compressed pages
>> > > > > are persistently loaded to PMEM.
>> > > >
>> > > > Can you explain why this can use the flush and doesn't need the special
>> > > > memset?
>> > >
>> > > The best approach to ensure the write persistence is to operate pmem
>> > > all via libpmem, e.g., pmem_memcpy_nodrain() + pmem_drain(). However,
>> > > the write to pmem in this case is performed by uncompress() which is
>> > > implemented out of QEMU and libpmem. It may or may not use libpmem,
>> > > which is not controlled by QEMU. Therefore, we have to use the less
>> > > optimal approach, that is to flush cache for all pmem addresses that
>> > > uncompress() may have written, i.e.,/e.g., memcpy() and/or memset() in
>> > > uncompress(), and pmem_flush() + pmem_drain() in QEMU.
>> >
>> > In what way is it less optimal?
>> > If that's a legal thing to do, then why not just do a pmem_flush +
>> > pmem_drain right at the end of the ram loading and leave all the rest of
>> > the code untouched?
>>
>> For example, the implementation pmem_memcpy_nodrain() prefers to use
>> movnt instructions w/o flush to write pmem if those instructions are
>> available, and falls back to memcpy() + flush if movnt are not
>> available, so I suppose the latter is less optimal.
>
> But if you use normal memcpy calls to copy a few GB of RAM in an
> incoming migrate and then do a single flush at the end, isn't that
> better?

Not really, because now you've needlessly polluted the cache and are spending CPU cycles looping over cache lines that could have been bypassed with movnt.
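
To make the difference concrete, here is a rough sketch of the two write paths being compared. It is not libpmem's implementation and not the QEMU patch: the function names (copy_nontemporal, copy_then_flush, drain) are hypothetical, it assumes x86-64 with SSE2 and a 64-byte cache line, and it uses plain clflush where a real pmem library would likely pick a cheaper flush instruction when one is available. On other architectures it degrades to a plain memcpy so that it still compiles.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__x86_64__) || defined(_M_X64)
#include <emmintrin.h>  /* SSE2: _mm_stream_si128, _mm_clflush, _mm_sfence */
#define HAVE_SSE2_SKETCH 1
#endif

#define CACHELINE 64  /* assumed cache line size */

/* Path A: non-temporal (movnt) stores. The data bypasses the cache,
 * so no per-line flush is needed afterwards, only a final fence.
 * Assumes dst is 16-byte aligned and len is a multiple of 16. */
static void copy_nontemporal(void *dst, const void *src, size_t len)
{
    assert(((uintptr_t)dst % 16) == 0 && (len % 16) == 0);
#ifdef HAVE_SSE2_SKETCH
    __m128i *d = (__m128i *)dst;
    const __m128i *s = (const __m128i *)src;
    for (size_t i = 0; i < len / 16; i++) {
        _mm_stream_si128(&d[i], _mm_loadu_si128(&s[i]));
    }
#else
    memcpy(dst, src, len);  /* fallback so the sketch builds elsewhere */
#endif
}

/* Path B: ordinary memcpy (which fills the cache), followed by a walk
 * over every cache line in the range to flush it. Returns the number
 * of lines visited, i.e. the extra work that path A avoids. */
static size_t copy_then_flush(void *dst, const void *src, size_t len)
{
    memcpy(dst, src, len);

    size_t lines = 0;
    uintptr_t start = (uintptr_t)dst & ~(uintptr_t)(CACHELINE - 1);
    uintptr_t end = (uintptr_t)dst + len;
    for (uintptr_t p = start; p < end; p += CACHELINE) {
#ifdef HAVE_SSE2_SKETCH
        _mm_clflush((const void *)p);
#endif
        lines++;
    }
    return lines;
}

/* Both paths finish with a store fence, roughly the role of pmem_drain(). */
static void drain(void)
{
#ifdef HAVE_SSE2_SKETCH
    _mm_sfence();
#endif
}
```

With path B the flush loop runs once per cache line, so deferring it to the end of a multi-GB load does not make it go away; it only moves the loop, and by then the copy has already evicted whatever else was in the cache.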