> -----Original Message-----
> From: Peter Xu <pet...@redhat.com>
> Sent: Wednesday, March 20, 2024 11:35 PM
> To: Liu, Yuan1 <yuan1....@intel.com>
> Cc: Daniel P. Berrangé <berra...@redhat.com>; faro...@suse.de;
> qemu-de...@nongnu.org; hao.xi...@bytedance.com;
> bryan.zh...@bytedance.com; Zou, Nanhai <nanhai....@intel.com>
> Subject: Re: [PATCH v5 5/7] migration/multifd: implement initialization
> of qpl compression
>
> On Wed, Mar 20, 2024 at 03:02:59PM +0000, Liu, Yuan1 wrote:
> > > > +static int alloc_zbuf(QplData *qpl, uint8_t chan_id, Error **errp)
> > > > +{
> > > > +    int flags = MAP_PRIVATE | MAP_POPULATE | MAP_ANONYMOUS;
> > > > +    uint32_t size = qpl->job_num * qpl->data_size;
> > > > +    uint8_t *buf;
> > > > +
> > > > +    buf = (uint8_t *) mmap(NULL, size, PROT_READ | PROT_WRITE, flags, -1, 0);
> > > > +    if (buf == MAP_FAILED) {
> > > > +        error_setg(errp, "multifd: %u: alloc_zbuf failed, job num %u, size %u",
> > > > +                   chan_id, qpl->job_num, qpl->data_size);
> > > > +        return -1;
> > > > +    }
> > >
> > > What's the reason for using mmap here, rather than a normal
> > > malloc ?
> >
> > I want to populate the memory accessed by the IAA device in the
> > initialization phase, and then avoid initiating I/O page faults through
> > the IAA device during migration, a large number of I/O page faults are
> > not good for performance.
>
> mmap() doesn't populate pages, unless with MAP_POPULATE. And even with
> that it shouldn't be guaranteed, as the populate phase should ignore all
> errors.
>
>     MAP_POPULATE (since Linux 2.5.46)
>         Populate (prefault) page tables for a mapping. For a file
>         mapping, this causes read-ahead on the file. This will help to
>         reduce blocking on page faults later. The mmap() call doesn't
>         fail if the mapping cannot be populated (for example, due to
>         limitations on the number of mapped huge pages when using
>         MAP_HUGETLB). Support for MAP_POPULATE in conjunction with
>         private mappings was added in Linux 2.6.23.
>
> OTOH, I think g_malloc0() should guarantee to prefault everything in as
> long as the call returned (even though they can be swapped out later, but
> that applies to all cases anyway).

Thanks, Peter. I will try the g_malloc0 method here.

> > This problem also occurs at the destination, therefore, I recommend that
> > customers need to add -mem-prealloc for destination boot parameters.
>
> I'm not sure what issue you hit when testing it, but -mem-prealloc flag
> should only control the guest memory backends not the buffers that QEMU
> internally use, afaiu.
>
> Thanks,
>
> --
> Peter Xu

Let me explain. During IAA decompression, the IAA hardware can write the
decompressed data directly to the virtual address of the guest memory, which
avoids having the CPU copy the decompressed data into guest memory.

Without -mem-prealloc, the guest memory is not populated up front, so the IAA
hardware has to trigger an I/O page fault before it can write the decompressed
data to a guest memory region. In addition, when the IAA device uses SVM, CPU
page faults also trigger IOTLB flushes. A large number of I/O page faults and
IOTLB flushes cannot be resolved quickly, so the decompression throughput of
the IAA device drops significantly.
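
For the internal zbuf buffer discussed above, one option that does not depend on MAP_POPULATE succeeding is to write to every page right after the allocation. The following is only a minimal sketch, not code from the patch: alloc_prefaulted() and free_prefaulted() are hypothetical names, and it assumes Linux and an anonymous private mapping like the one in alloc_zbuf().

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the patch): map an anonymous buffer and
 * write one byte to each page so that every page is backed by memory
 * before a device starts using the buffer.
 * Returns NULL on failure.
 */
static uint8_t *alloc_prefaulted(size_t size)
{
    int flags = MAP_PRIVATE | MAP_ANONYMOUS | MAP_POPULATE;
    long page = sysconf(_SC_PAGESIZE);
    void *map;

    if (size == 0 || page <= 0) {
        return NULL;
    }

    map = mmap(NULL, size, PROT_READ | PROT_WRITE, flags, -1, 0);
    if (map == MAP_FAILED) {
        return NULL;
    }

    /*
     * MAP_POPULATE is best effort: mmap() does not fail if population
     * fails. Writing to each page forces the CPU page fault to happen
     * here instead of later. The volatile access keeps the compiler
     * from dropping the stores.
     */
    volatile uint8_t *touch = map;
    for (size_t off = 0; off < size; off += (size_t)page) {
        touch[off] = 0;
    }

    return map;
}

/* Hypothetical counterpart that releases a buffer from alloc_prefaulted(). */
static void free_prefaulted(uint8_t *buf, size_t size)
{
    assert(buf != NULL);
    munmap(buf, size);
}
```

As with g_malloc0(), the pages can still be swapped out later, since nothing here pins them. This also only covers QEMU's own buffers. Guest memory on the destination is a separate matter, which is where -mem-prealloc comes in.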