> -----Original Message-----
> From: Daniel P. Berrangé <berra...@redhat.com>
> Sent: Wednesday, March 20, 2024 11:21 PM
> To: Liu, Yuan1 <yuan1....@intel.com>
> Cc: pet...@redhat.com; faro...@suse.de; qemu-devel@nongnu.org;
> hao.xi...@bytedance.com; bryan.zh...@bytedance.com; Zou, Nanhai
> <nanhai....@intel.com>
> Subject: Re: [PATCH v5 5/7] migration/multifd: implement initialization of
> qpl compression
>
> On Wed, Mar 20, 2024 at 03:02:59PM +0000, Liu, Yuan1 wrote:
> > > -----Original Message-----
> > > From: Daniel P. Berrangé <berra...@redhat.com>
> > > Sent: Wednesday, March 20, 2024 6:42 PM
> > > To: Liu, Yuan1 <yuan1....@intel.com>
> > > Cc: pet...@redhat.com; faro...@suse.de; qemu-devel@nongnu.org;
> > > hao.xi...@bytedance.com; bryan.zh...@bytedance.com; Zou, Nanhai
> > > <nanhai....@intel.com>
> > > Subject: Re: [PATCH v5 5/7] migration/multifd: implement initialization
> > > of qpl compression
> > >
> > > On Wed, Mar 20, 2024 at 12:45:25AM +0800, Yuan Liu wrote:
> > > > the qpl initialization includes memory allocation for compressed
> > > > data and the qpl job initialization.
> > > >
> > > > the qpl initialization will check whether the In-Memory Analytics
> > > > Accelerator(IAA) hardware is available, if the platform does not
> > > > have IAA hardware or the IAA hardware is not available, the QPL
> > > > compression initialization will fail.
> > > >
> > > > Signed-off-by: Yuan Liu <yuan1....@intel.com>
> > > > Reviewed-by: Nanhai Zou <nanhai....@intel.com>
> > > > ---
> > > >  migration/multifd-qpl.c | 243 +++++++++++++++++++++++++++++++++++++++-
> > > >  1 file changed, 242 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/migration/multifd-qpl.c b/migration/multifd-qpl.c
> > > > index 056a68a060..6de65e9da7 100644
> > > > --- a/migration/multifd-qpl.c
> > > > +++ b/migration/multifd-qpl.c
> > > > @@ -9,12 +9,253 @@
> > > >   * This work is licensed under the terms of the GNU GPL, version 2 or later.
> > > >   * See the COPYING file in the top-level directory.
> > > >   */
> > > > +
> > > >  #include "qemu/osdep.h"
> > > >  #include "qemu/module.h"
> > > > +#include "qapi/error.h"
> > > > +#include "migration.h"
> > > > +#include "multifd.h"
> > > > +#include "qpl/qpl.h"
> > > > +
> > > > +typedef struct {
> > > > +    qpl_job **job_array;
> > > > +    /* the number of allocated jobs */
> > > > +    uint32_t job_num;
> > > > +    /* the size of data processed by a qpl job */
> > > > +    uint32_t data_size;
> > > > +    /* compressed data buffer */
> > > > +    uint8_t *zbuf;
> > > > +    /* the length of compressed data */
> > > > +    uint32_t *zbuf_hdr;
> > > > +} QplData;
> > > > +
> > > > +static void free_zbuf(QplData *qpl)
> > > > +{
> > > > +    if (qpl->zbuf != NULL) {
> > > > +        munmap(qpl->zbuf, qpl->job_num * qpl->data_size);
> > > > +        qpl->zbuf = NULL;
> > > > +    }
> > > > +    if (qpl->zbuf_hdr != NULL) {
> > > > +        g_free(qpl->zbuf_hdr);
> > > > +        qpl->zbuf_hdr = NULL;
> > > > +    }
> > > > +}
> > > > +
> > > > +static int alloc_zbuf(QplData *qpl, uint8_t chan_id, Error **errp)
> > > > +{
> > > > +    int flags = MAP_PRIVATE | MAP_POPULATE | MAP_ANONYMOUS;
> > > > +    uint32_t size = qpl->job_num * qpl->data_size;
> > > > +    uint8_t *buf;
> > > > +
> > > > +    buf = (uint8_t *) mmap(NULL, size, PROT_READ | PROT_WRITE, flags, -1, 0);
> > > > +    if (buf == MAP_FAILED) {
> > > > +        error_setg(errp, "multifd: %u: alloc_zbuf failed, job num %u, size %u",
> > > > +                   chan_id, qpl->job_num, qpl->data_size);
> > > > +        return -1;
> > > > +    }
> > >
> > > What's the reason for using mmap here, rather than a normal
> > > malloc ?
> >
> > I want to populate the memory accessed by the IAA device in the
> > initialization phase, and then avoid initiating I/O page faults through
> > the IAA device during migration; a large number of I/O page faults are
> > not good for performance.
>
> Does this mmap actually make a measurable difference ?
>
> If I've followed the code paths correctly, I think this
> alloc_zbuf method only gets called during initial setup
> of each migration thread.
>
> So this use of MAP_POPULATE seems to only make a difference
> between faulting in before starting sending data, and faulting
> in on first bit of data that's sent. I'm surprised if that's
> noticeable as a difference.
You are right, the performance impact is only on the first page-fault
handling and has little impact on overall live-migration performance.

I just did a simple test: the total live-migration time is 2321ms with
g_malloc and 2098ms with mmap. I need more tests to check whether
g_malloc/g_malloc0 is feasible.

> > This problem also occurs at the destination; therefore, I recommend
> > that customers add -mem-prealloc to the destination boot parameters.
>
> I can understand mem-prealloc making a difference as that guarantees
> all of guest RAM is faulted in.
>
> With regards,
> Daniel
> --
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-   https://www.instagram.com/dberrange :|
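
For the record, below is a rough, untested sketch of the two pre-faulting
approaches being compared above: the mmap(MAP_POPULATE) path the patch uses
today, and a g_malloc0 variant that touches one byte per page up front. The
helper names, the main() harness and the 32MB size are made up for
illustration only and are not part of the patch.

/*
 * Untested sketch: pre-fault the compressed-data buffer during setup so
 * the IAA engine does not take I/O page faults on first access.
 */
#include <glib.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* what the patch does today: anonymous mapping pre-faulted by the kernel */
static uint8_t *zbuf_alloc_mmap(size_t size)
{
    int flags = MAP_PRIVATE | MAP_ANONYMOUS | MAP_POPULATE;
    void *buf = mmap(NULL, size, PROT_READ | PROT_WRITE, flags, -1, 0);

    return buf == MAP_FAILED ? NULL : buf;
}

/* possible g_malloc0 equivalent: fault every page in with an explicit write */
static uint8_t *zbuf_alloc_gmalloc(size_t size)
{
    uint8_t *buf = g_malloc0(size);
    size_t pagesize = (size_t)sysconf(_SC_PAGESIZE);

    /*
     * For a large allocation the zeroed pages from g_malloc0 are typically
     * still demand-allocated by the kernel; the volatile store forces the
     * write fault now, during setup, instead of during migration.
     */
    for (size_t off = 0; off < size; off += pagesize) {
        *(volatile uint8_t *)(buf + off) = 0;
    }
    return buf;
}

int main(void)
{
    size_t size = 32 * 1024 * 1024;   /* hypothetical zbuf size */
    uint8_t *a = zbuf_alloc_mmap(size);
    uint8_t *b = zbuf_alloc_gmalloc(size);

    if (a != NULL) {
        munmap(a, size);
    }
    g_free(b);
    return 0;
}

Whichever allocator is used, the idea is the same: take the page faults once
during channel setup rather than letting the IAA device hit I/O page faults
in the middle of migration. On the destination side the analogous knob for
guest RAM is -mem-prealloc, as mentioned above.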