Hao Xiang <hao.xi...@bytedance.com> writes:

> On Fri, Jan 5, 2024 at 3:52 PM Hao Xiang <hao.xi...@bytedance.com> wrote:
>>
>> On Fri, Jan 5, 2024 at 12:07 PM Fabiano Rosas <faro...@suse.de> wrote:
>> >
>> > Bryan Zhang <bryan.zh...@bytedance.com> writes:
>> >
>> > +cc Yuan Liu, Daniel Berrangé
>> >
>> > > Adds support for 'qatzip' as an option for the multifd compression
>> > > method parameter, but copy-pastes the no-op logic to leave the actual
>> > > methods effectively unimplemented. This is in preparation of a
>> > > subsequent commit that will implement actually using QAT for compression
>> > > and decompression.
>> > >
>> > > Signed-off-by: Bryan Zhang <bryan.zh...@bytedance.com>
>> > > Signed-off-by: Hao Xiang <hao.xi...@bytedance.com>
>> > > ---
>> > >  hw/core/qdev-properties-system.c |  6 ++-
>> > >  migration/meson.build            |  1 +
>> > >  migration/multifd-qatzip.c       | 81 ++++++++++++++++++++++++++++++++
>> > >  migration/multifd.h              |  1 +
>> > >  qapi/migration.json              |  5 +-
>> > >  5 files changed, 92 insertions(+), 2 deletions(-)
>> > >  create mode 100644 migration/multifd-qatzip.c
>> > >
>> > > diff --git a/hw/core/qdev-properties-system.c b/hw/core/qdev-properties-system.c
>> > > index 1a396521d5..d8e48dcb0e 100644
>> > > --- a/hw/core/qdev-properties-system.c
>> > > +++ b/hw/core/qdev-properties-system.c
>> > > @@ -658,7 +658,11 @@ const PropertyInfo qdev_prop_fdc_drive_type = {
>> > >  const PropertyInfo qdev_prop_multifd_compression = {
>> > >      .name = "MultiFDCompression",
>> > >      .description = "multifd_compression values, "
>> > > -                   "none/zlib/zstd",
>> > > +                   "none/zlib/zstd"
>> > > +#ifdef CONFIG_QATZIP
>> > > +                   "/qatzip"
>> > > +#endif
>> > > +                   ,
>> > >      .enum_table = &MultiFDCompression_lookup,
>> > >      .get = qdev_propinfo_get_enum,
>> > >      .set = qdev_propinfo_set_enum,
>> > > diff --git a/migration/meson.build b/migration/meson.build
>> > > index 92b1cc4297..e20f318379 100644
>> > > --- a/migration/meson.build
>> > > +++ b/migration/meson.build
>> > > @@ -40,6 +40,7 @@ if get_option('live_block_migration').allowed()
>> > >    system_ss.add(files('block.c'))
>> > >  endif
>> > >  system_ss.add(when: zstd, if_true: files('multifd-zstd.c'))
>> > > +system_ss.add(when: qatzip, if_true: files('multifd-qatzip.c'))
>> > >
>> > >  specific_ss.add(when: 'CONFIG_SYSTEM_ONLY',
>> > >                  if_true: files('ram.c',
>> > > diff --git a/migration/multifd-qatzip.c b/migration/multifd-qatzip.c
>> > > new file mode 100644
>> > > index 0000000000..1733bbddb7
>> > > --- /dev/null
>> > > +++ b/migration/multifd-qatzip.c
>> > > @@ -0,0 +1,81 @@
>> > > +/*
>> > > + * Multifd QATzip compression implementation
>> > > + *
>> > > + * Copyright (c) Bytedance
>> > > + *
>> > > + * Authors:
>> > > + *  Bryan Zhang <bryan.zh...@bytedance.com>
>> > > + *  Hao Xiang   <hao.xi...@bytedance.com>
>> > > + *
>> > > + * This work is licensed under the terms of the GNU GPL, version 2 or later.
>> > > + * See the COPYING file in the top-level directory.
>> > > + */
>> > > +
>> > > +#include "qemu/osdep.h"
>> > > +#include "exec/ramblock.h"
>> > > +#include "exec/target_page.h"
>> > > +#include "qapi/error.h"
>> > > +#include "migration.h"
>> > > +#include "options.h"
>> > > +#include "multifd.h"
>> > > +
>> > > +static int qatzip_send_setup(MultiFDSendParams *p, Error **errp)
>> > > +{
>> > > +    return 0;
>> > > +}
>> > > +
>> > > +static void qatzip_send_cleanup(MultiFDSendParams *p, Error **errp) {};
>> > > +
>> > > +static int qatzip_send_prepare(MultiFDSendParams *p, Error **errp)
>> > > +{
>> > > +    MultiFDPages_t *pages = p->pages;
>> > > +
>> > > +    for (int i = 0; i < p->normal_num; i++) {
>> > > +        p->iov[p->iovs_num].iov_base = pages->block->host + p->normal[i];
>> > > +        p->iov[p->iovs_num].iov_len = p->page_size;
>> > > +        p->iovs_num++;
>> > > +    }
>> > > +
>> > > +    p->next_packet_size = p->normal_num * p->page_size;
>> > > +    p->flags |= MULTIFD_FLAG_NOCOMP;
>> > > +    return 0;
>> > > +}
>> > > +
>> > > +static int qatzip_recv_setup(MultiFDRecvParams *p, Error **errp)
>> > > +{
>> > > +    return 0;
>> > > +}
>> > > +
>> > > +static void qatzip_recv_cleanup(MultiFDRecvParams *p) {};
>> > > +
>> > > +static int qatzip_recv_pages(MultiFDRecvParams *p, Error **errp)
>> > > +{
>> > > +    uint32_t flags = p->flags & MULTIFD_FLAG_COMPRESSION_MASK;
>> > > +
>> > > +    if (flags != MULTIFD_FLAG_NOCOMP) {
>> > > +        error_setg(errp, "multifd %u: flags received %x flags expected %x",
>> > > +                   p->id, flags, MULTIFD_FLAG_NOCOMP);
>> > > +        return -1;
>> > > +    }
>> > > +    for (int i = 0; i < p->normal_num; i++) {
>> > > +        p->iov[i].iov_base = p->host + p->normal[i];
>> > > +        p->iov[i].iov_len = p->page_size;
>> > > +    }
>> > > +    return qio_channel_readv_all(p->c, p->iov, p->normal_num, errp);
>> > > +}
>> > > +
>> > > +static MultiFDMethods multifd_qatzip_ops = {
>> > > +    .send_setup = qatzip_send_setup,
>> > > +    .send_cleanup = qatzip_send_cleanup,
>> > > +    .send_prepare = qatzip_send_prepare,
>> > > +    .recv_setup = qatzip_recv_setup,
>> > > +    .recv_cleanup = qatzip_recv_cleanup,
>> > > +    .recv_pages = qatzip_recv_pages
>> > > +};
>> > > +
>> > > +static void multifd_qatzip_register(void)
>> > > +{
>> > > +    multifd_register_ops(MULTIFD_COMPRESSION_QATZIP, &multifd_qatzip_ops);
>> > > +}
>> > > +
>> > > +migration_init(multifd_qatzip_register);
>> > > diff --git a/migration/multifd.h b/migration/multifd.h
>> > > index a835643b48..5600f7fc82 100644
>> > > --- a/migration/multifd.h
>> > > +++ b/migration/multifd.h
>> > > @@ -33,6 +33,7 @@ int multifd_queue_page(QEMUFile *f, RAMBlock *block, ram_addr_t offset);
>> > >  #define MULTIFD_FLAG_NOCOMP (0 << 1)
>> > >  #define MULTIFD_FLAG_ZLIB (1 << 1)
>> > >  #define MULTIFD_FLAG_ZSTD (2 << 1)
>> > > +#define MULTIFD_FLAG_QATZIP (3 << 1)
>> > >
>> > >  /* This value needs to be a multiple of qemu_target_page_size() */
>> > >  #define MULTIFD_PACKET_SIZE (512 * 1024)
>> > > diff --git a/qapi/migration.json b/qapi/migration.json
>> > > index 6d5a4b0489..e3cc195aed 100644
>> > > --- a/qapi/migration.json
>> > > +++ b/qapi/migration.json
>> > > @@ -625,11 +625,14 @@
>> > >  #
>> > >  # @zstd: use zstd compression method.
>> > >  #
>> > > +# @qatzip: use qatzip compression method.
>> > > +#
>> > >  # Since: 5.0
>> > >  ##
>> > >  { 'enum': 'MultiFDCompression',
>> > >    'data': [ 'none', 'zlib',
>> > > -            { 'name': 'zstd', 'if': 'CONFIG_ZSTD' } ] }
>> > > +            { 'name': 'zstd', 'if': 'CONFIG_ZSTD' },
>> > > +            { 'name': 'qatzip', 'if': 'CONFIG_QATZIP'} ] }
>> >
>> > In another thread adding support to another Intel accelerator (IAA) we
>> > decided that it was better to select the offloading as an accelerator
>> > method to multifd zlib rather than as an entirely new compression
>> > format. Take a look at that discussion:
>> > https://lore.kernel.org/r/ztfcnqbbqlmsu...@redhat.com
>>
>> We had some early discussions with Intel folks (probably a different
>> team than the one behind the above patchset). The understanding at
>> the time was that QAT is good at both bulk data compression and
>> decompression, while IAA is good at decompressing smaller data sizes
>> but its compression performance is not the best. In multifd, we
>> compress up to 128 4k pages at a time, and this can potentially
>> increase by configuring the packet size, so at the time we thought
>> QAT could be a better fit for the multifd live migration scenario. We
>> would like to hear more from Intel.
>>
>> From our benchmark testing with two QAT devices, we can get deflate
>> compression throughput to around 7GB/s at ~160% CPU. That beats the
>> current software implementations (zlib and zstd) by a lot. We are
>> still tuning the QEMU live migration configuration.
>>
>> >
>> > As I understand it, QAT + QATzip would be compatible with both zlib and
>> > IAA + QPL, so we'd add another accelerator method like this:
>> >
>> > https://lore.kernel.org/r/20240103112851.908082-3-yuan1....@intel.com
>> >
>>
>> I quickly read over the IAA patchset and I saw this:
>>
>> "However, due to some reasons, QPL is currently
>> not compatible with the existing Zlib method that Zlib compressed data
>> can be decompressed by QPl and vice versa."
>>
>> The above probably means that the current zlib software
>> implementation and IAA are not compatible.
>>
>> For QAT, although both Intel's QATzip and zlib internally use
>> deflate, QATzip only supports deflate with a 4-byte header, deflate
>> wrapped by a Gzip header and footer, or deflate wrapped by an Intel®
>> QAT Gzip* extension header and footer. My understanding is that none
>> of these headers can be recognized by the zlib software
>> implementation. So if we want to make them compatible with zlib, the
>> QATzip library needs to support that.
>
> I took a closer look at Intel's documentation here:
> https://github.com/intel/QATzip
> QATzip does have a compression format, QZ_DEFLATE_RAW, which uses the
> deflate format with no header at all. This looks the same as what
> QEMU's current zlib implementation does - using the raw deflate
> format. I can run a quick test to confirm that. Meanwhile, the
> documentation mentions that when using the raw deflate format,
> decompression cannot be offloaded to QAT hardware. So there is a
> trade-off here if we want to avoid creating a new compression format
> in QEMU.
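
For concreteness, this is roughly what I'd expect the QZ_DEFLATE_RAW
setup to look like based on the QATzip README (an untested sketch; the
parameter and enum names should be double-checked against the installed
qatzip.h):

#include <qatzip.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    QzSession_T sess = {0};
    QzSessionParams_T params;
    /* One multifd packet's worth of data: 128 x 4KB pages. */
    static unsigned char src[128 * 4096];
    static unsigned char dst[2 * sizeof(src)];
    unsigned int src_len = sizeof(src);
    unsigned int dst_len = sizeof(dst);

    memset(src, 'a', sizeof(src));

    /* sw_backup = 1: fall back to software if no QAT device is found. */
    if (qzInit(&sess, 1) != QZ_OK) {
        fprintf(stderr, "qzInit failed\n");
        return 1;
    }

    qzGetDefaults(&params);
    params.data_fmt = QZ_DEFLATE_RAW;   /* raw deflate, no header/footer */
    if (qzSetupSession(&sess, &params) != QZ_OK) {
        fprintf(stderr, "qzSetupSession failed\n");
        return 1;
    }

    /* last = 1 marks the end of the stream. */
    if (qzCompress(&sess, src, &src_len, dst, &dst_len, 1) != QZ_OK) {
        fprintf(stderr, "qzCompress failed\n");
        return 1;
    }
    printf("compressed %u -> %u bytes\n", src_len, dst_len);

    qzTeardownSession(&sess);
    qzClose(&sess);
    return 0;
}

With sw_backup enabled the same code path runs without the hardware,
which should make testing the format compatibility easier.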

The default zlib behavior is to include the zlib header and trailer. To
obtain a raw deflate stream we'd need to suppress the zlib wrapper. It
might be an option to change the QEMU code to produce a stream
compatible with QAT, provided we don't currently use any feature that
needs the header.

One immediate issue I see is that without the header, zlib defaults to
the 32KB window size, which stops us from changing the window size to
cope with QPL's 4KB limitation.
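
To illustrate what suppressing the wrapper looks like (a standalone
sketch, not the actual multifd-zlib.c change): zlib emits a raw deflate
stream when deflateInit2() is given negative windowBits, and the window
size then has to be agreed out-of-band because there is no header to
carry it.

#include <zlib.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    z_stream zs = {0};
    unsigned char in[4096], out[8192];

    memset(in, 'a', sizeof(in));

    /*
     * windowBits = -15 requests raw deflate: no zlib header or trailer.
     * With no header, the decompressor cannot learn the window size
     * in-band and has to assume the 32KB maximum -- the problem for
     * QPL's 4KB window mentioned above.
     */
    if (deflateInit2(&zs, Z_DEFAULT_COMPRESSION, Z_DEFLATED,
                     -15, 8, Z_DEFAULT_STRATEGY) != Z_OK) {
        fprintf(stderr, "deflateInit2 failed\n");
        return 1;
    }

    zs.next_in = in;
    zs.avail_in = sizeof(in);
    zs.next_out = out;
    zs.avail_out = sizeof(out);

    if (deflate(&zs, Z_FINISH) != Z_STREAM_END) {
        fprintf(stderr, "deflate failed\n");
        return 1;
    }
    printf("raw deflate stream: %lu bytes\n", zs.total_out);

    deflateEnd(&zs);
    return 0;
}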

>
>>
>> > All that, of course, assuming we even want to support both
>> > accelerators. They're addressing the same problem after all. I wonder
>> > how we'd choose a precedence, since both seem to be present in the same
>> > processor family.
>> >
>> >
>>
>> That's an interesting question :-) I think overall performance
>> (throughput and CPU overhead) should both be considered. IAA and QAT
>> accelerators are not present on all systems. We at Bytedance chose to
>> have both on our platforms when purchasing from Intel.

We might need someone with the proper hardware to run a benchmark using
both accelerators if we end up deciding to support both. Ideally we'd
have a precedence defined so we don't need to force the user to select
one of them when both are present.
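
Just to make the idea concrete, the kind of precedence rule I have in
mind would look something like this (entirely hypothetical; the probe
helpers are stand-ins for whatever device detection the accelerator
libraries provide):

#include <stdbool.h>
#include <stdio.h>

/* Stand-ins for the values in qapi/migration.json's MultiFDCompression. */
typedef enum {
    MULTIFD_COMPRESSION_NONE,
    MULTIFD_COMPRESSION_ZLIB,
    MULTIFD_COMPRESSION_QATZIP,
} MultiFDCompression;

/* Hypothetical probes; real code would query the accelerator libraries. */
static bool qat_device_available(void) { return false; }
static bool iaa_device_available(void) { return false; }

/*
 * One possible precedence: prefer QAT for bulk compression throughput,
 * then IAA-accelerated zlib, then fall back to no compression.
 */
static MultiFDCompression multifd_pick_default(void)
{
    if (qat_device_available()) {
        return MULTIFD_COMPRESSION_QATZIP;
    }
    if (iaa_device_available()) {
        return MULTIFD_COMPRESSION_ZLIB;
    }
    return MULTIFD_COMPRESSION_NONE;
}

int main(void)
{
    printf("default multifd compression: %d\n", multifd_pick_default());
    return 0;
}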
