"Liu, Yuan1" <yuan1....@intel.com> wrote:
>> -----Original Message-----
>> From: Daniel P. Berrangé <berra...@redhat.com>
>> Sent: Thursday, October 19, 2023 11:32 PM
>> To: Peter Xu <pet...@redhat.com>
>> Cc: Juan Quintela <quint...@redhat.com>; Liu, Yuan1
>> <yuan1....@intel.com>; faro...@suse.de; leob...@redhat.com; qemu-
>> de...@nongnu.org; Zou, Nanhai <nanhai....@intel.com>
>> Subject: Re: [PATCH 0/5] Live Migration Acceleration with IAA Compression
>> 
>> On Thu, Oct 19, 2023 at 11:23:31AM -0400, Peter Xu wrote:
>> > On Thu, Oct 19, 2023 at 03:52:14PM +0100, Daniel P. Berrangé wrote:
>> > > On Thu, Oct 19, 2023 at 01:40:23PM +0200, Juan Quintela wrote:
>> > > > Yuan Liu <yuan1....@intel.com> wrote:
>> > > > > Hi,
>> > > > >
>> > > > > I am writing to submit a code change aimed at enhancing live
>> > > > > migration acceleration by leveraging the compression capability
>> > > > > of the Intel In-Memory Analytics Accelerator (IAA).
>> > > > >
>> > > > > Enabling compression functionality during the live migration
>> > > > > process can enhance performance, thereby reducing downtime and
>> > > > > network bandwidth requirements. However, this improvement comes
>> > > > > at the cost of additional CPU resources, posing a challenge for
>> > > > > cloud service providers in terms of resource allocation. To
>> > > > > address this challenge, I have focused on offloading the compression
>> > > > > overhead to the IAA hardware, resulting in performance gains.
>> > > > >
>> > > > > The implementation of the IAA (de)compression code is based on
>> > > > > Intel Query Processing Library (QPL), an open-source software
>> > > > > project designed for IAA high-level software programming.
>> > > > >
>> > > > > Best regards,
>> > > > > Yuan Liu
>> > > >
>> > > > After reviewing the patches:
>> > > >
>> > > > - why are you doing this on top of old compression code, that is
>> > > >   obsolete, deprecated and buggy
> Some users have not enabled the multifd feature yet, but they will
> decide whether to enable the compression feature based on the load
> situation. So I'm wondering if, without multifd, the compression
> functionality will no longer be available?

The next pull request will deprecate it, so it is going to be gone in
two versions.

>> > > > - why are you not doing it on top of multifd.

> I plan to submit the support for multifd independently because the
> multifd compression and legacy compression code are separate.

The old compression code is really buggy.  I think you should not even
try to work on top of it.


> I looked at the code of multifd about compression. Currently, it uses
> the CPU synchronous compression mode. Since it is best to use the
> asynchronous processing method of the hardware accelerator, I would
> like to get suggestions on the asynchronous implementation.

I did that in a previous comment.
Several questions:

- You are using zlib, right?  When I tested, the longer the streams you
  have, the better the compression you get, right?
  Is there a way to "continue" with the state of the previous job?

  The old compression code generates a new context for every packet.
  Multifd generates a new zlib context for each connection.
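
To make the context-per-packet vs context-per-connection difference
concrete, here is a minimal sketch in Python using the standard zlib
module.  It is not QEMU code: the page contents, the packet count and
the function names are made up for the illustration, and Z_SYNC_FLUSH is
just one way to emit a packet while keeping the stream state.

```python
import random
import zlib

PAGE_SIZE = 4096


def make_packets(count=8, seed=0):
    # Hypothetical workload: the same hard-to-compress page sent `count` times.
    rng = random.Random(seed)
    page = bytes(rng.getrandbits(8) for _ in range(PAGE_SIZE))
    return [page] * count


def per_packet_size(packets):
    # New context for every packet: no history is shared between packets.
    return sum(len(zlib.compress(p)) for p in packets)


def per_connection_size(packets):
    # One context for the whole connection: each packet is flushed with
    # Z_SYNC_FLUSH so it can be sent on its own, but the history is kept.
    ctx = zlib.compressobj()
    total = 0
    for p in packets:
        total += len(ctx.compress(p) + ctx.flush(zlib.Z_SYNC_FLUSH))
    total += len(ctx.flush())  # final flush writes the stream trailer
    return total
```

With this workload the shared context can refer back to the earlier
copies of the page, so the per-connection total comes out smaller than
the per-packet total.  How big the gap is on real guest memory is a
separate question.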


> 1. Dirty page scanning and compression pipeline processing, the main
> thread of live migration submits compression tasks to the hardware,
> and multifd threads only handle the transmission of compressed pages.
> 2. Data sending and compression pipeline processing, the Multifd
> threads submit compression tasks to the hardware and then transmit the
> compressed data. (A multifd thread job may need to transmit compressed
> data multiple times.)
>
>> > > > You just need to add another compression method on top of multifd.
>> > > > See how it was done for zstd:
> Yes, I will refer to zstd to implement multifd compression with IAA

Basically you can use two approaches here (simplifying a lot):
- for each channel
     submit job (512KB)
     wait for job
     send compressed stuff
  And you adjust the number of channels depending on how much
  concurrency you want.


- for each channel
     submit job
     while (number_of_jobs_submitted > some_threshold)
        wait_for_job
        send job
  Here you need to piggyback on MULTIFD_FLAG_SYNC to wait for the
  rest of the jobs.

Each one has its advantages/disadvantages.  The 1st is simpler to do,
because it is to all intents and purposes synchronous, and it is simpler
to "contain" the concurrency.

With the second approach you get much more concurrency, but you need to
be careful about how much stuff you have in flight.
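
A minimal sketch of the two approaches in Python, assuming a thread pool
as a stand-in for the hardware queue and zlib.compress as the job.  The
names send_sync, send_pipelined and max_in_flight are made up here, and
the final drain loop only plays the role of the wait at
MULTIFD_FLAG_SYNC; none of this is the multifd API.

```python
import zlib
from collections import deque
from concurrent.futures import ThreadPoolExecutor


def send_sync(packets, submit):
    # Approach 1: submit one job, wait for it, send it.
    sent = []
    for p in packets:
        job = submit(p)
        sent.append(job.result())
    return sent


def send_pipelined(packets, submit, max_in_flight):
    # Approach 2: keep submitting, and only wait for the oldest job once
    # more than `max_in_flight` jobs are outstanding.
    sent = []
    in_flight = deque()
    for p in packets:
        in_flight.append(submit(p))
        while len(in_flight) > max_in_flight:
            sent.append(in_flight.popleft().result())
    # Sync point: wait for the rest of the jobs, in submission order.
    while in_flight:
        sent.append(in_flight.popleft().result())
    return sent


def run_demo():
    packets = [bytes([i]) * 4096 for i in range(16)]
    with ThreadPoolExecutor(max_workers=4) as pool:
        def submit(data):
            return pool.submit(zlib.compress, data)
        a = send_sync(packets, submit)
        b = send_pipelined(packets, submit, max_in_flight=4)
    return packets, a, b
```

Both functions send packets in submission order, so the receiving side
does not change.  The only difference is how many jobs are outstanding
at once, which is the number you would tune for the hardware.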

Remember that you get queues for each multifd channel.
How many asynchronous jobs (around 512KB per packet) can current
hardware handle?  I mean, what is the optimal number: around 10, around
50, around 100?


>> > > I'm not sure that is ideal approach.  IIUC, the IAA/QPL library is
>> > > not defining a new compression format. Rather it is providing a
>> > > hardware accelerator for 'deflate' format, as can be made compatible
>> > > with zlib:
>> > >
>> > >
>> > > https://intel.github.io/qpl/documentation/dev_guide_docs/c_use_cases
>> > > /deflate/c_deflate_zlib_gzip.html#zlib-and-gzip-compatibility-refere
>> > > nce-link
>> > >
>> > > With multifd we already have a 'zlib' compression format, and so
>> > > this IAA/QPL logic would effectively just be providing a second
>> > > implementation of zlib.
>> > >
>> > > Given the use of a standard format, I would expect to be able to use
>> > > software zlib on the src, mixed with IAA/QPL zlib on the target, or
>> > > vice versa.
>> > >
>> > > IOW, rather than defining a new compression format for this, I think
>> > > we could look at a new migration parameter for
>> > >
>> > > "compression-accelerator": ["auto", "none", "qpl"]
>> > >
>> > > with 'auto' the default, such that we can automatically enable
>> > > IAA/QPL when 'zlib' format is requested, if running on a suitable
>> > > host.
>> >
>> > I was also curious about the format of compression comparing to
>> > software ones when reading.
>> >
>> > Would there be a use case that one would prefer soft compression even
>> > if hardware accelerator existed, no matter on src/dst?
>> >
>> > I'm wondering whether we can avoid that one more parameter but always
>> > use hardware accelerations as long as possible.
> I want to add a new compression format (QPL or IAA-Deflate) here. The
> reasons are as follows:
> 1. The QPL library already supports both software and hardware paths
> for compression.

The question is whether IAA-Deflate is compatible with zlib-deflate.
What are the advantages of the QPL software implementation vs zlib?
- Is it faster?
- Does it use fewer resources?

> The software path uses a fast Deflate compression
> algorithm, while the hardware path uses IAA.

Is it faster than zlib?
And is doing all of this asynchronous job dance not going to be slower
than just calling the functions in a software implementation?

> 2. QPL's software and hardware paths are based on the Deflate
> algorithm, but there is a limitation: the history buffer only supports
> 4K. The default history buffer for zlib is 32K, which means that IAA
> cannot decompress zlib-compressed data. However, zlib can decompress
> IAA-compressed data.

Aha.  Thanks, that was what we wanted to know.
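
The one-way compatibility can be reproduced with software zlib alone.
This is a minimal sketch in Python that uses zlib's window-size setting
(wbits) as a stand-in for the 4K history limit described above.  It does
not call QPL or touch IAA hardware, the helper names are made up, and
real QPL streams may differ in other details.

```python
import zlib


def compress_with_window(data, wbits):
    # zlib-wrapped deflate stream; the header records the window size used.
    ctx = zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION, zlib.DEFLATED, wbits)
    return ctx.compress(data) + ctx.flush()


def can_decompress(stream, wbits):
    # True if a decompressor limited to a 2**wbits history accepts the stream.
    try:
        ctx = zlib.decompressobj(wbits)
        ctx.decompress(stream)
        return True
    except zlib.error:
        return False


data = b"guest page contents " * 4096
small = compress_with_window(data, 12)  # 4K history, standing in for IAA
large = compress_with_window(data, 15)  # 32K history, the zlib default
```

Here can_decompress(small, 15) succeeds, while can_decompress(large, 12)
is rejected by zlib because the stream asks for a larger window than the
decompressor allows.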

> 3. For zlib and zstd, Intel QuickAssist Technology can accelerate both of 
> them.

Do we have any numbers that we could look at?
We are interested in three things:
- how much faster it is
- how much CPU is saved using IAA
- how much latency it adds

Thanks, Juan.

>> Yeah, I did wonder about whether we could avoid a parameter, but then I'm
>> thinking it is good to have an escape hatch if we were to find any flaws in
>> the QPL library's impl of deflate() that caused interop problems.
>> 
>> With regards,
>> Daniel
>> --
>> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
>> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
>> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

