On 06/26/2017 01:16 PM, Laszlo Ersek wrote:
> On 06/26/17 11:33, Denis V. Lunev wrote:
>> On 06/26/2017 12:20 PM, Peter Lieven wrote:
>>> On 26.06.2017 at 10:28, Kevin Wolf wrote:
>>>> [ Cc: qemu-devel; don't post to qemu-block only! ]
>>>>
>>>> On 26.06.2017 at 09:57, Peter Lieven wrote:
>>>>> Hi,
>>>>>
>>>>> I am currently working on optimizing speed for compressed QCOW2
>>>>> images. We use them for templates and would also like to use them for
>>>>> backups, but the latter is almost infeasible because using gzip for
>>>>> compression is horribly slow. I tried to experiment with different
>>>>> options to deflate, but in the end I think it's better to use a
>>>>> different compression algorithm for cases where speed matters. As we
>>>>> already have probing for it in configure and as it is widely used, I
>>>>> would like to use LZO for that purpose. I think it would be best to
>>>>> have a flag to indicate that compressed blocks use LZO compression,
>>>>> but I would need a little explanation of which of the feature fields I
>>>>> have to use to prevent an older (incompatible) QEMU from opening
>>>>> LZO-compressed QCOW2 images.
>>>>>
>>>>> I also already have some numbers. I converted a fresh Debian 9 install,
>>>>> which has an uncompressed QCOW2 size of 1158 MB, with qemu-img to a
>>>>> compressed QCOW2. With GZIP compression the result is 356 MB, whereas
>>>>> the LZO version is 452 MB. However, the current GZIP variant takes 35
>>>>> seconds for this operation, where LZO only needs 4 seconds. I think this
>>>>> is a good trade-off, especially when it's optional so the user can
>>>>> choose.
>>>>>
>>>>> What are your thoughts?
>>>> We had a related RFC patch by Den earlier this year, which never
>>>> received many comments and never got out of RFC:
>>>>
>>>> https://lists.gnu.org/archive/html/qemu-devel/2017-03/msg04682.html
>>> I was not aware of that one. Thanks for pointing it out.
>>>
>>>> So he chose a different algorithm (zstd).
>>>> When I asked, he posted a
>>>> comparison of algorithms (however a generic one and not measured in the
>>>> context of qemu) that suggests that LZO would be slightly faster, but
>>>> have a considerably worse compression ratio with the settings that were
>>>> benchmarked.
>>> My idea to choose LZO was that it is widely available and available in
>>> any distro you can think of. We already have probing for it in configure.
>>> My concern with ZSTD would be that it seems there are no packages
>>> available for most distros and that it seems to be multi-threaded. I
>>> don't know if this will cause any trouble?
>>>
>> We have had that compression working in a multithreaded process.
>>
>>>> I think it's clear that if there is any serious interest in compression,
>>>> we'll want to support at least one more algorithm. What we still need to
>>>> evaluate is which one(s) to take, and whether a simple incompatible flag
>>>> in the header like in Den's patch is enough or whether we should add a
>>>> whole new header field for the compression algorithm (like we already
>>>> have for encryption).
>>> From my side there clearly is interest in optimizing the compression. It's
>>> even possible to speed up zlib by 3-4x by choosing other parameters
>>> for deflate, which unfortunately are not compatible with our inflate
>>> settings.
>>>
>>> I don't know if it's worth creating a new header field. Even if we
>>> spent two bits in the end (one for LZO and one for ZSTD), I think this
>>> wouldn't hurt. However, there are likely to pop up new compression
>>> algorithms in the future and a header would be more flexible.
>>>
>>> I just don't want to make it too complicated and, as you pointed out,
>>> compression is not that interesting for most people - maybe due to its
>>> speed.
>>>
>> I think we need something generic but simple. I think that we should not
>> support compression with different algorithms in a single file.
>>
>> Speaking about compression, we do have different constraints for
>> different situations, e.g. backups are written once and rarely read while
>> generic compression in a backing store is read frequently but never
>> written. Thus the exact algorithm should be selectable.
> Pluggable / selectable compression methods are likely the most flexible
> and future-proof. A new header sounds good to me (... said by someone
> who comments on this from the sidelines.)
>
> I would advise caution against multi-threaded compression libraries.
> Unless they are coded very-very carefully with regard to signal handling
> and general error handling / propagation, they cannot be considered
> "opaque" enough.
>
> (I had written and maintained the original (0.x) branch of "lbzip2",
> which was extremely conscious of error handling and signals. That was a
> challenge even in a standalone program, and I didn't even attempt to
> retrofit the code to the existing libbz2 APIs (i.e. I never even tried
> to librarize the code).)
>
> This does not mean that people cannot get such a library right. It's
> just that *by default* such a library will have a number of obscure bugs
> related to: signals, forking, and general error handling. It could also
> have problems with unbounded memory allocation. An MT compression
> library that gets all of this right is the exception IMO, not the norm.
> (I don't know anything about the ZSTD library; it could be such a high
> quality library.)
>
> Another complication with MT *de*compression is that the CPU demand from
> the IO thread (which is by default responsible for handling IO, when not
> using dataplane -- is that right?) would "leak" to other physical
> processors.
> I believe this can interfere with use cases where people
> carefully isolate host CPUs between "QEMU" and "non-QEMU" workloads,
> plus pin QEMU's VCPU threads, and IO threads, to different host CPUs
> (see vcpupin / emulatorpin / iothreadpin under
> <http://libvirt.org/formatdomain.html#elementsCPUTuning>.) It's probably
> possible to figure out the right thing for "ZSTD threads" as well, but
> IMO it remains a complication nonetheless.

We already have a prototype that delegates compression processing to the
thread pool that handles other IO commands like flush/fallocate.

In the long term, all compression/decompression routines MUST be moved out
of the IO thread, as they add too much latency.
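
As a rough illustration of that kind of delegation (a Python sketch only,
not the QEMU prototype): the caller owns a fixed-size worker pool and hands
each cluster to it, so the CPU-heavy deflate step does not run in the
submitting thread. CLUSTER_SIZE, compress_cluster() and compress_image()
are names made up for this example, and zlib merely stands in for whichever
algorithm ends up being selected.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor
from typing import List

CLUSTER_SIZE = 64 * 1024  # assumed cluster size, for illustration only


def compress_cluster(cluster: bytes) -> bytes:
    """CPU-heavy step that we want to keep out of the submitting thread."""
    return zlib.compress(cluster, 6)


def compress_image(data: bytes, pool: ThreadPoolExecutor) -> List[bytes]:
    """Split data into clusters and compress each one on the worker pool."""
    clusters = [data[i:i + CLUSTER_SIZE]
                for i in range(0, len(data), CLUSTER_SIZE)]
    # Executor.map() yields results in submission order, so the output
    # list lines up with the cluster indices.
    return list(pool.map(compress_cluster, clusters))
```

Because the pool is created by the caller with an explicit max_workers, the
number of compression threads stays bounded and known, which is the property
that matters for the CPU pinning concern raised above.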
>
> Personally I would recommend a new header, and LZO, as a starting point.
> As pointed out above, LZO is widely available in distros. It has good
> performance, and it is single-threaded similarly to zlib. I use LZO for
> two QEMU-related purposes ATM:
>
> - I use the kdump-lzo format when dumping guest memory
>   (virsh dump $DOMAIN $CORE_FILE --memory-only --format kdump-lzo)
>
> - I use LZO compression for "virsh managedsave"
>   (by setting "save_image_format" in "/etc/libvirt/qemu.conf" to "lzop")
>
> Thanks
> Laszlo
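
To make the header discussion a bit more concrete, here is a small Python
sketch of how an incompatible feature bit combined with a compression-type
field could keep older readers away from such images. Everything in it is
an assumption made for illustration: the two-field layout is NOT the real
QCOW2 header, and the bit number and the type values are arbitrary.

```python
import struct

# Toy layout, NOT the real QCOW2 header: an 8-byte big-endian
# incompatible-feature bitmask followed by a 1-byte compression type.
HEADER = struct.Struct(">QB")

INCOMPAT_COMPRESSION_TYPE = 1 << 3  # assumed bit number
COMPRESSION_ZLIB = 0                # assumed value
COMPRESSION_LZO = 1                 # assumed value

KNOWN_INCOMPAT = INCOMPAT_COMPRESSION_TYPE
KNOWN_COMPRESSION = {COMPRESSION_ZLIB: "zlib", COMPRESSION_LZO: "lzo"}


def open_header(raw: bytes, known_incompat: int = KNOWN_INCOMPAT) -> str:
    """Return the compression name, or refuse to open the image."""
    incompat, ctype = HEADER.unpack_from(raw)
    unknown = incompat & ~known_incompat
    if unknown:
        # A reader must not touch an image that uses features it does
        # not understand.
        raise ValueError(f"unsupported incompatible features: {unknown:#x}")
    if not incompat & INCOMPAT_COMPRESSION_TYPE:
        return "zlib"  # legacy image: the type field is ignored
    if ctype not in KNOWN_COMPRESSION:
        raise ValueError(f"unknown compression type {ctype}")
    return KNOWN_COMPRESSION[ctype]
```

A reader that predates the feature can be modelled by passing
known_incompat=0: it then rejects any image that has the bit set, while
images without the bit still open as zlib. New algorithms would only need a
new type value, not another feature bit.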