On 28.04.20 09:23, Denis Plotnikov wrote: > > > On 28.04.2020 09:16, Max Reitz wrote: >> On 27.04.20 21:26, Denis Plotnikov wrote: >>> >>> On 27.04.2020 15:35, Max Reitz wrote: >>>> On 21.04.20 10:11, Denis Plotnikov wrote: >>>>> zstd significantly reduces cluster compression time. >>>>> It provides better compression performance maintaining >>>>> the same level of the compression ratio in comparison with >>>>> zlib, which, at the moment, is the only compression >>>>> method available. >>>>> >>>>> The performance test results: >>>>> Test compresses and decompresses qemu qcow2 image with just >>>>> installed rhel-7.6 guest. >>>>> Image cluster size: 64K. Image on disk size: 2.2G >>>>> >>>>> The test was conducted with brd disk to reduce the influence >>>>> of disk subsystem to the test results. >>>>> The results is given in seconds. >>>>> >>>>> compress cmd: >>>>> time ./qemu-img convert -O qcow2 -c -o >>>>> compression_type=[zlib|zstd] >>>>> src.img [zlib|zstd]_compressed.img >>>>> decompress cmd >>>>> time ./qemu-img convert -O qcow2 >>>>> [zlib|zstd]_compressed.img uncompressed.img >>>>> >>>>> compression decompression >>>>> zlib zstd zlib zstd >>>>> ------------------------------------------------------------ >>>>> real 65.5 16.3 (-75 %) 1.9 1.6 (-16 %) >>>>> user 65.0 15.8 5.3 2.5 >>>>> sys 3.3 0.2 2.0 2.0 >>>>> >>>>> Both ZLIB and ZSTD gave the same compression ratio: 1.57 >>>>> compressed image size in both cases: 1.4G >>>>> >>>>> Signed-off-by: Denis Plotnikov <dplotni...@virtuozzo.com> >>>>> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsement...@virtuozzo.com> >>>>> Reviewed-by: Alberto Garcia <be...@igalia.com> >>>>> QAPI part: >>>>> Acked-by: Markus Armbruster <arm...@redhat.com> >>>>> --- >>>>> docs/interop/qcow2.txt | 1 + >>>>> configure | 2 +- >>>>> qapi/block-core.json | 3 +- >>>>> block/qcow2-threads.c | 157 >>>>> +++++++++++++++++++++++++++++++++++++++++ >>>>> block/qcow2.c | 7 ++ >>>>> 5 files changed, 168 insertions(+), 2 deletions(-) >>>> [...] >>>> >>>>> diff --git a/block/qcow2-threads.c b/block/qcow2-threads.c >>>>> index 7dbaf53489..0525718704 100644 >>>>> --- a/block/qcow2-threads.c >>>>> +++ b/block/qcow2-threads.c >>>>> @@ -28,6 +28,11 @@ >>>>> #define ZLIB_CONST >>>>> #include <zlib.h> >>>>> +#ifdef CONFIG_ZSTD >>>>> +#include <zstd.h> >>>>> +#include <zstd_errors.h> >>>>> +#endif >>>>> + >>>>> #include "qcow2.h" >>>>> #include "block/thread-pool.h" >>>>> #include "crypto.h" >>>>> @@ -166,6 +171,148 @@ static ssize_t qcow2_zlib_decompress(void >>>>> *dest, size_t dest_size, >>>>> return ret; >>>>> } >>>>> +#ifdef CONFIG_ZSTD >>>>> + >>>>> +/* >>>>> + * qcow2_zstd_compress() >>>>> + * >>>>> + * Compress @src_size bytes of data using zstd compression method >>>>> + * >>>>> + * @dest - destination buffer, @dest_size bytes >>>>> + * @src - source buffer, @src_size bytes >>>>> + * >>>>> + * Returns: compressed size on success >>>>> + * -ENOMEM destination buffer is not enough to store >>>>> compressed data >>>>> + * -EIO on any other error >>>>> + */ >>>>> +static ssize_t qcow2_zstd_compress(void *dest, size_t dest_size, >>>>> + const void *src, size_t src_size) >>>>> +{ >>>>> + ssize_t ret; >>>>> + ZSTD_outBuffer output = { dest, dest_size, 0 }; >>>>> + ZSTD_inBuffer input = { src, src_size, 0 }; >>>> Minor style note: I think it’d be nicer to use designated initializers >>>> here. >>>> >>>>> + ZSTD_CCtx *cctx = ZSTD_createCCtx(); >>>>> + >>>>> + if (!cctx) { >>>>> + return -EIO; >>>>> + } >>>>> + /* >>>>> + * Use the zstd streamed interface for symmetry with >>>>> decompression, >>>>> + * where streaming is essential since we don't record the exact >>>>> + * compressed size. >>>>> + * >>>>> + * In the loop, we try to compress all the data into one zstd >>>>> frame. >>>>> + * ZSTD_compressStream2 potentially can finish a frame earlier >>>>> + * than the full input data is consumed. That's why we are >>>>> looping >>>>> + * until all the input data is consumed. >>>>> + */ >>>>> + while (input.pos < input.size) { >>>>> + size_t zstd_ret; >>>>> + /* >>>>> + * ZSTD spec: "You must continue calling >>>>> ZSTD_compressStream2() >>>>> + * with ZSTD_e_end until it returns 0, at which point you are >>>>> + * free to start a new frame". We assume that "start a new >>>>> frame" >>>>> + * means call ZSTD_compressStream2 in the very beginning or >>>>> when >>>>> + * ZSTD_compressStream2 has returned with 0. >>>>> + */ >>>>> + do { >>>>> + zstd_ret = ZSTD_compressStream2(cctx, &output, &input, >>>>> ZSTD_e_end); >>>> The spec makes it sound to me like ZSTD_e_end will always complete in a >>>> single call if there’s enough space in the output buffer. So the only >>>> team we have to loop would be when there isn’t enough space anyway: >>>> >>>> It says this about ZSTD_e_end: >>>>> flush operation is the same, and follows same rules as calling >>>>> ZSTD_compressStream2() with ZSTD_e_flush. >>>> Those rules being: >>>>> Note that, if `output->size` is too small, a single invocation with >>>>> ZSTD_e_flush might not be enough (return code > 0). >>>> So it seems like it will only return a value > 0 if the output >>>> buffer is >>>> definitely too small. >>>> >>>> The spec also notes that the return value is greater than 0 if: >>>>>> 0 if some data still present within internal buffer (the value is >>>>> minimal estimation of remaining size), >>>> So it’s a minimum estimate. That’s another point that heavily implies >>>> to me that if the return value were less than what’s left in the >>>> buffer, >>>> the function wouldn’t return but still try to write it out, until it >>>> realizes that there isn’t enough space in the output buffer, and then >>>> return a value that exceeds the remaining output buffer size. >>>> >>>> (Because if the function just played it safe, I would expect it to >>>> return a maximum estimate.) >>>> >>>> >>>> OTOH, if it were actually possible for ZSTD_e_end to finish a frame >>>> earlier than the end of the input, I think it would make more sense to >>>> use ZSTD_e_continue until the input is done and then finish with >>>> ZSTD_e_end, like the spec seems to propose. That way, we’d always end >>>> up with a single frame to make decompression simpler (and I think it >>>> would also make more sense overall). >>>> >>>> >>>> But anyway. From how I understand the spec, this code simply always >>>> ends up creating a single frame or erroring out, without looping ever. >>>> So it isn’t exactly wrong, it just seems overly complicated. (Again, >>>> assuming I understand the spec correctly. Which seems like a tough >>>> thing to assume, because the spec is not exactly obvious to read...) >>>> >>>> (Running some quick tests by converting some images with zstd >>>> compression seems to confirm that whenever ZSTD_compressStream2() >>>> returns, either zstd_ret > output.size - output.pos, or zstd_ret == 0 >>>> and input.pos == input.size. So none of the loops ever loop.) >>>> >>>> Max >>> So, what should we do? >>> >>> 1. Rely on the test that there's no need for the loop: >>> * make one ZSTD_compressStream2() call >>> * make sure it returned with zstd_ret == 0 and >>> input.pos == input.size. >>> if so, return with the size >>> * if not, check that zstd_ret > output.size - output.pos >>> if so, return with -ENOMEM >>> * if none above return with -EIO >>> >>> This should cover the majority of the compressing cases >> According to how I interpret the spec, “none of the above” should never >> happen except for ZSTD_isError(zstd_ret), so this should cover all >> compressing cases, actually. >> >>> 2. Leave the loop as is, because of the documentation: >>> "You *must* continue calling ZSTD_compressStream2() with ZSTD_e_end >>> until it returns 0, >>> at which point you are free to start a new frame." >> As far as I can see, the return value is always 0 or greater than the >> remaining buffer space, so this will always be satisfied even without a >> loop. (We will always have one of three cases: (1) Success and all >> input has been consumed, (2) ZSTD_isError(zstd_ret), so we return -EIO, >> (3) zstd_ret > output.size - output.pos, so we return -ENOMEM. >> >> I interpret the “You *must* continue until it returns 0” as “If there is >> no sufficient space in the output buffer, this function will return a >> value greater than 0 indicating how much space is at least still >> required. The caller is free to supply a greater output buffer for the >> next call (by supplying a different ZSTD_outBuffer structure), and then >> we’ll try again.” >> (I.e., retrying with the same ZSTD_outBuffer will make the function >> return immediately because it knows that it’s insufficient.) >> >> Max > > ok, removing the loop sounds reasonable. > My only concern is that *must* in the doc.
Well, if we just return an error whenever we get a return value != 0, then we shouldn’t have to care what we must and mustn’t do, because we’ll just abort the compression process then. > Could ZSTD-lib change the logic in the future relying on the fact > that they make users use ZSTD_compressStream() in a loop. It isn’t like I just wondered whether the loop is necessary and saw that with the current implementation, it didn’t seem necessary for any of the test images I have. My reasoning is based on the specification, which says for ZSTD_e_flush that it will only return a value > 0 if output->size is too small; and that ZSTD_e_end follows the same rules. So I think if they were to change behavior, they’d violate the spec. > Honestly, I can't imagine the case when they would want to do that, > but still. > Without the loop we're protected even in this case. The worst thing > could happen because of that is qcow2_zstd_compress() would return > with -EIO more frequently. I think so, too. > So, if I understand correctly, you are ok with removing the loop. Yes. Max
signature.asc
Description: OpenPGP digital signature