Re: F41 Change Proposal: Change Compose Settings (system-wide)

2024-03-26 Thread Sirius via devel
In days of yore (Thu, 21 Mar 2024), Stephen Smoogen thus quoth: 
> On Wed, 20 Mar 2024 at 22:01, Kevin Kofler via devel <
> devel@lists.fedoraproject.org> wrote:
> 
> > Aoife Moloney wrote:
> > > The zstd compression type was chosen to match createrepo_c settings.
> > > As an alternative, we might want to choose xz,
> >
> > Since xz consistently compresses better than zstd, I would strongly
> > suggest using xz everywhere to minimize download sizes. However:
> >
> > > especially after zlib-ng has been made the default in Fedora and brought
> > > performance improvements.
> >
> > zlib-ng is for gz, not xz, and gz is fast, but compresses extremely poorly
> > (which is mostly due to the format, so, while some implementations manage
> > to do better than others at the expense of more compression time, there is
> > a limit to how well they can do and it is nowhere near xz or even zstd)
> > and should hence never be used at all.
> >
> >
> There are two parts to this which users will see as 'slowness'. Part one is
> downloading the data from a mirror. Part two is decompressing the data. In
> work I have been a part of, we have found that while xz gave us much
> smaller files, the time to decompress was so much larger that our download
> gains were lost. Using zstd gave larger downloads (maybe 10 to 20% bigger)
> but decompressed much faster than xz. This is data-dependent though, so it
> would be good if someone could test whether xz decompression of the data
> files will be too slow.

Hi there,

Ran tests with gzip 1-9 and xz 1-9 on an F41 XML file that was 940MiB.

Input File: f41-filelist.xml, Size: 985194446 bytes
XZ level 1 : 21s to compress, 5.3% filesize, 4.4s to decompress
XZ level 2 : 28s to compress, 5.1% filesize, 4.2s to decompress
XZ level 3 : 44s to compress, 5.1% filesize, 4.2s to decompress
XZ level 4 : 55s to compress, 5.3% filesize, 4.5s to decompress
XZ level 5 : 1min25s to compress, 5.3% filesize, 4.3s to decompress
XZ level 6 : 2min49s to compress, 5.1% filesize, 4.4s to decompress
XZ level 7 : 2min55s to compress, 4.8% filesize, 4.2s to decompress
XZ level 8 : 3min04s to compress, 4.8% filesize, 4.2s to decompress
XZ level 9 : 3min12s to compress, 4.8% filesize, 4.2s to decompress

Input File: f41-filelist.xml, Size: 985194446 bytes
GZ Level 1 :  6s to compress, 7.9% filesize, 4.2s to decompress
GZ Level 2 :  6s to compress, 7.8% filesize, 4.1s to decompress
GZ Level 3 :  7s to compress, 7.6% filesize, 4.1s to decompress
GZ Level 4 :  8s to compress, 6.8% filesize, 4.0s to decompress
GZ Level 5 :  9s to compress, 6.6% filesize, 4.0s to decompress
GZ Level 6 : 12s to compress, 6.6% filesize, 4.0s to decompress
GZ Level 7 : 15s to compress, 6.5% filesize, 4.0s to decompress
GZ Level 8 : 24s to compress, 6.4% filesize, 4.0s to decompress
GZ Level 9 : 28s to compress, 6.3% filesize, 4.0s to decompress

xz level 2 is not a shabby compromise: you get a small file size, and the
time to compress is the same as gzip level 9. To get the smallest file
sizes, the time (and memory requirements) of xz become very noticeable for
not much gain.



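For completeness, this is the script that produced the numbers above (it
assumes gzip, xz, awk and bc are available; timings are wall-clock from the
bash time builtin):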
#!/bin/bash

INPUTFILE=f41-filelist.xml
INPUTFILESIZE=$(ls -ln "${INPUTFILE}" | awk '{print $5}')

## gzip
function do_gzip()
{
  let cl=1
  echo "Input File: ${INPUTFILE}, Size: ${INPUTFILESIZE} bytes"
  echo
  while [[ $cl -le 9 ]]
  do
    echo "GZip compression level ${cl}"
    echo "Time to compress the file"
    time gzip -k -${cl} "${INPUTFILE}"
    COMPRESSED_SIZE=$(ls -ln "${INPUTFILE}.gz" | awk '{print $5}')
    echo "Compressed to"
    # compressed size as a percentage of the original, five decimals
    echo "scale=5
${COMPRESSED_SIZE}/${INPUTFILESIZE}*100" | bc
    echo "% of original"
    echo "Time to decompress the file, output to /dev/null"
    time gzip -d -c "${INPUTFILE}.gz" > /dev/null
    rm -f "${INPUTFILE}.gz"
    let cl=$cl+1
    echo
  done
}

## xz
function do_xz()
{
  let cl=1
  echo "Input File: ${INPUTFILE}, Size: ${INPUTFILESIZE} bytes"
  echo
  while [[ $cl -le 9 ]]
  do
    echo "XZ compression level ${cl}"
    echo "Time to compress the file"
    time xz -k -z -${cl} "${INPUTFILE}"
    COMPRESSED_SIZE=$(ls -ln "${INPUTFILE}.xz" | awk '{print $5}')
    echo "Compressed to"
    echo "scale=5
${COMPRESSED_SIZE}/${INPUTFILESIZE}*100" | bc
    echo "% of original"
    echo "Time to decompress the file, output to /dev/null"
    time xz -d -c "${INPUTFILE}.xz" > /dev/null
    rm -f "${INPUTFILE}.xz"
    let cl=$cl+1
    echo
  done
}

do_gzip
do_xz
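
(If saved as, say, compress-test.sh - the name is illustrative - it can be
run with "bash compress-test.sh > results.txt 2>&1"; the time builtin
writes to stderr, hence the redirect.)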

-- 
Kind regards,

/S


Re: F41 Change Proposal: Change Compose Settings (system-wide)

2024-03-26 Thread Sirius via devel
In days of yore (Tue, 26 Mar 2024), fedora-devel thus quoth: 
> Ran tests with gzip 1-9 and xz 1-9 on an F41 XML file that was 940MiB.

Added tests with zstd 1-19, without using a dictionary to improve the
ratio any further.

Input File: f41-filelist.xml, Size: 985194446 bytes

ZStd Level  1, 1.7s to compress, 6.46% file size,  0.6s decompress
ZStd Level  2, 1.7s to compress, 6.34% file size,  0.7s decompress
ZStd Level  3, 2.1s to compress, 6.26% file size,  0.7s decompress
ZStd Level  4, 2.3s to compress, 6.26% file size,  0.7s decompress
ZStd Level  5, 5.7s to compress, 5.60% file size,  0.6s decompress
ZStd Level  6, 7.2s to compress, 5.42% file size,  0.6s decompress
ZStd Level  7, 8.1s to compress, 5.39% file size,  0.6s decompress
ZStd Level  8, 9.5s to compress, 5.31% file size,  0.6s decompress
ZStd Level  9, 10.4s to compress, 5.28% file size,  0.6s decompress
ZStd Level 10, 13.6s to compress, 5.26% file size,  0.6s decompress
ZStd Level 11, 18.4s to compress, 5.25% file size,  0.6s decompress
ZStd Level 12, 19.5s to compress, 5.25% file size,  0.6s decompress
ZStd Level 13, 30.9s to compress, 5.25% file size,  0.6s decompress
ZStd Level 14, 39.7s to compress, 5.23% file size,  0.6s decompress
ZStd Level 15, 56.1s to compress, 5.21% file size,  0.6s decompress
ZStd Level 16,  1min58s to compress, 5.52% file size,  0.7s decompress
ZStd Level 17,  2min25s to compress, 5.36% file size,  0.7s decompress
ZStd Level 18,  3min46s to compress, 5.43% file size,  0.8s decompress
ZStd Level 19, 10min36s to compress, 4.66% file size,  0.7s decompress

So to save 5.2MB in file size (level 19 vs level 15), the server has to
spend eleven times longer compressing the file (and I did not look at
resources like CPU or RAM while doing this). I am sure there are other
compression mechanisms that can squeeze these files a bit further, but at
what cost? If it is a once-a-day event, maybe a high compression ratio is
justifiable. If it has to happen hundreds of times per day - not so much.
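
To put that against download time (a back-of-envelope calculation, assuming
a 100 Mbit/s link): xz level 9 gives roughly 47MB (4.8% of 985194446 bytes)
versus roughly 51MB (5.21%) for zstd level 15, so the ~4MB that xz saves
buys back only about 0.3s of download time, while zstd decompresses about
3.6s faster (0.6s vs 4.2s in the tables above).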


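For reference, the zstd test function, a drop-in addition to the script
from my previous mail: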
## zstd
function do_zstd()
{
  let cl=1
  echo "Input File: ${INPUTFILE}, Size: ${INPUTFILESIZE} bytes"
  echo
  while [[ $cl -le 19 ]]
  do
    echo "ZStd compression level ${cl}"
    echo "Time to compress the file"
    # zstd keeps the input file by default, so no -k is needed
    time zstd -z -${cl} "${INPUTFILE}"
    COMPRESSED_SIZE=$(ls -ln "${INPUTFILE}.zst" | awk '{print $5}')
    echo "Compressed to"
    echo "scale=5
${COMPRESSED_SIZE}/${INPUTFILESIZE}*100" | bc
    echo "% of original"
    echo "Time to decompress the file, output to /dev/null"
    time zstd -d -c "${INPUTFILE}.zst" > /dev/null
    rm -f "${INPUTFILE}.zst"
    let cl=$cl+1
    echo
  done
}
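
Called at the end of the script alongside do_gzip and do_xz:

do_zstd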

-- 
Kind regards,

/S


Re: F41 Change Proposal: Change Compose Settings (system-wide)

2024-03-26 Thread Sirius via devel
In days of yore (Tue, 26 Mar 2024), fedora-devel thus quoth: 
> Also note that adding '-T0' to use all available cores of the CPU will 
> greatly speed up the results with zstd.
> 
> However, all this talk is about the optimal compression level, and in 
> the end there's no way to pass that through createrepo_c's options, so ;-)

True. But running these tests illustrates quite well that there are
diminishing returns, or serious tradeoffs, involved in reaching for the
biggest compression ratios. Either they do not perform as well as a lower
ratio, they take an inordinately long time to run, or they require bespoke
solutions (like custom dictionaries tailored very specifically to what you
are trying to compress).

Saving bandwidth is a laudable goal, but we cannot lose sight of practical
issues. :)
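
For the curious, a rough sketch of what the bespoke-dictionary route looks
like with zstd (file and dictionary names are illustrative, and dictionaries
mostly pay off for many small, similar files rather than one large XML, so
treat this as an outline rather than a recommendation):

# carve training samples out of the data (names illustrative)
split -b 128K f41-filelist.xml sample-
# train a dictionary on the samples
zstd --train sample-* -o filelist.dict
# compress with the dictionary; -T0 (mentioned above) uses all cores
zstd -T0 -19 -D filelist.dict f41-filelist.xml -o f41-filelist.xml.zst
# decompression needs the same dictionary
zstd -d -D filelist.dict -c f41-filelist.xml.zst > /dev/null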

-- 
Kind regards,

/S