My proposed changes to allow for parallel downloads assume that downloads
are network-bound, so they can run separately from other jobs. If downloads
are actually CPU-bound, then the change indeed has no merit at all :)

On December 14, 2020 17:20:17 GMT-05:00, "Ludovic Courtès" <l...@gnu.org>
wrote:
>Hi Guix!
>
>Consider these two files:
>
>https://ci.guix.gnu.org/nar/gzip/kfcrrl6p6f6v51jg5rirmq3q067zxih6-ungoogled-chromium-87.0.4280.88-0.b78cb92
>https://ci.guix.gnu.org/nar/lzip/kfcrrl6p6f6v51jg5rirmq3q067zxih6-ungoogled-chromium-87.0.4280.88-0.b78cb92
>
>Quick decompression bench:
>
>--8<---------------cut here---------------start------------->8---
>$ du -h /tmp/uc.nar.[gl]z
>103M   /tmp/uc.nar.gz
>71M    /tmp/uc.nar.lz
>$ gunzip -c < /tmp/uc.nar.gz| wc -c
>350491552
>$ time lzip -d </tmp/uc.nar.lz >/dev/null
>
>real   0m6.040s
>user   0m5.950s
>sys    0m0.036s
>$ time gunzip -c < /tmp/uc.nar.gz >/dev/null
>
>real   0m2.009s
>user   0m1.977s
>sys    0m0.032s
>--8<---------------cut here---------------end--------------->8---
>
>The decompression throughput (compressed bytes read in the first column,
>uncompressed bytes written in the second column) is:
>
>          input    |  output
>  gzip:  52 MiB/s  | 167 MiB/s
>  lzip:  11 MiB/s  |  56 MiB/s
>
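For what it's worth, those figures can be rechecked from the sizes and
timings in the bench above: throughput is just bytes divided by wall-clock
time.  A back-of-the-envelope check, which reproduces the table to within
a MiB/s or so (du -h reports MiB, hence MiB/s):

--8<---------------cut here---------------start------------->8---
# Input rate = compressed size / decompression time;
# output rate = uncompressed size / decompression time.
echo "gzip in : $(echo '103 / 2.009' | bc -l) MiB/s"
echo "gzip out: $(echo '350491552 / 1048576 / 2.009' | bc -l) MiB/s"
echo "lzip in : $(echo '71 / 6.040' | bc -l) MiB/s"
echo "lzip out: $(echo '350491552 / 1048576 / 6.040' | bc -l) MiB/s"
--8<---------------cut here---------------end--------------->8---
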
>Indeed, if you run this from a computer on your LAN:
>
>  wget -O - … | gunzip > /dev/null
>
>you’ll find that wget caps at 50 MB/s with gunzip, whereas with lunzip
>it caps at 11 MB/s.
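
For the lzip side of that comparison, the equivalent pipeline would
presumably be:

  wget -O - … | lunzip > /dev/null

(lunzip, like gunzip, reads standard input and writes to standard output
when given no file arguments.)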
>
>From my place I get a peak download bandwidth of 30+ MB/s from
>ci.guix.gnu.org, thus substitute downloads are CPU-bound (I can’t go
>beyond 11 MB/s due to decompression).  I must say it never occurred to
>me that this could be the case when we introduced lzip substitutes.
>
>I’d get faster substitute downloads with gzip (I would download more
>bytes, but the time-to-disk would be smaller).  Specifically, download +
>decompression of ungoogled-chromium from the LAN completes in 2.4s for
>gzip vs. 7.1s for lzip.  On a low-end ARMv7 device, also on the LAN, I
>get 32s (gzip) vs. 53s (lzip).
>
>Where to go from here?  Several options:
>
>  0. Lzip decompression speed increases with compression ratio, but
>     we’re already using ‘--best’ on ci.  The only way we could gain is
>     by using “multi-member archives” and then parallel decompression as
>     done in plzip, but that’s probably not supported in lzlib.  So
>     we’re probably stuck here.
>
>  1. Since ci.guix.gnu.org still provides both gzip and lzip archives,
>     ‘guix substitute’ could automatically pick one or the other
>     depending on the CPU and bandwidth.  Perhaps a simple trick would
>     be to check the user/wall-clock time ratio and switch to gzip for
>     subsequent downloads if that ratio is close to one.  How well would
>     that work?  (See the first sketch after this list.)
>
>  2. Use Zstd like all the cool kids since it seems to have a much
>     higher decompression speed: <https://facebook.github.io/zstd/>.
>     630 MB/s on ungoogled-chromium on my laptop.  Woow.  (See the
>     second sketch after this list.)
>
>  3. Allow for parallel downloads (really: parallel decompression) as
>     Julien did in <https://issues.guix.gnu.org/39728>.
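
Regarding #1, here is a rough shell sketch of that heuristic, just to
make the idea concrete; it is not what ‘guix substitute’ does today.  It
times one download-plus-decompression with GNU time and compares user
CPU time to wall-clock time (the output file name and the 0.9 threshold
are made up):

--8<---------------cut here---------------start------------->8---
#!/bin/sh
# Hypothetical experiment: if user CPU time is close to wall-clock time,
# decompression (not the network) is the bottleneck.
url=https://ci.guix.gnu.org/nar/lzip/kfcrrl6p6f6v51jg5rirmq3q067zxih6-ungoogled-chromium-87.0.4280.88-0.b78cb92

# GNU time: %e = wall-clock seconds, %U = user CPU seconds, both covering
# the whole pipeline run below.
/usr/bin/time -f '%e %U' -o /tmp/substitute-times \
    sh -c "wget -q -O - '$url' | lunzip > /dev/null"

read wall user < /tmp/substitute-times
if awk -v w="$wall" -v u="$user" 'BEGIN { exit !(u / w > 0.9) }'; then
    echo "CPU-bound: prefer gzip (or zstd) for subsequent substitutes"
else
    echo "network-bound: keep the higher-ratio lzip substitutes"
fi
--8<---------------cut here---------------end--------------->8---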
>
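Regarding #2, a quick way to see what Zstandard does on this particular
nar, assuming zstd is installed (level 19 and the file names are
arbitrary choices for the test):

--8<---------------cut here---------------start------------->8---
# Recompress the nar with zstd, then time decompression alone.
gunzip -c /tmp/uc.nar.gz > /tmp/uc.nar
zstd -19 -o /tmp/uc.nar.zst /tmp/uc.nar
time zstd -d -c /tmp/uc.nar.zst > /dev/null

# zstd's built-in benchmark mode gives a similar comparison:
zstd -b19 /tmp/uc.nar
--8<---------------cut here---------------end--------------->8---
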
>My preference would be #2, #1, and #3, in this order.  #2 is great but
>it’s quite a bit of work, whereas #1 could be deployed quickly.  I’m not
>fond of #3 because it just papers over the underlying issue and could be
>counterproductive if the number of jobs is wrong.
>
>Thoughts?
>
>Ludo’.
