Re: [PATCH 2/2] archive: avoid spawning `gzip`

Johannes Schindelin Fri, 26 Apr 2019 07:51:23 -0700

Hi Peff,

On Mon, 15 Apr 2019, Jeff King wrote:


> On Sun, Apr 14, 2019 at 12:01:10AM +0200, René Scharfe wrote:
>
> > >> As we already link to the zlib library, we can perform the compression
> > >> without even requiring gzip on the host machine.
> > >
> > > Very cool. It's nice to drop a dependency, and this should be a bit more
> > > efficient, too.
> >
> > Getting rid of dependencies is good, and using zlib is the obvious way to
> > generate .tgz files. Last time I tried something like that, a separate gzip
> > process was faster, though -- at least on Linux [1].  How does this one
> > fare?
>
> I'd expect a separate gzip to be faster in wall-clock time for a
> multi-core machine, but overall consume more CPU. I'm slightly surprised
> that your timings show that it actually wins on total CPU, too.

If performance is really a concern, you'll be much better off using `pigz`
than `gzip`.

> Here are best-of-five times for "git archive --format=tar.gz HEAD" on
> linux.git (the machine is a quad-core):
>
>   [before, separate gzip]
>   real        0m21.501s
>   user        0m26.148s
>   sys 0m0.619s
>
>   [after, internal gzwrite]
>   real        0m25.156s
>   user        0m25.059s
>   sys 0m0.096s
>
> which does show what I expect (longer overall, but less total CPU).
>
> Which one you prefer depends on your situation, of course. A user on a
> workstation with multiple cores probably cares most about end-to-end
> latency and using all of their available horsepower. A server hosting
> repositories and receiving many unrelated requests probably cares more
> about total CPU (though the differences there are small enough that it
> may not even be worth having a config knob to un-parallelize it).

I am a bit sad that this is so noticeable. Nevertheless, I think that
dropping the dependency is worth it, in particular given that `gzip` is
not exactly fast to begin with (you really should switch to `pigz` or to a
faster compression if you are interested in speed).

> > Doing compression in its own thread may be a good idea.
>
> Yeah. It might even make the patch simpler, since I'd expect it to be
> implemented with start_async() and a descriptor, making it look just
> like a gzip pipe to the caller. :)

Sadly, it does not really look like it is simpler.

And when going into the direction of multi-threaded compression anyway,
the `pigz` trick of compressing 32kB chunks in parallel is an interesting
idea, too.

All of this, however, is outside of the purview of this (still relatively
simple) patch series.

Ciao,
Dscho

Re: [PATCH 2/2] archive: avoid spawning `gzip`

Reply via email to