On Thu, 21 Apr 2005, Chris Mason wrote:
> 
> There have been a few threads on making git more space efficient, and 
> eventually someone mentions tiny files and space fragmentation.  Now that git 
> object names are decoupled from their compression, it's easier to consider a 
> variety of compression algorithms.  I whipped up a really silly "pack files 
> together" compression.

Careful.

This is something that needs history to tell whether it's effective. In
particular, if one file changes and another one does not, your packed
archive now ends up being a new blob. So while you "saved space" by having
just one blob for several files, in reality you didn't save any space at
all: with <x> files changing independently, you just guaranteed that the
packed blob changes <x> times more often.

See? Your "packing in space" ends up also resulting in "packing in time", 
and you didn't actually win anything.

(If you did a good job of packing, you hopefully didn't _lose_ anything
either - you needed 1/<x> the number of objects, taking 1/<x> the space,
if the packing ended up perfect - but since you end up needing <x> times
more versions of those packed objects unless the files all change
together, you end up with exactly the same space usage.)
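
To put rough numbers on that (made-up sizes, just to illustrate the
argument, not measurements): say <x> = 4 tiny 1KB files on a filesystem
with 4KB blocks, and each file changes once, at a different time.

    # Back-of-the-envelope only: toy numbers, not measurements.
    import math

    BLOCK = 4096        # assumed filesystem block size
    x = 4               # number of tiny files packed together
    file_size = 1024    # size of each tiny file

    def on_disk(nbytes):
        """Disk footprint rounded up to whole filesystem blocks."""
        return math.ceil(nbytes / BLOCK) * BLOCK

    # One blob per file: each change adds one new tiny blob (one block).
    loose_growth = x * on_disk(file_size)

    # All x files in one packed blob, with "perfect" packing (it fills
    # the block exactly): each change rewrites the whole pack, so x
    # changes add x new packed blobs.
    packed_growth = x * on_disk(x * file_size)

    print(loose_growth, packed_growth)  # 16384 16384 - history grows just as fast

The packing wins a factor of <x> per snapshot, and the rewrites cost a
factor of <x> over time, so the archive grows at exactly the same rate.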

So the argument is: you can't lose with the method, and you _can_ win. 
Right?

Wrong. You most definitely _can_ lose: you end up having to optimize for
one particular filesystem blocking size, and you'll lose on any other
filesystem. And you'll lose on the special filesystem of "network
traffic", which is byte-granular.

I don't want to pee on people's parades, and I'm all for gathering numbers,
but the thing is, the current git isn't actually all that bad, and I
guarantee that it's hard to make it better without using a delta
representation. And the current thing is really, really simple.

                Linus