On Thu, 21 Apr 2005, Chris Mason wrote:
>
> There have been a few threads on making git more space efficient, and
> eventually someone mentions tiny files and space fragmentation. Now that git
> object names are decoupled from their compression, it's easier to consider a
> variety of compression algorithms. I whipped up a really silly "pack files
> together" compression.
Careful. This is something that needs history to tell whether it's effective.

In particular, if one file changes and another one does not, your packed
archive now ends up being a new blob, so while you "saved space" by having
just one blob for the object, in reality you didn't save any space at all,
because with the <x> files changing, you just guaranteed that the packed
blob changes <x> times more often.

See? Your "packing in space" ends up also resulting in "packing in time",
and you didn't actually win anything.

(If you did a good job of packing, you hopefully didn't _lose_ anything
either: you'd have 1/<x> the number of objects taking 1/<x> the space if the
packing ended up perfect, but since you need <x> times more versions of
those packed objects unless the files all change together, you end up with
exactly the same space usage.)

So the argument is: you can't lose with the method, and you _can_ win.
Right?

Wrong.

You most definitely _can_ lose: you end up having to optimize for one
particular filesystem blocking size, and you'll lose on any other
filesystem. And you'll lose on the special filesystem of "network traffic",
which is byte-granular.

I don't want to pee on people's parades, and I'm all for gathering numbers,
but the thing is, the current git isn't actually all that bad, and I
guarantee that it's hard to make it better without using delta
representation. And the current thing is really, really simple.

		Linus
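[Editor's note: for concreteness, here is a rough back-of-the-envelope sketch
of the space accounting described above. It is not from the original thread;
the file count, file size, change rate, and block size are made-up
assumptions, and it ignores zlib compression and git's actual object format.]

# Hypothetical model of "packing in space" vs. "packing in time".
# All parameters below are illustrative assumptions, not git behaviour.

def blocks(nbytes, block_size):
    """Filesystem blocks a blob of `nbytes` occupies (rounded up)."""
    return -(-nbytes // block_size)  # ceiling division

def simulate(num_files=8, file_size=200, changes_per_file=5, block_size=4096):
    # Scheme A: one loose object per file.  Each change creates one new small
    # blob; every tiny blob still burns at least one block on disk.
    loose_objects = num_files * (1 + changes_per_file)
    loose_blocks = loose_objects * blocks(file_size, block_size)

    # Scheme B: all <x> files packed into a single blob.  A change to *any*
    # file creates a whole new packed blob, so the packed blob gains a new
    # version <x> times more often (assuming the files change independently).
    packed_size = num_files * file_size
    packed_versions = 1 + num_files * changes_per_file
    packed_blocks = packed_versions * blocks(packed_size, block_size)

    print(f"loose : {loose_objects:4d} objects, {loose_blocks:4d} blocks")
    print(f"packed: {packed_versions:4d} objects, {packed_blocks:4d} blocks")

if __name__ == "__main__":
    simulate()

With the 4 KiB block size assumed here the two schemes come out roughly even
(48 vs. 41 blocks); pick a smaller block size, or count raw bytes the way
network traffic would, and the packed scheme ends up strictly worse, which is
the block-size and byte-granularity point made above.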